Creating file readers
File readers are used to read the original dataset and write the synthetic dataset.
Metasyn implements currently two file readers: CsvFileReader and SavFileReader.
To implement a new file reader, you should create a new class that is derived from the BaseFileReader.
To ensure that the file reader is available to metasyn, you have to decorate the class with the filereader()
decorator. At a minimum, you should also implement the BaseFileReader._write_synthetic() method.
This methods enables metasyn to find your newly created file reader for writing to a synthetic file.
from metasyn.file import filereader, BaseFileReader
@filereader
class MyFileReader(BaseFileReader):
name = "fancy_file_reader"
@classmethod
def from_file(cls, fp: Union[Path, str], extra_arg=...) -> tuple[pl.DataFrame, MyFileReader]:
df = read_fancy(fp)
# Create the file format metadata, only add metadata that is necessary for writing.
metadata = {
"extra_arg": extra_arg,
"other_metadata": ...,
}
return df, cls(metadata, Path(fp).name)
def _write_synthetic(self, df: pl.DataFrame, fp: Path):
# Write the synthetic file from the polars data frame.
write_fancy(df, self.metadata["extra_arg"], self.metadata["other_metadata"])