metasyn.MetaVar

class metasyn.MetaVar(name, var_type, distribution, dtype='unknown', description=None, prop_missing=0.0, creation_method=None)

Metadata variable describing a column in a MetaFrame.

MetaVar holds all the metadata needed to generate a synthetic column for the variable it describes. It is the variable-level building block of the MetaFrame and contains the methods to convert a polars Series into a variable with an appropriate distribution. MetaVar is to MetaFrame what a polars Series is to a DataFrame.

This is a passthrough class used by MetaFrame; it is not intended to be used directly by the user.

Parameters:
  • var_type (Optional[str]) – String containing the variable type, e.g. continuous, string, etc.

  • series – Series to create the variable from. It is not part of the constructor signature shown above; a series is passed to the fit method instead. Without a series, the variable cannot be fit.

  • name (str) – Name of the variable/column.

  • distribution (BaseDistribution) – Distribution to draw random values from. Can also be set by using the fit method.

  • prop_missing (float) – Proportion of values in the series that are missing/NA. The default value is 0.0.

  • dtype (str) – Type of the original values, e.g. int64, float, etc. Used to cast generated values back to the original type. The default value is 'unknown'.

  • description (Optional[str]) – User-provided description of the variable.

  • creation_method (Optional[dict]) – A dictionary that contains information on how the variable was created. If None, it will be assumed to have been created by the user.

__init__(name, var_type, distribution, dtype='unknown', description=None, prop_missing=0.0, creation_method=None)
Parameters:
  • name (str)

  • var_type (str | None)

  • distribution (BaseDistribution)

  • dtype (str)

  • description (str | None)

  • prop_missing (float)

  • creation_method (dict | None)
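The way these parameters work together when a column is generated can be illustrated with a small standard-library sketch. It does not use metasyn: ToyVar is a hypothetical stand-in, its distribution is a plain callable rather than a BaseDistribution, and the rule that each drawn value is missing with probability prop_missing is an assumption made for illustration only.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToyVar:
    """Hypothetical stand-in for a variable-level metadata holder (not metasyn's MetaVar)."""

    name: str
    var_type: str
    # Assumption: a callable taking a random generator stands in for a BaseDistribution.
    distribution: Callable[[random.Random], float]
    dtype: str = "unknown"
    description: Optional[str] = None
    prop_missing: float = 0.0

    def draw_series(self, n: int, seed: int) -> list:
        """Draw n values; each one is None with probability prop_missing."""
        rng = random.Random(seed)  # a fixed seed makes the draw reproducible
        return [
            None if rng.random() < self.prop_missing else self.distribution(rng)
            for _ in range(n)
        ]


age = ToyVar(
    name="age",
    var_type="continuous",
    distribution=lambda rng: rng.uniform(18.0, 90.0),
    dtype="float",
    prop_missing=0.2,
)

values = age.draw_series(n=1000, seed=42)
missing_share = sum(v is None for v in values) / len(values)
print(f"{age.name}: {missing_share:.1%} missing out of {len(values)} values")
```

Calling draw_series again with the same seed returns the same list, and the share of None values stays close to prop_missing for large n.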

Methods

__init__(name, var_type, distribution[, ...])

draw()

Draw a random item for the variable in whatever type is required.

draw_series(n, seed[, progress_bar])

Draw a new synthetic series from the metadata.

fit(series[, dist_spec, dist_registry, ...])

Fit distributions to the data.

from_dict(var_dict[, plugins])

Restore variable from dictionary.

to_dict()

Create a dictionary from the variable.
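The to_dict and from_dict pair suggests a round trip through a plain dictionary, which can then be stored as JSON. The sketch below shows that pattern with the standard library only. DictVar is a hypothetical class, the dictionary keys are assumptions rather than metasyn's actual serialization format, and writing from_dict as a classmethod is likewise an assumption.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DictVar:
    """Hypothetical variable metadata whose fields are all JSON-serializable."""

    name: str
    var_type: str
    # Assumption: the distribution is described by a name and a parameter dict.
    distribution: dict
    dtype: str = "unknown"
    description: Optional[str] = None
    prop_missing: float = 0.0

    def to_dict(self) -> dict:
        """Create a plain dictionary from the variable."""
        return asdict(self)

    @classmethod
    def from_dict(cls, var_dict: dict) -> "DictVar":
        """Restore a variable from a dictionary produced by to_dict."""
        return cls(**var_dict)


original = DictVar(
    name="age",
    var_type="continuous",
    distribution={"name": "uniform", "parameters": {"lower": 18.0, "upper": 90.0}},
    dtype="float",
    prop_missing=0.2,
)

payload = json.dumps(original.to_dict())           # serialize to a JSON string
restored = DictVar.from_dict(json.loads(payload))  # rebuild from the parsed dict
print(restored == original)
```

Because the dictionary holds only metadata and no original data values, it is the natural unit to save, inspect, or share.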