metasyn.var
Module defining the MetaVar class, which represents a metadata variable.
Classes
MetaVar – Metadata variable describing a column in a MetaFrame.
- class metasyn.var.MetaVar(name, var_type, distribution, dtype='unknown', description=None, prop_missing=0.0, creation_method=None)
Bases: object
Metadata variable describing a column in a MetaFrame.
MetaVar holds all the metadata needed to generate a synthetic version of a column. It is the variable-level building block of the MetaFrame and contains the methods to convert a polars Series into a variable with an appropriate distribution. The MetaVar class is to the MetaFrame what a polars Series is to a DataFrame.
This class is considered a passthrough class used by the MetaFrame class, and is not intended to be used directly by the user.
- Parameters:
var_type (Optional[str]) – String containing the variable type, e.g. continuous, string, etc.
series – Series to create the variable from. Series is None by default and in this case the value is ignored. If it is not supplied, then the variable cannot be fit.
name (str) – Name of the variable/column.
distribution (BaseDistribution) – Distribution to draw random values from. Can also be set by using the fit method.
prop_missing (float) – Proportion of the series that are missing/NA.
dtype (str) – Type of the original values, e.g. int64, float, etc. Used for type-casting back. The default value is “unknown”.
description (Optional[str]) – User-provided description of the variable.
creation_method (Optional[dict]) – A dictionary that contains information on how the variable was created. If None, it will be assumed to have been created by the user.
- to_dict()
Create a dictionary from the variable.
- Return type:
Dict[str,Any]
- classmethod fit(series, dist_spec=None, dist_registry=<metasyn.registry.DistributionRegistry object>, privacy=<metasyn.privacy.BasicPrivacy object>, prop_missing=None, description=None)
Fit distributions to the data.
If multiple distributions are available for the current data type, use the one that fits the data the best.
The most suitable distribution is stored in the distribution attribute of the resulting variable.
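The idea of choosing the best-fitting distribution among several candidates can be sketched in plain Python. This is a simplified illustration, not metasyn's implementation: the helper names (`fit_uniform`, `fit_normal`, `bic`, `best_fit`) are hypothetical, only two candidate distributions are considered, and the use of the Bayesian information criterion (BIC) as the score is an assumption of this sketch.

```python
import math
from statistics import fmean, pstdev


def fit_uniform(values):
    """Maximum-likelihood fit of a uniform distribution on [min, max].

    Assumes the data are not constant (the width must be positive).
    """
    lower, upper = min(values), max(values)
    log_lik = -len(values) * math.log(upper - lower)
    return {"lower": lower, "upper": upper}, log_lik


def fit_normal(values):
    """Maximum-likelihood fit of a normal distribution."""
    mean, sd = fmean(values), pstdev(values)
    log_lik = sum(
        -0.5 * math.log(2 * math.pi * sd**2) - (v - mean) ** 2 / (2 * sd**2)
        for v in values
    )
    return {"mean": mean, "sd": sd}, log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion; lower values indicate a better fit."""
    return n_params * math.log(n_obs) - 2 * log_lik


def best_fit(values):
    """Fit every candidate and return the name and parameters of the best one."""
    candidates = {"uniform": fit_uniform, "normal": fit_normal}
    scored = {}
    for name, fitter in candidates.items():
        params, log_lik = fitter(values)
        scored[name] = (bic(log_lik, len(params), len(values)), params)
    best_name = min(scored, key=lambda name: scored[name][0])
    return best_name, scored[best_name][1]


# Evenly spread values favour the uniform candidate in this sketch.
print(best_fit([float(v) for v in range(100)]))
```

Scoring all candidates with one criterion keeps the selection independent of how each distribution is fitted, so further candidates can be added without changing `best_fit`.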
- Parameters:
series (Series) – Data series to fit a distribution to.
dist_spec (Union[dict,type,BaseDistribution,DistributionSpec,None]) – The distribution to fit. In case of a string, search for it using the aliases of all distributions. Otherwise use the supplied distribution (class). Examples of allowed strings are: “normal”, “uniform”, “faker.city.nl_NL”. If not supplied, fit the best available distribution for the variable type.
dist_registry (DistributionRegistry) – Distribution registry that is used for fitting.
privacy (BasePrivacy) – Privacy level to use for fitting the series.
prop_missing (Optional[float]) – Proportion of the values missing, default None.
description (Optional[str]) – Description for the variable.
- Return type:
MetaVar
- draw()
Draw a random item for the variable in whatever type is required.
- Return type:
Any
- draw_series(n, seed, progress_bar=True)
Draw a new synthetic series from the metadata.
- Parameters:
n (int) – Length of the series to be created.
seed (Optional[int]) – Seed value for the internal random number generator. Set this to ensure reproducibility.
progress_bar (bool) – Whether to display a progress bar.
- Returns:
Polars series with the synthetic data.
- Return type:
polars.Series
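How a seed and a proportion of missing values might interact when drawing a series can be sketched with the standard library alone. This is an illustrative stand-in, not metasyn's implementation: `draw_series_sketch` and `uniform_draw` are hypothetical names, a plain list replaces the polars Series, and handling of missing values through a per-item coin flip is an assumption.

```python
import random


def draw_series_sketch(draw_value, n, seed=None, prop_missing=0.0):
    """Draw n values, replacing roughly a fraction prop_missing with None.

    draw_value is a callable that takes a random.Random instance and
    returns one value; seed makes the whole series reproducible.
    """
    rng = random.Random(seed)
    series = []
    for _ in range(n):
        if rng.random() < prop_missing:
            series.append(None)  # missing/NA entry
        else:
            series.append(draw_value(rng))
    return series


def uniform_draw(rng):
    """Hypothetical distribution: a uniform draw on [0, 1]."""
    return rng.uniform(0.0, 1.0)


first = draw_series_sketch(uniform_draw, n=5, seed=42, prop_missing=0.2)
second = draw_series_sketch(uniform_draw, n=5, seed=42, prop_missing=0.2)
print(first == second)  # the same seed gives the same series
```

Using a dedicated `random.Random(seed)` instance instead of the module-level generator keeps the draw reproducible without affecting other random state in the program.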
- classmethod from_dict(var_dict, plugins=None)
Restore variable from dictionary.
- Parameters:
plugins (Union[None,str,list[str]]) – Plugins to use to create the variable. If None, use all installed/available plugins.
var_dict (Dict[str,Any]) – This dictionary contains all the variable and distribution information to recreate it from scratch.
- Returns:
Initialized metadata variable.
- Return type:
MetaVar
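The relationship between to_dict and from_dict is a serialization round trip: the dictionary must carry enough information to rebuild the variable from scratch. The sketch below illustrates that pattern with a hypothetical `VarSketch` dataclass. It is not the MetaVar class; the field names mirror the constructor parameters listed above, the distribution is represented as a plain dictionary, and plugin handling is left out.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class VarSketch:
    """Hypothetical stand-in for a metadata variable."""

    name: str
    var_type: str
    distribution: Dict[str, Any]
    dtype: str = "unknown"
    description: Optional[str] = None
    prop_missing: float = 0.0

    def to_dict(self) -> Dict[str, Any]:
        """Create a dictionary from the variable."""
        return asdict(self)

    @classmethod
    def from_dict(cls, var_dict: Dict[str, Any]) -> "VarSketch":
        """Restore the variable from a dictionary."""
        return cls(**var_dict)


original = VarSketch(
    name="age",
    var_type="continuous",
    distribution={"name": "uniform", "parameters": {"lower": 0.0, "upper": 99.0}},
    dtype="float",
    prop_missing=0.1,
)

# Round trip through JSON text, as would happen when saving to a file.
restored = VarSketch.from_dict(json.loads(json.dumps(original.to_dict())))
print(restored == original)
```

Keeping the dictionary limited to JSON-compatible types is what makes the variable storable in a file and restorable without access to the original data.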