metasyn.distribution.base

Module serving as the basis for all metasyn distributions.

The base module contains the BaseDistribution class, which is the base class for all distributions. It also contains the ScipyDistribution class, which is a specialized base class for distributions that are built on top of SciPy’s statistical distributions.

Additionally, it contains the UniqueDistributionMixin class, which is a mixin class that can be used to make a distribution unique (i.e., one that does not contain duplicate values).

Finally, it contains the metadist() decorator, which is used to set the class attributes of a distribution.

Functions

builtin_fitter([distribution, var_type, ...])

Decorate builtin fitters.

convert_to_series(values)

Convert a list or pandas Series to a polars Series.

metadist([name, var_type, unique, version])

Decorate class to create a distribution with the right properties.

metafit([distribution, var_type, version, ...])

Decorate class to create a fitter with the correct class attributes.

metasyn.distribution.base.builtin_fitter(distribution=None, var_type=None, version=None, privacy_type=None)

Decorate builtin fitters.

Parameters:
  • distribution (Optional[type[BaseDistribution]]) – Class that the fitter will return after a successful fit.

  • var_type (Union[str, list[str], None]) – Variable type(s) that the fitter implements, e.g. continuous, categorical, string.

  • version (Optional[str]) – Version of the fitter. Increment this to ensure that compatibility is properly handled.

  • privacy_type (Optional[str]) – Privacy class/implementation of the fitter.

Returns:

Class with the appropriate class variables.

Return type:

cls

metasyn.distribution.base.convert_to_series(values)

Convert a list or pandas Series to a polars Series.

Return type:

Series

Parameters:

values (ndarray[tuple[Any, ...], dtype[_ScalarT]] | Series)

metasyn.distribution.base.metadist(name=None, var_type=None, unique=None, version=None)

Decorate class to create a distribution with the right properties.

Parameters:
  • name (Optional[str]) – Name that identifies the distribution uniquely, e.g. core.uniform, core.regex. The name should use a period (.) so that the first part is the namespace (e.g. core), and the second part the name of the distribution.

  • var_type (Union[str, list[str], None]) – Variable type of the distribution, e.g. continuous, categorical, string.

  • unique (Optional[bool]) – Whether the distribution is unique or not.

  • version (Optional[str]) – Version of the distribution. Increment this to ensure that compatibility is properly handled.

Returns:

Class with the appropriate class variables.

Return type:

cls
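The decorator pattern described above can be pictured with a minimal, stand-alone sketch. It does not import metasyn and is not the library's implementation: metadist_sketch, BaseSketch and UniformSketch are hypothetical names, and the rule that only arguments other than None overwrite the class defaults is an assumption suggested by the None defaults in the signature.

```python
from typing import Optional, Union


def metadist_sketch(
    name: Optional[str] = None,
    var_type: Union[str, list[str], None] = None,
    unique: Optional[bool] = None,
    version: Optional[str] = None,
):
    """Hypothetical stand-in for a decorator that sets class attributes."""

    def _wrap(cls):
        # Assumption: only explicitly passed values overwrite the defaults.
        if name is not None:
            cls.name = name
        if var_type is not None:
            cls.var_type = var_type
        if unique is not None:
            cls.unique = unique
        if version is not None:
            cls.version = version
        return cls

    return _wrap


class BaseSketch:
    """Hypothetical base class carrying the documented default attributes."""

    name = "unknown"
    var_type = "unknown"
    unique = False
    version = "1.0"


@metadist_sketch(name="core.uniform", var_type="continuous")
class UniformSketch(BaseSketch):
    pass


# The part before the period is the namespace, the rest is the distribution name.
namespace, dist_name = UniformSketch.name.split(".", 1)
```

Because the attributes are set on the decorated subclass, the defaults on the base class stay untouched, and attributes that were not passed (here unique and version) are inherited unchanged.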

metasyn.distribution.base.metafit(distribution=None, var_type=None, version=None, privacy_type=None, plugin=None, plugin_version=None)

Decorate class to create a fitter with the correct class attributes.

Parameters:
  • distribution (Optional[type[BaseDistribution]]) – Class that the fitter will return after a successful fit.

  • var_type (Union[str, list[str], None]) – Variable type(s) that the fitter implements, e.g. continuous, categorical, string.

  • version (Optional[str]) – Version of the fitter. Increment this to ensure that compatibility is properly handled.

  • privacy_type (Optional[str]) – Privacy class/implementation of the fitter.

  • plugin (Optional[str]) – Name of the plugin for the fitter or builtin (if part of metasyn itself).

  • plugin_version (Optional[str]) – Version of the plugin used.

Returns:

Class with the appropriate class variables.

Return type:

cls

Classes

BaseDistribution()

Abstract base class to define a distribution.

BaseFitter(privacy)

Base class for fitters.

ScipyDistribution()

Base class for numerical distributions using SciPy.

ScipyFitter(privacy)

Base fitter for SciPy distributions.

UniqueDistributionMixin(*args, **kwargs)

Mixin class to make unique versions of base distributions.

class metasyn.distribution.base.BaseDistribution

Bases: ABC

Abstract base class to define a distribution.

All distributions should be derived from this class, and should implement the following methods: _fit(), draw(), _param_dict(), _param_schema(), default_distribution() and __init__.

name: str = 'unknown'

The identifier for the implemented distribution

var_type: Union[str, Sequence[str]] = 'unknown'

The variable type of the distribution

unique: bool = False

Whether the distribution creates only unique values

version: str = '1.0'

Version of the implemented distribution

abstractmethod draw()

Draw a random element from the fitted distribution.

Return type:

object

draw_reset()

Reset the drawing of elements to start again.

Return type:

None

to_dict()

Convert the distribution to a dictionary.

Return type:

dict

classmethod schema()

Create a sub-schema to validate a GMF (Generative Metadata Format) file.

Return type:

dict

classmethod from_dict(dist_dict)

Create a distribution from a dictionary.

Return type:

BaseDistribution

Parameters:

dist_dict (dict)

information_criterion(values)

Get the Bayesian information criterion (BIC) value for a particular set of values.

Parameters:

values (array_like) – Values to determine the BIC value of.

Return type:

float
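For orientation, the BIC is conventionally defined as k·ln(n) − 2·ln(L), where k is the number of parameters, n the number of values and L the likelihood of the values under the distribution; lower values indicate a better trade-off between fit and complexity. The sketch below evaluates that textbook formula with the standard library only. The helper name bic_sketch is hypothetical, and whether metasyn computes the criterion in exactly this way is an assumption.

```python
import math
from statistics import NormalDist


def bic_sketch(log_likelihood: float, n_params: int, n_values: int) -> float:
    """Textbook BIC: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_values) - 2.0 * log_likelihood


values = [1.0, 2.0, 3.0, 4.0, 5.0]

# A normal distribution has two parameters (mean and standard deviation).
dist = NormalDist(mu=3.0, sigma=1.5)

# Log-likelihood of the values under the chosen distribution.
log_lik = sum(math.log(dist.pdf(v)) for v in values)

bic = bic_sketch(log_lik, n_params=2, n_values=len(values))
```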

classmethod matches_name(name)

Check whether the name matches the distribution.

Parameters:

name (str) – Name to match to the distribution.

Returns:

Whether the name matches.

Return type:

bool

abstractmethod classmethod default_distribution(var_type=None)

Get a distribution with default parameters.

Return type:

BaseDistribution

Parameters:

var_type (str | None)

draw_list(n)

Draw a list of values from the distribution.

Parameters:

n (int) – Number of items to draw from the distribution.

Raises:

NotImplementedError – If the distribution has not implemented draw_list.

Return type:

list

Returns:

List of values.

class metasyn.distribution.base.BaseFitter(privacy)

Bases: ABC

Base class for fitters.

Parameters:

privacy (BasePrivacy)

classmethod matches_name(name)

Check whether the name matches the fitter.

Parameters:

name (str) – Name to match to the fitter.

Returns:

Whether the name matches.

Return type:

bool

class metasyn.distribution.base.ScipyDistribution

Bases: BaseDistribution

Base class for numerical distributions using SciPy.

This base class makes it easy to implement new numerical distributions. It can also be used for non-SciPy distributions, provided the distribution implements logpdf, rvs and fit methods.
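As an illustration of the three methods named above, here is a stand-alone sketch of a non-SciPy distribution object. The class ExponentialSketch is hypothetical, and the method signatures loosely follow SciPy's conventions (logpdf(x), rvs(size), fit(data) returning a parameter tuple); the exact signatures metasyn expects are an assumption, and the seed argument exists only to make the sketch reproducible.

```python
import math
import random


class ExponentialSketch:
    """Hypothetical non-SciPy distribution exposing logpdf, rvs and fit."""

    def __init__(self, rate: float = 1.0):
        self.rate = rate

    def logpdf(self, x: float) -> float:
        # log(rate * exp(-rate * x)) for x >= 0, minus infinity otherwise.
        if x < 0:
            return -math.inf
        return math.log(self.rate) - self.rate * x

    def rvs(self, size: int = 1, seed=None) -> list[float]:
        # Draw `size` random variates from the exponential distribution.
        rng = random.Random(seed)
        return [rng.expovariate(self.rate) for _ in range(size)]

    @staticmethod
    def fit(data) -> tuple[float]:
        # Maximum-likelihood estimate of the rate: 1 / sample mean.
        return (len(data) / sum(data),)


(rate,) = ExponentialSketch.fit([0.5, 1.5, 1.0])
dist = ExponentialSketch(rate)
samples = dist.rvs(size=3, seed=0)
```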

property n_par: int

Number of parameters for distribution.

Type:

int

draw()

Draw a random element from the fitted distribution.

draw_list(n)

Draw a list of values from the distribution.

Parameters:

n (int) – Number of items to draw from the distribution.

Raises:

NotImplementedError – If the distribution has not implemented draw_list.

Return type:

list

Returns:

List of values.

information_criterion(values)

Get the BIC value for a particular set of values.

Parameters:

values (array_like) – Values to determine the BIC value of.

class metasyn.distribution.base.ScipyFitter(privacy)

Bases: BaseFitter

Base fitter for SciPy distributions.

Parameters:

privacy (BasePrivacy)

class metasyn.distribution.base.UniqueDistributionMixin(*args, **kwargs)

Bases: BaseDistribution

Mixin class to make unique versions of base distributions.

This mixin class can be used to extend base distribution classes, adding functionality that ensures generated values are unique. It overrides the draw method of the base class, adding a check to prevent duplicate values from being drawn. If a duplicate value is drawn, it retries up to 1e5 times before raising a ValueError.

The UniqueDistributionMixin is used in various unique metasyn distribution variations, such as UniqueFakerDistribution and UniqueRegexDistribution.
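The retry behaviour described above can be sketched without metasyn as follows. UniqueDrawSketch is a hypothetical wrapper, not the mixin itself; tracking drawn values in a set and clearing that set in draw_reset are assumptions about how such a check could work.

```python
import random


class UniqueDrawSketch:
    """Hypothetical wrapper that rejects values that were already drawn."""

    MAX_RETRIES = 100_000  # the 1e5 retry limit mentioned above

    def __init__(self, draw_fn):
        self._draw_fn = draw_fn
        self._seen = set()

    def draw_reset(self):
        # Assumption: resetting forgets all previously drawn values.
        self._seen.clear()

    def draw(self):
        for _ in range(self.MAX_RETRIES):
            value = self._draw_fn()
            if value not in self._seen:
                self._seen.add(value)
                return value
        raise ValueError("Failed to draw a unique value.")


rng = random.Random(42)
unique = UniqueDrawSketch(lambda: rng.randint(1, 3))

# Only three distinct values exist, so three draws exhaust the pool.
first_three = [unique.draw() for _ in range(3)]
```

A fourth call to unique.draw() would exhaust the retries and raise ValueError, because every candidate has already been returned; calling unique.draw_reset() makes the values available again.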

Attributes:
  • name – unknown

  • unique – True

  • version – 1.0

  • var_type – unknown

unique: bool = True

Whether the distribution creates only unique values

draw_reset()

Reset the drawing of elements to start again.

draw()

Draw a random element from the fitted distribution.

Return type:

object

information_criterion(values)

Get the BIC value for a particular set of values.

Parameters:

values (array_like) – Values to determine the BIC value of.