metasyn.registry
Module implementing the distribution registry.
Distribution registries are used to find and fit the available distributions. See pyproject.toml for how the built-in distributions are registered.
Classes
- DistributionRegistry(fitters): Registry of distributions and fitters.
- class metasyn.registry.DistributionRegistry(fitters)
Bases: object
Registry of distributions and fitters.
This class manages and provides access to fitters and distributions. It can fit distributions, and it can retrieve distributions/fitters that satisfy constraints such as privacy level, variable type, and uniqueness.
You can initialize the class directly with a list of fitters, but in most cases you will want to use the DistributionRegistry.parse() method, which can load fitters from registries provided by plugins.
- Parameters:
fitters (list[type[BaseFitter]]) – Fitters to initialize the registry with.
- classmethod parse(plugins)
Initialize the distribution registry from plugin names.
- Parameters:
plugins (Union[list[str], None, str]) – Name of a plugin, or a list of plugin names, that provide fitters/distributions.
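The plugins argument accepts a single name, a list of names, or None. The following is a minimal sketch of how such an argument could be normalized and the selected plugins' fitters concatenated in order. It is not metasyn's implementation: the plugin table PLUGIN_FITTERS, the plugin name "extra", and every fitter name except ContinuousUniformFitter are hypothetical, and treating None as "all known plugins" is an assumption. In metasyn itself, plugin registries are registered via pyproject.toml.

```python
from typing import Union

# Hypothetical plugin table: plugin name -> ordered list of fitter names.
# metasyn registers plugin registries via pyproject.toml instead.
PLUGIN_FITTERS: dict[str, list[str]] = {
    "builtin": ["ContinuousUniformFitter", "NormalFitter"],
    "extra": ["ExtraUniformFitter"],
}


def normalize_plugins(plugins: Union[list[str], None, str]) -> list[str]:
    """Turn the accepted argument forms into a list of plugin names."""
    if plugins is None:
        # Assumption for this sketch: None means "use every known plugin".
        return list(PLUGIN_FITTERS)
    if isinstance(plugins, str):
        return [plugins]
    return list(plugins)


def collect_fitters(plugins: Union[list[str], None, str]) -> list[str]:
    """Concatenate the fitters of all selected plugins, preserving plugin order."""
    fitters: list[str] = []
    for name in normalize_plugins(plugins):
        fitters.extend(PLUGIN_FITTERS[name])
    return fitters


print(collect_fitters(["extra", "builtin"]))
# ['ExtraUniformFitter', 'ContinuousUniformFitter', 'NormalFitter']
```

In this sketch the fitter order follows the plugin order, which would matter for a first-match lookup such as find_fitter.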
- fit(series, var_type, dist_spec, privacy=<metasyn.privacy.BasicPrivacy object>)
Fit a distribution to a column/series.
- Parameters:
series (Series) – The data to fit the distribution to.
var_type (str) – The variable type of the data.
dist_spec (DistributionSpec) – Distribution to fit. If not supplied or None, an information criterion is used to determine the most suitable distribution. For most variable types, this criterion is based on the BIC (Bayesian Information Criterion).
privacy (BasePrivacy) – Level of privacy to use in the fit.
- Return type:
tuple[BaseDistribution,Optional[BaseFitter]]
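When no dist_spec is given, fit selects among candidates with an information criterion, usually the BIC, defined as k ln(n) - 2 ln(L), where k is the number of parameters, n the number of observations, and L the maximized likelihood; lower is better. The sketch below illustrates BIC-based selection between two hand-written maximum-likelihood fits (normal and continuous uniform). It is a standalone illustration, not metasyn's fitting code: the helper names normal_loglik, uniform_loglik, bic, and select_by_bic are hypothetical, and metasyn's fitters and per-type criteria may differ.

```python
import math
from statistics import fmean


def normal_loglik(xs: list[float]) -> tuple[float, int]:
    """Maximum-likelihood normal fit; returns (log-likelihood, number of parameters).

    Assumes the values are not all identical (variance > 0).
    """
    n = len(xs)
    mu = fmean(xs)
    var = sum((x - mu) ** 2 for x in xs) / n  # MLE variance divides by n
    loglik = -0.5 * n * (math.log(2 * math.pi * var) + 1)
    return loglik, 2  # parameters: mean, variance


def uniform_loglik(xs: list[float]) -> tuple[float, int]:
    """Maximum-likelihood uniform fit on [min, max]; assumes max > min."""
    lo, hi = min(xs), max(xs)
    return -len(xs) * math.log(hi - lo), 2  # parameters: lower, upper bound


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian Information Criterion: k*ln(n) - 2*ln(L). Lower is better."""
    return k * math.log(n) - 2 * loglik


def select_by_bic(xs: list[float]) -> str:
    """Fit every candidate and return the name with the lowest BIC."""
    candidates = {"normal": normal_loglik, "uniform": uniform_loglik}
    scores = {name: bic(*fit(xs), len(xs)) for name, fit in candidates.items()}
    return min(scores, key=scores.get)


print(select_by_bic([i / 99 for i in range(100)]))  # uniform
```

Evenly spaced values on [0, 1] give the uniform fit a log-likelihood of 0, so its BIC is well below that of the normal fit.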
- create(var_spec)
Create a distribution without any data.
- Parameters:
var_spec (Union[VarSpec, VarSpecAccess]) – A variable configuration that provides all the information needed to create the distribution.
- Return type:
BaseDistribution
- Returns:
A distribution according to the variable specifications.
- find_fitter(dist_name, var_type, privacy=<metasyn.privacy.BasicPrivacy object>, unique=False, version=None)
Find a distribution and fit keyword arguments from a name.
Sometimes multiple fitters satisfy the criteria. In this case, the first one in the registry is chosen. If you do not want this behavior, specify the fitter name directly.
- Parameters:
dist_name (str) – Name of the distribution that needs to be fit, e.g., for the built-in uniform distribution: “uniform”, “core.uniform”, or the name of the fitter: “ContinuousUniformFitter”.
privacy (Optional[BasePrivacy]) – Type of privacy to be applied.
var_type (Optional[str]) – Type of the variable to find. If var_type is None, the variable type is not checked.
unique (bool) – Whether the distribution to be found is unique.
version (Optional[str]) – Version of the distribution to get. If necessary, it is taken from the legacy distributions.
- Returns:
Fitter that satisfies the requirements.
- Return type:
tuple[Type[BaseFitter]
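The name matching and first-match rule described above can be sketched as a linear scan over an ordered registry. This is a simplified illustration, not metasyn's code: FitterInfo, name_matches, find_first, and the DiscreteUniformFitter entry are hypothetical, privacy is reduced to a plain string instead of a BasePrivacy object, and version/legacy handling is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FitterInfo:
    """Hypothetical stand-in for the metadata of one registered fitter."""
    fitter_name: str       # e.g. "ContinuousUniformFitter"
    implements: str        # e.g. "core.uniform"
    var_type: str          # e.g. "continuous"
    privacy: str = "none"  # simplified: a plain string instead of a BasePrivacy object
    unique: bool = False


def name_matches(info: FitterInfo, dist_name: str) -> bool:
    """Accept the prefixed name, the unprefixed name, or the fitter class name."""
    short_name = info.implements.split(".", 1)[-1]  # "core.uniform" -> "uniform"
    return dist_name in (info.implements, short_name, info.fitter_name)


def find_first(registry: list[FitterInfo], dist_name: str,
               var_type: Optional[str] = None, privacy: str = "none",
               unique: bool = False) -> FitterInfo:
    """Return the first entry meeting all criteria; registry order breaks ties."""
    for info in registry:
        if not name_matches(info, dist_name):
            continue
        if var_type is not None and info.var_type != var_type:  # None: skip type check
            continue
        if info.privacy != privacy or info.unique != unique:
            continue
        return info
    raise ValueError(f"no fitter found for {dist_name!r}")


REGISTRY = [
    FitterInfo("ContinuousUniformFitter", "core.uniform", "continuous"),
    FitterInfo("DiscreteUniformFitter", "core.uniform", "discrete"),
]

print(find_first(REGISTRY, "uniform").fitter_name)                       # ContinuousUniformFitter
print(find_first(REGISTRY, "uniform", var_type="discrete").fitter_name)  # DiscreteUniformFitter
```

Both entries match "uniform", so without a var_type the earlier entry wins; passing the fitter name or a var_type removes the ambiguity.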
- filter_fitters(name=None, privacy=None, var_type=None, unique=False, version=None)
Get the available distributions that satisfy the given constraints.
- Parameters:
name (str | None)
privacy (Optional[BasePrivacy]) – Privacy level/type to filter the distributions by.
var_type (Optional[str]) – Variable type to filter for, e.g. ‘string’.
unique (bool) – Whether the distributions to be retrieved are unique.
version (str | None)
- Returns:
List of distributions that fit the given constraints.
- Return type:
dist_list
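Constraint filtering of this kind can be sketched as a list comprehension in which a None constraint is simply skipped. This is an illustration under assumptions, not metasyn's implementation: the REGISTRY entries and their dictionary keys are hypothetical, privacy is a plain string, unique is compared exactly, and version handling is omitted.

```python
from typing import Optional

# Hypothetical registry entries; the keys are illustrative, not metasyn attributes.
REGISTRY = [
    {"name": "core.uniform", "var_type": "continuous", "privacy": "none", "unique": False},
    {"name": "core.normal", "var_type": "continuous", "privacy": "none", "unique": False},
    {"name": "core.unique_key", "var_type": "discrete", "privacy": "none", "unique": True},
    {"name": "core.uniform", "var_type": "discrete", "privacy": "none", "unique": False},
]


def filter_entries(entries: list[dict], name: Optional[str] = None,
                   privacy: Optional[str] = None, var_type: Optional[str] = None,
                   unique: bool = False) -> list[dict]:
    """Keep entries satisfying every constraint; a None constraint is not checked."""
    return [
        e for e in entries
        if (name is None or e["name"] == name)
        and (privacy is None or e["privacy"] == privacy)
        and (var_type is None or e["var_type"] == var_type)
        and e["unique"] == unique  # assumption: uniqueness must match exactly
    ]


print([e["name"] for e in filter_entries(REGISTRY, var_type="continuous")])
# ['core.uniform', 'core.normal']
```

Unlike a first-match lookup, this returns every matching entry in registry order.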
- from_dict(var_dict)
Create a distribution from a dictionary.
- Parameters:
var_dict (dict[str, Any]) – Variable dictionary that includes the distribution properties.
- Returns:
Distribution representing the dictionary.
- Return type:
BaseDistribution
- property distributions
All available distributions from fitters, deduplicated.
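Deduplication is presumably needed because several fitters can produce the same distribution. A minimal, order-preserving way to deduplicate such a list in Python is dict.fromkeys, sketched below; the Fitter record and the distribution classes are hypothetical stand-ins, not metasyn types, and this is not necessarily how the property is implemented.

```python
from dataclasses import dataclass


class UniformDistribution:  # hypothetical distribution classes
    pass


class NormalDistribution:
    pass


@dataclass(frozen=True)
class Fitter:
    """Hypothetical fitter record: the distribution class it produces."""
    name: str
    distribution: type


def unique_distributions(fitters: list[Fitter]) -> list[type]:
    """Distributions produced by the fitters; first occurrence kept, order preserved."""
    # dict keys keep insertion order (Python 3.7+), so this deduplicates stably.
    return list(dict.fromkeys(f.distribution for f in fitters))


FITTERS = [
    Fitter("ContinuousUniformFitter", UniformDistribution),
    Fitter("NormalFitter", NormalDistribution),
    Fitter("AltUniformFitter", UniformDistribution),
]

print([d.__name__ for d in unique_distributions(FITTERS)])
# ['UniformDistribution', 'NormalDistribution']
```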