metasyn.distribution.freetext

Module for the free text distribution that detects the language.

Functions

detect_language(values)

Detect the language of some text.

metasyn.distribution.freetext.detect_language(values)

Detect the language of some text.

Parameters:

values (Iterable) – Values to detect the language of (usually a polars dataframe).

Returns:

Two-letter ISO code representing the language, or None if the language could not be determined.

Return type:

language
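
Because the return value may be None, callers need a fallback before using the result, for example to pick a locale. The following is a minimal sketch of that handling, not metasyn's own logic: the `choose_locale` helper, the `LOCALE_BY_LANGUAGE` mapping, and the `en_US` default are all hypothetical names introduced for illustration.

```python
from typing import Optional

# Hypothetical mapping from two-letter language codes to locale strings.
# Illustrative only; it is not taken from metasyn.
LOCALE_BY_LANGUAGE = {"en": "en_US", "nl": "nl_NL", "fr": "fr_FR", "de": "de_DE"}


def choose_locale(language: Optional[str], default: str = "en_US") -> str:
    """Return a locale for a detected language code, or the default if unknown."""
    if language is None:
        # Detection failed, so fall back to the (assumed) default locale.
        return default
    return LOCALE_BY_LANGUAGE.get(language.lower(), default)


print(choose_locale("nl"))   # a code present in the mapping
print(choose_locale(None))   # detection failed
```

In this sketch, a code that is missing from the mapping is treated the same way as a failed detection.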

Classes

FreeTextDistribution(locale, avg_sentences, ...)

Free text distribution.

FreeTextFitter(privacy)

Fitter for the freetext distribution.

class metasyn.distribution.freetext.FreeTextDistribution(locale, avg_sentences, avg_words)

Bases: BaseDistribution

Free text distribution.

This distribution detects the language (with the lingua package) and generates sentences using the Faker package. The average numbers of sentences and words per item are estimated using regexes.

Parameters:
  • locale (str) – Locale used for the Faker package.

  • avg_sentences (float) – Average number of sentences (counted as punctuation marks) per non-NA row. If None, no sentences are generated.

  • avg_words (float) – Average number of words per (non-NA) row.
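
The two averages above can be approximated with plain regular expressions. The sketch below shows one way to do this using only the standard library. The function name `estimate_averages` and both patterns are assumptions made for illustration; the patterns metasyn actually uses may differ.

```python
import re
from typing import Iterable, Optional, Tuple

# Assumed patterns: a run of sentence-ending punctuation, and a run of word characters.
SENTENCE_END = re.compile(r"[.!?]+")
WORD = re.compile(r"\w+")


def estimate_averages(values: Iterable[Optional[str]]) -> Tuple[float, float]:
    """Return (avg_sentences, avg_words) per non-NA row."""
    rows = [v for v in values if v is not None]  # skip NA rows
    if not rows:
        return 0.0, 0.0
    n_sentences = sum(len(SENTENCE_END.findall(row)) for row in rows)
    n_words = sum(len(WORD.findall(row)) for row in rows)
    return n_sentences / len(rows), n_words / len(rows)


avg_sentences, avg_words = estimate_averages(
    ["Hello world. How are you?", None, "Just one sentence"]
)
print(avg_sentences, avg_words)
```

Counting punctuation marks rather than parsing sentences means a row without closing punctuation contributes zero sentences, which matches the "sentences (punctuation marks)" wording of the parameter description.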

name

core.freetext

unique

False

version

1.0

var_type

string

draw()

Draw a random element from the fitted distribution.
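
To illustrate how the parameters could shape a drawn value, here is a toy, standard-library sketch. It is not the library's implementation, which relies on Faker. The `draw_text` function, the `WORDS` list, and the rounding of the averages are all assumptions made for illustration.

```python
import random
from typing import Optional

# Hypothetical vocabulary standing in for a locale-aware text provider.
WORDS = ["data", "model", "table", "value", "random", "text", "column", "row"]


def draw_text(avg_sentences: Optional[float], avg_words: float,
              rng: random.Random) -> str:
    """Draw one text value; plain words only when avg_sentences is None."""
    n_words = max(1, round(avg_words))
    words = [rng.choice(WORDS) for _ in range(n_words)]
    if avg_sentences is None:
        # No sentence structure requested: return the words as-is.
        return " ".join(words)
    # Never create more sentences than there are words.
    n_sentences = min(n_words, max(1, round(avg_sentences)))
    sentences = []
    for i in range(n_sentences):
        chunk = words[i::n_sentences]  # spread the words evenly over sentences
        sentences.append(" ".join(chunk).capitalize() + ".")
    return " ".join(sentences)


rng = random.Random(0)
print(draw_text(2, 6, rng))
print(draw_text(None, 4, rng))
```

The sketch uses fixed counts derived from the averages; a real generator would more plausibly vary the counts from draw to draw.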

information_criterion(values)

Get the BIC value for a particular set of values.

Parameters:

values (array_like) – Values to determine the BIC value of.

Return type:

float
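
For reference, the Bayesian information criterion is conventionally defined as k·ln(n) − 2·ln(L), where k is the number of fitted parameters, n the number of observations, and L the maximized likelihood; lower values indicate a better trade-off between fit and complexity. The sketch below computes that standard formula. How this class derives the likelihood or the parameter count for free text is not stated here, so the `bic` helper and its inputs are illustrative assumptions.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Standard BIC: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Example inputs: a log-likelihood of -10 with 2 parameters and 100 observations.
print(bic(-10.0, 2, 100))
```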

classmethod default_distribution(var_type=None)

Get a distribution with default parameters.

Return type:

BaseDistribution

name: str = 'core.freetext'

The identifier for the implemented distribution

var_type: Union[str, Sequence[str]] = 'string'

The variable type of the distribution

class metasyn.distribution.freetext.FreeTextFitter(privacy)

Bases: BaseFitter

Fitter for the freetext distribution.

Parameters:

privacy (BasePrivacy)

dist_class

<class 'metasyn.distribution.freetext.FreeTextDistribution'>

version

1.0

var_type

string

privacy

none

distribution

alias of FreeTextDistribution