metasyn.distribution.freetext
Module for the free text distribution that detects the language.
Functions
- metasyn.distribution.freetext.detect_language(values)
Detect the language of some text.
- Parameters:
values (Iterable) – Values to detect the language of (usually a polars DataFrame).
- Returns:
language – Two-letter ISO code representing the language, or None if it could not be determined.
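As a rough illustration of the contract above (a two-letter code, or None when undecidable), the sketch below uses a toy stopword count instead of a real language-detection library such as lingua, which this page mentions. The function name `detect_language_sketch` and the stopword lists are hypothetical, and the approach is far less accurate than a proper detector.

```python
from typing import Iterable, Optional

# Hypothetical, tiny stopword lists. A real detector (e.g. lingua) uses
# much richer statistical models; this only illustrates the return contract.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in"},
    "nl": {"de", "het", "een", "en", "is", "van"},
    "de": {"der", "die", "das", "und", "ist", "von"},
}


def detect_language_sketch(values: Iterable[Optional[str]]) -> Optional[str]:
    """Return a two-letter language code, or None if it cannot be decided."""
    counts = {code: 0 for code in STOPWORDS}
    for value in values:
        if value is None:  # skip missing values
            continue
        for word in value.lower().split():
            word = word.strip(".,!?;:")  # drop surrounding punctuation
            for code, words in STOPWORDS.items():
                if word in words:
                    counts[code] += 1
    ranked = sorted(counts.values(), reverse=True)
    # No evidence at all, or a tie for first place: give up and return None.
    if ranked[0] == 0 or ranked[0] == ranked[1]:
        return None
    return max(counts, key=counts.get)
```

Returning None on ties mirrors the documented behaviour of giving up rather than guessing.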
Classes
- class metasyn.distribution.freetext.FreeTextDistribution(locale, avg_sentences, avg_words)
Bases: BaseDistribution
Free text distribution.
This distribution detects the language of the text (using the lingua package) and generates sentences using the Faker package. The average numbers of sentences and words per item are estimated using regular expressions.
- Parameters:
locale (str) – Locale used for the Faker package.
avg_sentences (float) – Average number of sentences (punctuation marks) per non-NA row; if None, no sentences are generated.
avg_words (float) – Average number of words per non-NA row.
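The regex-based estimation of the two averages could look roughly like the sketch below. The patterns (one sentence per `.`, `!` or `?`; one word per run of word characters), the function name `estimate_averages`, and the rule that returns None for avg_sentences when no punctuation is found are all assumptions for illustration; metasyn's actual regexes are not shown on this page.

```python
import re
from typing import Iterable, Optional, Tuple

# Assumed patterns: one sentence per terminal punctuation mark, one word per
# run of word characters. metasyn's actual regexes may differ.
SENTENCE_END = re.compile(r"[.!?]")
WORD = re.compile(r"\w+")


def estimate_averages(
    values: Iterable[Optional[str]],
) -> Tuple[Optional[float], float]:
    """Return (avg_sentences, avg_words) over the non-missing rows."""
    rows = [v for v in values if v is not None]  # drop NA rows
    if not rows:
        raise ValueError("need at least one non-missing value")
    n_sentences = sum(len(SENTENCE_END.findall(v)) for v in rows)
    n_words = sum(len(WORD.findall(v)) for v in rows)
    # Assumption: no punctuation at all means "do not make sentences" (None).
    avg_sentences = n_sentences / len(rows) if n_sentences > 0 else None
    return avg_sentences, n_words / len(rows)
```

For example, `estimate_averages(["Hello world. How are you?", "Fine!", None])` averages over the two non-missing rows only.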
- name
core.freetext
- unique
False
- version
1.0
- var_type
string
- draw()
Draw a random element from the fitted distribution.
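To make the role of the fitted parameters concrete, here is a standard-library sketch of drawing one value. It substitutes a hypothetical fixed vocabulary for Faker's locale-aware word lists and assumes Poisson-distributed word and sentence counts; neither choice is documented for metasyn, and `draw_sketch` is a hypothetical name.

```python
import math
import random
from typing import Optional

# Hypothetical vocabulary standing in for Faker's locale-specific word lists.
VOCAB = ["lorem", "ipsum", "dolor", "sit", "amet", "data", "model", "text"]


def poisson(lam: float, rng: random.Random) -> int:
    """Sample a Poisson-distributed integer (Knuth's method, fine for small lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def draw_sketch(avg_sentences: Optional[float], avg_words: float,
                rng: random.Random) -> str:
    """Draw one synthetic text value with roughly the fitted averages."""
    n_words = max(1, poisson(avg_words, rng))  # assumed Poisson word count
    words = [rng.choice(VOCAB) for _ in range(n_words)]
    if avg_sentences is None:
        # No sentence structure was detected: return a bare run of words.
        return " ".join(words)
    n_sentences = min(n_words, max(1, poisson(avg_sentences, rng)))
    # Split into at most n_sentences equal-size chunks (the last may be shorter).
    size = math.ceil(n_words / n_sentences)
    chunks = [words[i:i + size] for i in range(0, n_words, size)]
    return " ".join(" ".join(c).capitalize() + "." for c in chunks)
```

Passing an explicit `random.Random` instance keeps draws reproducible for a given seed.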
- information_criterion(values)
Get the Bayesian information criterion (BIC) value for a set of values.
- Parameters:
values (array_like) – Values to determine the BIC value of.
- Return type:
float
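For reference, the BIC is conventionally defined as k ln(n) - 2 ln(L̂), where k is the number of fitted parameters, n the number of observations, and L̂ the maximised likelihood. The helper below simply computes that formula. How FreeTextDistribution obtains a likelihood for free text is not described here, so `bic` is a generic, hypothetical helper rather than metasyn's implementation.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * ln(L_hat).

    Lower values indicate a better trade-off between fit and complexity.
    """
    if n_obs <= 0:
        raise ValueError("n_obs must be positive")
    return n_params * math.log(n_obs) - 2.0 * log_likelihood
```

When comparing candidate distributions for the same column, the one with the lowest BIC is typically preferred.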
- classmethod default_distribution(var_type=None)
Get a distribution with default parameters.
- Return type:
BaseDistribution
- name: str = 'core.freetext'
The identifier for the implemented distribution
- var_type: Union[str, Sequence[str]] = 'string'
The variable type of the distribution
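The class-level metadata and the `default_distribution` classmethod follow a common pattern, sketched below with a dataclass. `FreeTextSketch` is a hypothetical stand-in mirroring the documented constructor, and its default parameter values are placeholders, not metasyn's actual defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FreeTextSketch:
    """Hypothetical stand-in mirroring FreeTextDistribution(locale, avg_sentences, avg_words)."""

    locale: str
    avg_sentences: Optional[float]
    avg_words: float

    # Class-level metadata as documented above. These have no annotations,
    # so dataclass treats them as plain class attributes, not fields.
    name = "core.freetext"
    var_type = "string"
    version = "1.0"
    unique = False

    @classmethod
    def default_distribution(cls, var_type=None) -> "FreeTextSketch":
        # var_type is accepted only to mirror the documented signature.
        # Placeholder defaults for illustration; not metasyn's real values.
        return cls(locale="en_US", avg_sentences=3.0, avg_words=6.0)
```

Keeping the metadata on the class lets code inspect `name` and `var_type` without instantiating a distribution.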
- class metasyn.distribution.freetext.FreeTextFitter(privacy)
Bases: BaseFitter
Fitter for the free text distribution.
- Parameters:
privacy (BasePrivacy)
- dist_class
<class 'metasyn.distribution.freetext.FreeTextDistribution'>
- version
1.0
- var_type
string
- privacy
none
- distribution
alias of FreeTextDistribution
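The relationship between a fitter and its `dist_class` can be sketched as follows: the fitter estimates parameters from the non-missing values and then instantiates the distribution class. `FreeTextFitterSketch`, `TextDist`, the `fit` method, and the string-valued `privacy` argument are hypothetical. The real fitter takes a `BasePrivacy` object and presumably derives the locale from language detection, whereas this sketch simply takes the locale as an argument and counts words by whitespace.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class TextDist:
    """Hypothetical stand-in for FreeTextDistribution."""
    locale: str
    avg_sentences: Optional[float]
    avg_words: float


class FreeTextFitterSketch:
    """Hypothetical fitter: estimates parameters, then builds dist_class."""

    dist_class = TextDist  # the real fitter points at FreeTextDistribution

    def __init__(self, privacy: str = "none"):
        # Simplification: the real fitter receives a BasePrivacy object.
        self.privacy = privacy

    def fit(self, values: Iterable[Optional[str]],
            locale: str = "en_US") -> TextDist:
        rows = [v for v in values if v is not None]  # drop NA rows
        if not rows:
            raise ValueError("need at least one non-missing value")
        n_words = sum(len(v.split()) for v in rows)
        n_sent = sum(v.count(".") + v.count("!") + v.count("?") for v in rows)
        return self.dist_class(
            locale=locale,
            avg_sentences=n_sent / len(rows) if n_sent else None,
            avg_words=n_words / len(rows),
        )
```

Separating estimation (the fitter) from sampling (the distribution) means a fitted distribution can be stored and reused without the original data.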