metasyn.demo.dataset

Create and retrieve demo datasets.

Functions

demo_data([name])

Get a demonstration dataset as a prepared polars dataframe.

demo_dataframe([name])

Legacy alias for demo_data.

demo_file([name])

Get the path for a demo data file.

register(*args)

Register a dataset so that it can be found by name.

metasyn.demo.dataset.demo_data(name='titanic')

Get a demonstration dataset as a prepared polars dataframe.

There are eight options:
  • titanic (Included in pandas, but post-processed to contain more columns)

  • spaceship (CC-BY from https://www.kaggle.com/competitions/spaceship-titanic)

  • synthea_imaging (CC-BY from https://synthea.mitre.org/downloads)

  • fruit (very basic example data from Polars)

  • survey (columns from ESS round 11 Human Values Scale questionnaire for the Netherlands)

  • test (columns with all supported data types)

  • hospital (Example electronic health record hospital dataset)

  • druguse (Example dataset with answers to an open question on study participants’ daily drug use)

Parameters:

name (str) – Name of the demo dataset.

Return type:

DataFrame

Returns:

Polars dataframe with correct column types

References

European Social Survey European Research Infrastructure (ESS ERIC). (2024). ESS11 integrated file, edition 1.0 [Data set]. Sikt - Norwegian Agency for Shared Services in Education and Research. https://doi.org/10.21338/ess11e01_0
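As a rough usage sketch, the snippet below wraps demo_data in a small helper that checks the requested name against the eight options listed above before loading. The helper load_demo and the constant DEMO_NAMES are illustrative names, not part of metasyn. The import is deferred so that the name check itself does not require metasyn to be installed. Datasets added later through register would not pass this check, so treat the fixed list as an assumption.

```python
# Names copied from the option list in the documentation above.
DEMO_NAMES = (
    "titanic", "spaceship", "synthea_imaging", "fruit",
    "survey", "test", "hospital", "druguse",
)


def load_demo(name: str = "titanic"):
    """Hypothetical helper: validate the name, then call demo_data."""
    if name not in DEMO_NAMES:
        raise ValueError(
            f"Unknown demo dataset {name!r}; expected one of {DEMO_NAMES}"
        )
    # Deferred import: metasyn is only needed once the name is known to be valid.
    from metasyn.demo.dataset import demo_data
    return demo_data(name)
```

With metasyn installed, load_demo("fruit") should return the same polars DataFrame as demo_data("fruit"); an unknown name raises ValueError before any data is loaded.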

metasyn.demo.dataset.demo_dataframe(name='titanic')

Legacy alias for demo_data.

Return type:

DataFrame

Parameters:

name (str)

metasyn.demo.dataset.demo_file(name='titanic')

Get the path for a demo data file.

There are eight options:
  • titanic (Included in pandas, but post-processed to contain more columns)

  • spaceship (CC-BY from https://www.kaggle.com/competitions/spaceship-titanic)

  • synthea_imaging (CC-BY from https://synthea.mitre.org/downloads)

  • fruit (very basic example data from Polars)

  • survey (columns from ESS round 11 Human Values Scale questionnaire for the Netherlands)

  • test (columns with all supported data types)

  • hospital (Example electronic health record hospital dataset)

  • druguse (Example dataset with answers to an open question on study participants’ daily drug use)

Parameters:

name (str) – Name of the demo dataset.

Return type:

Path

Returns:

Path to the dataset.

References

European Social Survey European Research Infrastructure (ESS ERIC). (2024). ESS11 integrated file, edition 1.0 [Data set]. Sikt - Norwegian Agency for Shared Services in Education and Research. https://doi.org/10.21338/ess11e01_0
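Because demo_file returns a path rather than loaded data, a caller will often want to inspect the file before deciding how to read it. The sketch below shows one way to do that. The helpers describe_file and describe_demo_file are hypothetical names introduced here, and the file format of each demo dataset is not stated in this reference, so the code only reports the suffix instead of assuming one.

```python
from pathlib import Path


def describe_file(path: Path) -> dict:
    """Collect basic facts about a file path without reading its contents."""
    return {
        "name": path.name,
        "suffix": path.suffix,
        "exists": path.exists(),
    }


def describe_demo_file(name: str = "titanic") -> dict:
    """Hypothetical helper: look up a demo file and describe it."""
    # Deferred import so describe_file stays usable without metasyn installed.
    from metasyn.demo.dataset import demo_file
    return describe_file(Path(demo_file(name)))
```

With metasyn installed, describe_demo_file("titanic") would report the file name and suffix of the bundled titanic file, which can then guide the choice of reader.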

metasyn.demo.dataset.register(*args)

Register a dataset so that it can be found by name.

Classes

BaseDataset()

Base class for demo datasets.

BaseMultiDataset()

Abstract class to define a dataset with multiple tables.

DrugUseDataset()

Example dataset with answers to an open question on study participants' daily drug use.

FruitDataset()

Very basic example data from Polars.

HospitalDataset()

Example electronic health record hospital dataset.

ShopMultiDataset()

An example dataset containing customers, products and purchases.

SpaceShipDataset()

CC-BY from https://www.kaggle.com/competitions/spaceship-titanic.

SurveyDataset()

Columns from ESS round 11 Human Values Scale questionnaire for the Netherlands.

SyntheaImagingDataset()

Synthetic medical health dataset from Synthea.

TestDataset()

Test dataset with all supported data types.

TitanicDataset()

Included in pandas, but post-processed to contain more columns.

class metasyn.demo.dataset.BaseDataset

Bases: ABC

Base class for demo datasets.

class metasyn.demo.dataset.BaseMultiDataset

Bases: BaseDataset

Abstract class to define a dataset with multiple tables.

get_data()

Alias for get_dataframes().

get_dataframes()

Create the dataframes (from file for example).

Returns:

Dictionary with dataframes.

Return type:

dict

class metasyn.demo.dataset.DrugUseDataset

Bases: BaseDataset

Example dataset with answers to an open question on study participants’ daily drug use.

This example dataset was generated through ChatGPT-4o on 07-11-2024 using the following prompt:

    Create a csv with 12 rows and 2 columns: participant_id, and drug_use. The participant_id has a standard alphanumeric structure, and the drug_use contains participant’s responses on how they use drugs in their daily life.

class metasyn.demo.dataset.FruitDataset

Bases: BaseDataset

Very basic example data from Polars.

class metasyn.demo.dataset.HospitalDataset

Bases: BaseDataset

Example electronic health record hospital dataset.

This dataset was created manually by the metasyn team.

class metasyn.demo.dataset.ShopMultiDataset

Bases: BaseMultiDataset

An example dataset containing customers, products and purchases.

class metasyn.demo.dataset.SpaceShipDataset

Bases: BaseDataset

CC-BY from https://www.kaggle.com/competitions/spaceship-titanic.

class metasyn.demo.dataset.SurveyDataset

Bases: BaseDataset

Columns from ESS round 11 Human Values Scale questionnaire for the Netherlands.

class metasyn.demo.dataset.SyntheaImagingDataset

Bases: BaseDataset

Synthetic medical health dataset from Synthea.

Jason Walonoski, Mark Kramer, Joseph Nichols, Andre Quina, Chris Moesel, Dylan Hall, Carlton Duffett, Kudakwashe Dube, Thomas Gallagher, Scott McLachlan, Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record, Journal of the American Medical Informatics Association, Volume 25, Issue 3, March 2018, Pages 230–238, https://doi.org/10.1093/jamia/ocx079

class metasyn.demo.dataset.TestDataset

Bases: BaseDataset

Test dataset with all supported data types.

class metasyn.demo.dataset.TitanicDataset

Bases: BaseDataset

Included in pandas, but post-processed to contain more columns.