metasyn.demo.dataset
Create and retrieve demo datasets.
Functions
| demo_data([name]) | Get a demonstration dataset as a prepared polars dataframe. |
| demo_dataframe([name]) | Legacy alias for demo_data. |
| demo_file([name]) | Get the path for a demo data file. |
| register(*args) | Register a dataset so that it can be found by name. |
- metasyn.demo.dataset.demo_data(name='titanic')
Get a demonstration dataset as a prepared polars dataframe.
- There are eight options:
titanic (Included in pandas, but post-processed to contain more columns)
spaceship (CC-BY from https://www.kaggle.com/competitions/spaceship-titanic)
synthea_imaging (CC-BY from https://synthea.mitre.org/downloads)
fruit (very basic example data from Polars)
survey (columns from ESS round 11 Human Values Scale questionnaire for the Netherlands)
test (columns with all supported data types)
hospital (Example electronic health record hospital dataset)
druguse (Example dataset with answers to an open question on study participants’ daily drug use)
- Parameters:
name (str) – Name of the demo dataset.
- Return type:
DataFrame
- Returns:
Polars dataframe with correct column types
References
European Social Survey European Research Infrastructure (ESS ERIC). (2024). ESS11 integrated file, edition 1.0 [Data set]. Sikt - Norwegian Agency for Shared Services in Education and Research. https://doi.org/10.21338/ess11e01_0
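As a usage sketch, the snippet below loads one of the listed demo datasets by name and prints its column types. It assumes metasyn (and therefore polars) is installed. The `load_demo` helper is a hypothetical wrapper introduced only for illustration, and the `ImportError` guard is there so the snippet does nothing when metasyn is absent.

```python
def load_demo(name="titanic"):
    """Hypothetical helper: load a demo dataset by name via demo_data."""
    # Imported lazily so this function can be defined without metasyn installed.
    from metasyn.demo.dataset import demo_data

    return demo_data(name)


try:
    df = load_demo("fruit")
except ImportError:
    # metasyn is not available in this environment.
    df = None

if df is not None:
    # demo_data is documented to return a polars DataFrame;
    # its schema maps column names to data types.
    print(df.schema)
```

Any of the names in the list above can be passed in place of "fruit".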
- metasyn.demo.dataset.demo_dataframe(name='titanic')
Legacy alias for demo_data.
- Return type:
DataFrame
- Parameters:
name (str)
- metasyn.demo.dataset.demo_file(name='titanic')
Get the path for a demo data file.
- There are eight options:
titanic (Included in pandas, but post-processed to contain more columns)
spaceship (CC-BY from https://www.kaggle.com/competitions/spaceship-titanic)
synthea_imaging (CC-BY from https://synthea.mitre.org/downloads)
fruit (very basic example data from Polars)
survey (columns from ESS round 11 Human Values Scale questionnaire for the Netherlands)
test (columns with all supported data types)
hospital (Example electronic health record hospital dataset)
druguse (Example dataset with answers to an open question on study participants’ daily drug use)
- Parameters:
name (str) – Name of the demo dataset.
- Return type:
Path
- Returns:
Path to the dataset.
References
European Social Survey European Research Infrastructure (ESS ERIC). (2024). ESS11 integrated file, edition 1.0 [Data set]. Sikt - Norwegian Agency for Shared Services in Education and Research. https://doi.org/10.21338/ess11e01_0
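A minimal sketch of how the returned path might be inspected, assuming metasyn is installed and that the documented Path return type is a `pathlib.Path`. The `describe_demo_file` helper is hypothetical and exists only for this example.

```python
def describe_demo_file(name="titanic"):
    """Hypothetical helper: return (file name, exists flag) for a demo file."""
    # Imported lazily so this function can be defined without metasyn installed.
    from metasyn.demo.dataset import demo_file

    path = demo_file(name)  # assumed to be a pathlib.Path
    return path.name, path.exists()


try:
    info = describe_demo_file("titanic")
except ImportError:
    # metasyn is not available in this environment.
    info = None

if info is not None:
    print(info)
```

Working with the path rather than the dataframe is useful when a tool expects a file on disk, for example to demonstrate a custom reading step.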
- metasyn.demo.dataset.register(*args)
Register a dataset so that it can be found by name.
Classes
| BaseDataset | Base class for demo datasets. |
| BaseMultiDataset | Abstract class to define a dataset with multiple tables. |
| DrugUseDataset | Example dataset with answers to an open question on study participants' daily drug use. |
| FruitDataset | Very basic example data from Polars. |
| HospitalDataset | Example electronic health record hospital dataset. |
| ShopMultiDataset | An example dataset containing customers, products and purchases. |
| SpaceShipDataset | CC-BY from https://www.kaggle.com/competitions/spaceship-titanic. |
| SurveyDataset | Columns from ESS round 11 Human Values Scale questionnaire for the Netherlands. |
| SyntheaImagingDataset | Synthetic medical health dataset from Synthea. |
| TestDataset | Test dataset with all supported data types. |
| TitanicDataset | Included in pandas, but post-processed to contain more columns. |
- class metasyn.demo.dataset.BaseDataset
Bases: ABC
Base class for demo datasets.
- class metasyn.demo.dataset.BaseMultiDataset
Bases: BaseDataset
Abstract class to define a dataset with multiple tables.
- get_data()
Alias for get_dataframes().
- get_dataframes()
Create the dataframes (from file for example).
- Returns:
Dictionary with dataframes.
- Return type:
dataframes
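The relationship between the two methods can be illustrated with a self-contained sketch. It is not the library's code: `MultiTableSketch` and `ToyShop` are hypothetical classes, and plain lists of dictionaries stand in for dataframes so the example runs without polars.

```python
from abc import ABC, abstractmethod


class MultiTableSketch(ABC):
    """Hypothetical stand-in for a dataset that holds several tables."""

    @abstractmethod
    def get_dataframes(self):
        """Return a dictionary mapping table names to tables."""

    def get_data(self):
        # Alias: delegates to get_dataframes so both names give the same result.
        return self.get_dataframes()


class ToyShop(MultiTableSketch):
    def get_dataframes(self):
        # Plain lists of dicts stand in for dataframes in this sketch.
        return {
            "customers": [{"customer_id": 1, "name": "Ada"}],
            "purchases": [{"customer_id": 1, "product": "apple"}],
        }


tables = ToyShop().get_data()
print(sorted(tables))
```

A subclass in this sketch only has to implement `get_dataframes`; the alias then works without further code.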
- class metasyn.demo.dataset.DrugUseDataset
Bases: BaseDataset
Example dataset with answers to an open question on study participants’ daily drug use.
This example dataset was generated through ChatGPT-4o on 07-11-2024 using the following prompt:
> Create a csv with 12 rows and 2 columns: participant_id, and drug_use. The participant_id has a standard alphanumeric structure, and the drug_use contains participant’s responses on how they use drugs in their daily life.
- class metasyn.demo.dataset.FruitDataset
Bases: BaseDataset
Very basic example data from Polars.
- class metasyn.demo.dataset.HospitalDataset
Bases: BaseDataset
Example electronic health record hospital dataset.
This dataset was created manually by the metasyn team.
- class metasyn.demo.dataset.ShopMultiDataset
Bases: BaseMultiDataset
An example dataset containing customers, products and purchases.
- class metasyn.demo.dataset.SpaceShipDataset
Bases: BaseDataset
CC-BY from https://www.kaggle.com/competitions/spaceship-titanic.
- class metasyn.demo.dataset.SurveyDataset
Bases: BaseDataset
Columns from ESS round 11 Human Values Scale questionnaire for the Netherlands.
- class metasyn.demo.dataset.SyntheaImagingDataset
Bases: BaseDataset
Synthetic medical health dataset from Synthea.
Jason Walonoski, Mark Kramer, Joseph Nichols, Andre Quina, Chris Moesel, Dylan Hall, Carlton Duffett, Kudakwashe Dube, Thomas Gallagher, Scott McLachlan, Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record, Journal of the American Medical Informatics Association, Volume 25, Issue 3, March 2018, Pages 230–238, https://doi.org/10.1093/jamia/ocx079
- class metasyn.demo.dataset.TestDataset
Bases: BaseDataset
Test dataset with all supported data types.
- class metasyn.demo.dataset.TitanicDataset
Bases: BaseDataset
Included in pandas, but post-processed to contain more columns.