Publication:
The Health Gym: synthetic health-related datasets for the development of reinforcement learning algorithms.

dc.contributor.author: Kuo, Nicholas I-Hsien
dc.contributor.author: Polizzotto, Mark N
dc.contributor.author: Finfer, Simon
dc.contributor.author: Garcia, Federico
dc.contributor.author: Sönnerborg, Anders
dc.contributor.author: Zazzi, Maurizio
dc.contributor.author: Böhm, Michael
dc.contributor.author: Kaiser, Rolf
dc.contributor.author: Jorm, Louisa
dc.contributor.author: Barbieri, Sebastiano
dc.date.accessioned: 2023-05-03T13:26:36Z
dc.date.available: 2023-05-03T13:26:36Z
dc.date.issued: 2022-11-11
dc.description.abstract: In recent years, the machine learning research community has benefited tremendously from the availability of openly accessible benchmark datasets. Clinical data are usually not openly available due to their confidential nature. This has hampered the development of reproducible and generalisable machine learning applications in health care. Here we introduce the Health Gym - a growing collection of highly realistic synthetic medical datasets that can be freely accessed to prototype, evaluate, and compare machine learning algorithms, with a specific focus on reinforcement learning. The three synthetic datasets described in this paper present patient cohorts with acute hypotension and sepsis in the intensive care unit, and people with human immunodeficiency virus (HIV) receiving antiretroviral therapy. The datasets were created using a novel generative adversarial network (GAN). The distributions of variables, and correlations between variables and trends in variables over time in the synthetic datasets mirror those in the real datasets. Furthermore, the risk of sensitive information disclosure associated with the public distribution of the synthetic datasets is estimated to be very low.
dc.identifier.doi: 10.1038/s41597-022-01784-7
dc.identifier.essn: 2052-4463
dc.identifier.pmc: PMC9652426
dc.identifier.pmid: 36369205
dc.identifier.pubmedURL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9652426/pdf
dc.identifier.unpaywallURL: https://www.nature.com/articles/s41597-022-01784-7.pdf
dc.identifier.uri: http://hdl.handle.net/10668/19578
dc.issue.number: 1
dc.journal.title: Scientific data
dc.journal.titleabbreviation: Sci Data
dc.language.iso: en
dc.organization: Hospital Universitario San Cecilio
dc.page.number: 693
dc.pubmedtype: Dataset
dc.pubmedtype: Journal Article
dc.rights: Attribution 4.0 International
dc.rights.accessRights: open access
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject.mesh: Humans
dc.subject.mesh: Algorithms
dc.subject.mesh: Machine Learning
dc.subject.mesh: Comprehensive Health Care
dc.title: The Health Gym: synthetic health-related datasets for the development of reinforcement learning algorithms.
dc.type: research article
dc.type.hasVersion: VoR
dc.volume.number: 9
dspace.entity.type: Publication

Files

Original bundle

Name: PMC9652426.pdf
Size: 8.86 MB
Format: Adobe Portable Document Format