Publication:
Statistical Validation of Synthetic Data for Lung Cancer Patients Generated by Using Generative Adversarial Networks

dc.contributor.author: Gonzalez-Abril, Luis
dc.contributor.author: Angulo, Cecilio
dc.contributor.author: Antonio Ortega, Juan
dc.contributor.author: Lopez-Guerra, Jose-Luis
dc.contributor.authoraffiliation: [Gonzalez-Abril, Luis] Univ Seville, Appl Econ & Dept, Seville 41018, Spain
dc.contributor.authoraffiliation: [Angulo, Cecilio] Univ Politecn Cataluna, Intelligent Data Sci & Artificial Intelligence Re, Barcelona 08034, Spain
dc.contributor.authoraffiliation: [Angulo, Cecilio] Inst Robot & Informat Ind CSIC UPC, Barcelona 08028, Spain
dc.contributor.authoraffiliation: [Antonio Ortega, Juan] Univ Seville, Comp Sci Dept, Seville 41012, Spain
dc.contributor.authoraffiliation: [Lopez-Guerra, Jose-Luis] Univ Hosp Virgen del Rocio, Dept Radiat Oncol, Seville 41013, Spain
dc.contributor.funder: Spanish Ministry of Science, Innovation and Universities (AEI/FEDER, UE)
dc.date.accessioned: 2023-05-03T13:53:32Z
dc.date.available: 2023-05-03T13:53:32Z
dc.date.issued: 2022-10-01
dc.description.abstract: The development of healthcare patient digital twins in combination with machine learning technologies helps doctors in therapeutic prescription and in minimally invasive intervention procedures. The confidentiality of medical records or limited data availability in many health domains are drawbacks that can be overcome with the generation of synthetic data conformed to real data. The use of generative adversarial networks (GAN) for the generation of synthetic data of lung cancer patients has been previously introduced as a tool to solve this problem in the form of anonymized synthetic patients. However, generated synthetic data are mainly validated from the machine learning domain (loss functions) or expert domain (oncologists). In this paper, we propose statistical decision making as a validation tool: Is the model good enough to be used? Does the model pass rigorous hypothesis testing criteria? We show for the case at hand how loss functions and hypothesis validation are not always well aligned.
dc.identifier.doi: 10.3390/electronics11203277
dc.identifier.essn: 2079-9292
dc.identifier.unpaywallURL: https://www.mdpi.com/2079-9292/11/20/3277/pdf?version=1665562658
dc.identifier.uri: http://hdl.handle.net/10668/20972
dc.identifier.wosID: 872609300001
dc.issue.number: 20
dc.journal.title: Electronics
dc.journal.titleabbreviation: Electronics
dc.language.iso: en
dc.organization: Hospital Universitario Virgen del Rocío
dc.publisher: Mdpi
dc.rights: Attribution 4.0 International
dc.rights.accessRights: open access
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: personalized medicine
dc.subject: generative adversarial network
dc.subject: lung cancer
dc.subject: validation tools
dc.title: Statistical Validation of Synthetic Data for Lung Cancer Patients Generated by Using Generative Adversarial Networks
dc.type: research article
dc.type.hasVersion: VoR
dc.volume.number: 11
dc.wostype: Article
dspace.entity.type: Publication
