Machine Learning Applied to Clinical Laboratory Data in Spain for COVID-19 Outcome Prediction: Model Development and Validation.

Domínguez-Olmedo, Juan L; Gragera-Martínez, Álvaro; Mata, Jacinto; Pachón Álvarez, Victoria

Identifiers

URI: http://hdl.handle.net/10668/17434

DOI: 10.2196/26211

Date

2021-04-14

Authors

Domínguez-Olmedo, Juan L

Gragera-Martínez, Álvaro

Mata, Jacinto

Pachón Álvarez, Victoria

Abstract

The COVID-19 pandemic is probably the greatest health catastrophe of the modern era. Spain's health care system has been exposed to uncontrollable numbers of patients over a short period, causing the system to collapse. Given that diagnosis is not immediate and there is no effective treatment for COVID-19, other tools have had to be developed to identify patients at risk of severe disease complications and thus optimize material and human resources in health care. There are no tools to identify patients who have a worse prognosis than others.

This study aimed to process a sample of electronic health records of patients with COVID-19 in order to develop a machine learning model that predicts the severity of infection and mortality from clinical laboratory parameters. Early patient classification can help optimize material and human resources, and analysis of the most important features of the model could provide more detailed insights into the disease.

After an initial performance evaluation based on a comparison with several other well-known methods, the extreme gradient boosting algorithm was selected as the predictive method for this study. In addition, Shapley Additive Explanations was used to analyze the importance of the features of the resulting model.

After data preprocessing, 1823 patients with confirmed COVID-19 and 32 predictor features were selected. On bootstrap validation, the extreme gradient boosting classifier yielded a value of 0.97 (95% CI 0.96-0.98) for the area under the receiver operating characteristic curve, 0.86 (95% CI 0.80-0.91) for the area under the precision-recall curve, 0.94 (95% CI 0.92-0.95) for accuracy, 0.77 (95% CI 0.72-0.83) for the F-score, 0.93 (95% CI 0.89-0.98) for sensitivity, and 0.91 (95% CI 0.86-0.96) for specificity. The 4 most relevant features for model prediction were lactate dehydrogenase activity, C-reactive protein levels, neutrophil counts, and urea levels.

Our predictive model yielded excellent results in differentiating patients who died of COVID-19 from those who survived, primarily on the basis of laboratory parameter values. Analysis of the resulting model identified a set of features with the most significant impact on the prediction, thus relating them to a higher risk of mortality.
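The abstract reports each metric with a 95% CI obtained by bootstrap validation but does not describe the exact procedure. As a rough illustration only, the sketch below computes accuracy, sensitivity, specificity, and F-score from binary predictions and derives a simple percentile bootstrap interval for each. The function names, the percentile method, the number of resamples, and the toy labels are all assumptions made for this example; they are not taken from the study and do not reproduce its data or results.

```python
import random


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = died)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def safe_div(numerator, denominator):
    """Return 0.0 instead of failing when a resample lacks one of the classes."""
    return numerator / denominator if denominator else 0.0


def metrics(y_true, y_pred):
    """Threshold-based metrics named in the abstract (AUC values are omitted)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = safe_div(tp, tp + fn)
    specificity = safe_div(tn, tn + fp)
    precision = safe_div(tp, tp + fp)
    return {
        "accuracy": safe_div(tp + tn, len(y_true)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": safe_div(2 * precision * sensitivity, precision + sensitivity),
    }


def bootstrap_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample cases with replacement, recompute metrics."""
    rng = random.Random(seed)
    n = len(y_true)
    samples = {name: [] for name in metrics(y_true, y_pred)}
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        resampled = metrics([y_true[i] for i in idx], [y_pred[i] for i in idx])
        for name, value in resampled.items():
            samples[name].append(value)
    intervals = {}
    for name, values in samples.items():
        values.sort()
        lower = values[int((alpha / 2) * n_resamples)]
        upper = values[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
        intervals[name] = (lower, upper)
    return intervals


# Toy labels for illustration only; these are NOT data from the study.
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 0, 0, 1]
point_estimates = metrics(toy_true, toy_pred)
intervals = bootstrap_ci(toy_true, toy_pred, n_resamples=200, seed=1)
for name, estimate in point_estimates.items():
    print(name, round(estimate, 2), intervals[name])
```

In practice the predictions would come from a fitted classifier evaluated on held-out or out-of-bag cases, and the areas under the receiver operating characteristic and precision-recall curves would be computed from predicted probabilities rather than hard labels.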

MeSH Terms

Adolescent
Adult
Aged
Aged, 80 and over
COVID-19
Child
Child, Preschool
Female
Humans
Infant
Infant, Newborn
Laboratories
Machine Learning
Male
Middle Aged
Pandemics
Prognosis
Reproducibility of Results
Research Design
Retrospective Studies
SARS-CoV-2
Spain
Treatment Outcome
Young Adult

Keywords

COVID-19, electronic health record, machine learning, mortality, prediction

Collections

SAS - Hospital Universitario Juan Ramón Jiménez
