Publication:
Correcting data imbalance for semi-supervised COVID-19 detection using X-ray chest images

dc.contributor.author: Calderon-Ramirez, Saul
dc.contributor.author: Yang, Shengxiang
dc.contributor.author: Moemeni, Armaghan
dc.contributor.author: Elizondo, David
dc.contributor.author: Colreavy-Donnelly, Simon
dc.contributor.author: Chavarria-Estrada, Luis Fernando
dc.contributor.author: Molina-Cabello, Miguel A.
dc.contributor.authoraffiliation: [Calderon-Ramirez, Saul] De Montfort Univ, Ctr Computat Intelligence CCI, Leicester, Leics, England
dc.contributor.authoraffiliation: [Yang, Shengxiang] De Montfort Univ, Ctr Computat Intelligence CCI, Leicester, Leics, England
dc.contributor.authoraffiliation: [Elizondo, David] De Montfort Univ, Ctr Computat Intelligence CCI, Leicester, Leics, England
dc.contributor.authoraffiliation: [Colreavy-Donnelly, Simon] De Montfort Univ, Ctr Computat Intelligence CCI, Leicester, Leics, England
dc.contributor.authoraffiliation: [Calderon-Ramirez, Saul] Inst Tecnol Costa Rica, Cartago, Costa Rica
dc.contributor.authoraffiliation: [Moemeni, Armaghan] Univ Nottingham, Sch Comp Sci, Nottingham, England
dc.contributor.authoraffiliation: [Chavarria-Estrada, Luis Fernando] Imagenes Med Dr Chavarria Estrada, San Jose, Costa Rica
dc.contributor.authoraffiliation: [Molina-Cabello, Miguel A.] Univ Malaga, Dept Comp Languages & Comp Sci, Malaga, Spain
dc.contributor.authoraffiliation: [Molina-Cabello, Miguel A.] Inst Invest Biomed Malaga IBIMA, Malaga, Spain
dc.contributor.funder: European Regional Development Fund (ERDF)
dc.contributor.funder: Universidad de Malaga, Spain
dc.date.accessioned: 2023-02-12T02:20:40Z
dc.date.available: 2023-02-12T02:20:40Z
dc.date.issued: 2021-07-07
dc.description.abstract: A key factor in the fight against viral diseases such as the coronavirus (COVID-19) is the identification of virus carriers as early and quickly as possible, in a cheap and efficient manner. The application of deep learning for image classification of chest X-ray images of COVID-19 patients could become a useful pre-diagnostic detection methodology. However, deep learning architectures require large labelled datasets. This is often a limitation when the subject of research is relatively new, as in the case of the virus outbreak, where dealing with small labelled datasets is a challenge. Moreover, in such a context, the datasets are also highly imbalanced, with few observations from positive cases of the new disease. In this work we evaluate the performance of the semi-supervised deep learning architecture known as MixMatch with a very limited number of labelled observations and highly imbalanced labelled datasets. We demonstrate the critical impact of data imbalance on the model's accuracy. Therefore, we propose a simple approach for correcting data imbalance by re-weighting each observation in the loss function, giving a higher weight to the observations corresponding to the under-represented class. For unlabelled observations, we use the pseudo and augmented labels calculated by MixMatch to choose the appropriate weight. The proposed method improved classification accuracy by up to 18% with respect to the non-balanced MixMatch algorithm. We tested our proposed approach with several available datasets using 10, 15 and 20 labelled observations for binary classification (COVID-19 positive and normal cases). For multi-class classification (COVID-19 positive, pneumonia and normal cases), we tested 30, 50, 70 and 90 labelled observations. Additionally, a new dataset is included among the tested datasets, composed of chest X-ray images of Costa Rican adult patients.
dc.description.sponsorship: This work is partially supported by the following Spanish grants: TIN2016-75097-P, RTI2018-094645-B-I00 and UMA18FEDERJA-084. All of them include funds from the European Regional Development Fund (ERDF). The authors acknowledge the funding from the Universidad de Málaga, Spain.
dc.description.version: Yes
dc.identifier.citation: Calderon-Ramirez S, Yang S, Moemeni A, Elizondo D, Colreavy-Donnelly S, Chavarría-Estrada LF, et al. Correcting data imbalance for semi-supervised COVID-19 detection using X-ray chest images. Appl Soft Comput. 2021 Nov;111:107692
dc.identifier.doi: 10.1016/j.asoc.2021.107692
dc.identifier.essn: 1872-9681
dc.identifier.issn: 1568-4946
dc.identifier.unpaywallURL: https://doi.org/10.1016/j.asoc.2021.107692
dc.identifier.uri: http://hdl.handle.net/10668/18716
dc.identifier.wosID: 724665600012
dc.journal.title: Applied soft computing
dc.journal.titleabbreviation: Appl. soft. comput.
dc.language.iso: en
dc.organization: Instituto de Investigación Biomédica de Málaga-IBIMA
dc.provenance: Content curation performed 14/08/2024
dc.publisher: Elsevier
dc.relation.projectID: TIN2016-75097-P
dc.relation.projectID: RTI2018-094645-B-I00
dc.relation.projectID: UMA18FEDERJA-084
dc.relation.publisherversion: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8276579/pdf/main.pdf
dc.rights.accessRights: open access
dc.subject: Coronavirus
dc.subject: COVID-19
dc.subject: Computer aided diagnosis
dc.subject: Data imbalance
dc.subject: Semi-supervised learning
dc.subject: Deep
dc.subject: Radiology
dc.subject: Features
dc.subject.decs: Algorithms
dc.subject.decs: Costa Rica
dc.subject.decs: Pneumonia
dc.subject.decs: X-Rays
dc.subject.decs: Virus diseases
dc.subject.mesh: COVID-19
dc.subject.mesh: Deep learning
dc.subject.mesh: X-Rays
dc.subject.mesh: Algorithms
dc.subject.mesh: Virus diseases
dc.subject.mesh: Pneumonia
dc.subject.mesh: Costa Rica
dc.title: Correcting data imbalance for semi-supervised COVID-19 detection using X-ray chest images
dc.type: research article
dc.type.hasVersion: VoR
dc.volume.number: 111
dc.wostype: Article
dspace.entity.type: Publication

Files

Original bundle

Name: Calderon_CorrectingData.pdf
Size: 1.02 MB
Format: Adobe Portable Document Format