Publication: Correcting data imbalance for semi-supervised COVID-19 detection using X-ray chest images
dc.contributor.author | Calderon-Ramirez, Saul | |
dc.contributor.author | Yang, Shengxiang | |
dc.contributor.author | Moemeni, Armaghan | |
dc.contributor.author | Elizondo, David | |
dc.contributor.author | Colreavy-Donnelly, Simon | |
dc.contributor.author | Chavarria-Estrada, Luis Fernando | |
dc.contributor.author | Molina-Cabello, Miguel A. | |
dc.contributor.authoraffiliation | [Calderon-Ramirez, Saul] De Montfort University, Centre for Computational Intelligence (CCI), Leicester, Leicestershire, England | |
dc.contributor.authoraffiliation | [Yang, Shengxiang] De Montfort University, Centre for Computational Intelligence (CCI), Leicester, Leicestershire, England | |
dc.contributor.authoraffiliation | [Elizondo, David] De Montfort University, Centre for Computational Intelligence (CCI), Leicester, Leicestershire, England | |
dc.contributor.authoraffiliation | [Colreavy-Donnelly, Simon] De Montfort University, Centre for Computational Intelligence (CCI), Leicester, Leicestershire, England | |
dc.contributor.authoraffiliation | [Calderon-Ramirez, Saul] Instituto Tecnológico de Costa Rica, Cartago, Costa Rica | |
dc.contributor.authoraffiliation | [Moemeni, Armaghan] University of Nottingham, School of Computer Science, Nottingham, England | |
dc.contributor.authoraffiliation | [Chavarria-Estrada, Luis Fernando] Imágenes Médicas Dr. Chavarría Estrada, San José, Costa Rica | |
dc.contributor.authoraffiliation | [Molina-Cabello, Miguel A.] Universidad de Málaga, Department of Computer Languages and Computer Science, Málaga, Spain | |
dc.contributor.authoraffiliation | [Molina-Cabello, Miguel A.] Instituto de Investigación Biomédica de Málaga (IBIMA), Málaga, Spain | |
dc.contributor.funder | European Regional Development Fund (ERDF) | |
dc.contributor.funder | Universidad de Malaga, Spain | |
dc.date.accessioned | 2023-02-12T02:20:40Z | |
dc.date.available | 2023-02-12T02:20:40Z | |
dc.date.issued | 2021-07-07 | |
dc.description.abstract | A key factor in the fight against viral diseases such as the coronavirus (COVID-19) is the identification of virus carriers as early and as quickly as possible, in a cheap and efficient manner. The application of deep learning for image classification of chest X-ray images of COVID-19 patients could become a useful pre-diagnostic detection methodology. However, deep learning architectures require large labelled datasets. This is often a limitation when the subject of research is relatively new, as in the case of the virus outbreak, where dealing with small labelled datasets is a challenge. Moreover, in such a context, the datasets are also highly imbalanced, with few observations from positive cases of the new disease. In this work we evaluate the performance of the semi-supervised deep learning architecture known as MixMatch with a very limited number of labelled observations and highly imbalanced labelled datasets. We demonstrate the critical impact of data imbalance on the model's accuracy. Therefore, we propose a simple approach for correcting data imbalance by re-weighting each observation in the loss function, giving a higher weight to the observations corresponding to the under-represented class. For unlabelled observations, we use the pseudo and augmented labels calculated by MixMatch to choose the appropriate weight. The proposed method improved classification accuracy by up to 18% with respect to the non-balanced MixMatch algorithm. We tested the proposed approach on several available datasets using 10, 15 and 20 labelled observations for binary classification (COVID-19 positive and normal cases). For multi-class classification (COVID-19 positive, pneumonia and normal cases), we tested 30, 50, 70 and 90 labelled observations. Additionally, a new dataset is included among the tested datasets, composed of chest X-ray images of Costa Rican adult patients. | |
dc.description.sponsorship | This work is partially supported by the following Spanish grants: TIN2016-75097-P, RTI2018-094645-B-I00 and UMA18FEDERJA-084. All of them include funds from the European Regional Development Fund (ERDF). The authors acknowledge the funding from the Universidad de Málaga, Spain. | |
dc.description.version | Yes | |
dc.identifier.citation | Calderon-Ramirez S, Yang S, Moemeni A, Elizondo D, Colreavy-Donnelly S, Chavarría-Estrada LF, et al. Correcting data imbalance for semi-supervised COVID-19 detection using X-ray chest images. Appl Soft Comput. 2021 Nov;111:107692 | |
dc.identifier.doi | 10.1016/j.asoc.2021.107692 | |
dc.identifier.essn | 1872-9681 | |
dc.identifier.issn | 1568-4946 | |
dc.identifier.unpaywallURL | https://doi.org/10.1016/j.asoc.2021.107692 | |
dc.identifier.uri | http://hdl.handle.net/10668/18716 | |
dc.identifier.wosID | 724665600012 | |
dc.journal.title | Applied Soft Computing | |
dc.journal.titleabbreviation | Appl. Soft Comput. | |
dc.language.iso | en | |
dc.organization | Instituto de Investigación Biomédica de Málaga-IBIMA | |
dc.provenance | Content curation carried out on 14/08/2024 | |
dc.publisher | Elsevier | |
dc.relation.projectID | TIN2016-75097-P | |
dc.relation.projectID | RTI2018-094645-B-I00 | |
dc.relation.projectID | UMA18FEDERJA-084 | |
dc.relation.publisherversion | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8276579/pdf/main.pdf | |
dc.rights.accessRights | open access | |
dc.subject | Coronavirus | |
dc.subject | COVID-19 | |
dc.subject | Computer aided diagnosis | |
dc.subject | Data imbalance | |
dc.subject | Semi-supervised learning | |
dc.subject | Deep | |
dc.subject | Radiology | |
dc.subject | Features | |
dc.subject.decs | Algorithms | |
dc.subject.decs | Costa Rica | |
dc.subject.decs | Pneumonia | |
dc.subject.decs | X-Rays | |
dc.subject.decs | Virus Diseases | |
dc.subject.mesh | COVID-19 | |
dc.subject.mesh | Deep learning | |
dc.subject.mesh | X-Rays | |
dc.subject.mesh | Algorithms | |
dc.subject.mesh | Virus diseases | |
dc.subject.mesh | Pneumonia | |
dc.subject.mesh | Costa Rica | |
dc.title | Correcting data imbalance for semi-supervised COVID-19 detection using X-ray chest images | |
dc.type | research article | |
dc.type.hasVersion | VoR | |
dc.volume.number | 111 | |
dc.wostype | Article | |
dspace.entity.type | Publication |
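The balancing idea summarised in the abstract (re-weighting each observation in the loss, with unlabelled observations weighted according to their MixMatch pseudo-labels) can be illustrated with a short, dependency-free Python sketch. It is not the authors' implementation: the inverse-frequency weight formula, the helper names (`class_weights`, `weighted_labelled_loss`, `weighted_unlabelled_loss`) and the toy probabilities are assumptions for illustration only. Labelled observations are weighted by the dominant class of their (possibly MixUp-softened) target, and unlabelled observations by the most probable class of their guessed label, using the squared-error term that MixMatch applies to guessed labels.

```python
import math
from collections import Counter


def argmax(vec):
    """Index of the largest entry (ties resolve to the lowest index)."""
    return max(range(len(vec)), key=vec.__getitem__)


def class_weights(labels, num_classes):
    """Hypothetical helper: inverse-frequency weight per class (an assumption).

    w_c = n / (K * n_c): a perfectly balanced labelled set gives w_c = 1,
    and under-represented classes receive weights above 1. Classes with no
    labelled observations get 0.0 (an arbitrary choice for this sketch).
    """
    counts = Counter(labels)
    n = len(labels)
    return [n / (num_classes * counts[c]) if counts[c] else 0.0
            for c in range(num_classes)]


def weighted_labelled_loss(probs, targets, weights):
    """Weighted cross-entropy; targets may be soft (e.g. after MixUp).

    Each observation is weighted by the class holding most of its target mass.
    """
    total = 0.0
    for p, t in zip(probs, targets):
        ce = -sum(ti * math.log(max(pi, 1e-12)) for ti, pi in zip(t, p))
        total += weights[argmax(t)] * ce
    return total / len(probs)


def weighted_unlabelled_loss(probs, pseudo_labels, weights):
    """Weighted squared error against MixMatch-style guessed labels.

    The weight comes from the most probable class of each pseudo-label;
    the sum is normalised by batch size times number of classes.
    """
    total = 0.0
    for p, q in zip(probs, pseudo_labels):
        se = sum((qi - pi) ** 2 for qi, pi in zip(q, p))
        total += weights[argmax(q)] * se
    return total / (len(probs) * len(probs[0]))


if __name__ == "__main__":
    # Toy binary setting: class 0 = normal, class 1 = COVID-19 (minority).
    w = class_weights([0, 0, 0, 1], num_classes=2)
    print(w)  # [0.6666666666666666, 2.0]

    labelled = weighted_labelled_loss(
        [[0.9, 0.1], [0.2, 0.8]],  # model probabilities
        [[1.0, 0.0], [0.0, 1.0]],  # one-hot targets
        w,
    )
    unlabelled = weighted_unlabelled_loss(
        [[0.6, 0.4]],  # model probabilities on an augmented unlabelled image
        [[0.2, 0.8]],  # guessed (pseudo) label, dominated by the minority class
        w,
    )
    print(labelled, unlabelled)
```

Inverse-frequency weighting is only one way to give the under-represented class a larger weight; the exact weighting formula and how the weighted terms are combined with MixMatch's unlabelled-loss coefficient should be taken from the article itself.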
Files
Original bundle
- Name: Calderon_CorrectingData.pdf
- Size: 1.02 MB
- Format: Adobe Portable Document Format