RT Journal Article
T1 Correcting data imbalance for semi-supervised COVID-19 detection using X-ray chest images
A1 Calderon-Ramirez, Saul
A1 Yang, Shengxiang
A1 Moemeni, Armaghan
A1 Elizondo, David
A1 Colreavy-Donnelly, Simon
A1 Chavarria-Estrada, Luis Fernando
A1 Molina-Cabello, Miguel A.
K1 Coronavirus
K1 COVID-19
K1 Computer aided diagnosis
K1 Data imbalance
K1 Semi-supervised learning
K1 Deep
K1 Radiology
K1 Features
AB A key factor in the fight against viral diseases such as the coronavirus (COVID-19) is the identification of virus carriers as early and quickly as possible, in a cheap and efficient manner. The application of deep learning for image classification of chest X-ray images of COVID-19 patients could become a useful pre-diagnostic detection methodology. However, deep learning architectures require large labelled datasets. This is often a limitation when the subject of research is relatively new, as in the case of the virus outbreak, where dealing with small labelled datasets is a challenge. Moreover, in such a context, the datasets are also highly imbalanced, with few observations from positive cases of the new disease. In this work we evaluate the performance of the semi-supervised deep learning architecture known as MixMatch with a very limited number of labelled observations and highly imbalanced labelled datasets. We demonstrate the critical impact of data imbalance on the model's accuracy. Therefore, we propose a simple approach for correcting data imbalance by re-weighting each observation in the loss function, giving a higher weight to the observations corresponding to the under-represented class. For unlabelled observations, we use the pseudo and augmented labels calculated by MixMatch to choose the appropriate weight. The proposed method improved classification accuracy by up to 18% with respect to the non-balanced MixMatch algorithm.
We tested our proposed approach with several available datasets using 10, 15 and 20 labelled observations, for binary classification (COVID-19 positive and normal cases). For multi-class classification (COVID-19 positive, pneumonia and normal cases), we tested 30, 50, 70 and 90 labelled observations. Additionally, a new dataset is included among the tested datasets, composed of chest X-ray images of Costa Rican adult patients.
PB Elsevier
SN 1568-4946
YR 2021
FD 2021-07-07
LK http://hdl.handle.net/10668/18716
UL http://hdl.handle.net/10668/18716
LA en
NO Calderon-Ramirez S, Yang S, Moemeni A, Elizondo D, Colreavy-Donnelly S, Chavarría-Estrada LF, et al. Correcting data imbalance for semi-supervised COVID-19 detection using X-ray chest images. Appl Soft Comput. 2021 Nov;111:107692
NO This work is partially supported by the following Spanish grants: TIN2016-75097-P, RTI2018-094645-B-I00 and UMA18FEDERJA-084. All of them include funds from the European Regional Development Fund (ERDF). The authors acknowledge the funding from the Universidad de Málaga, Spain.
DS RISalud
RD Apr 9, 2025