Publication:
Transformers for Clinical Coding in Spanish

dc.contributor.authorLópez-García, Guillermo
dc.contributor.authorJerez, José M.
dc.contributor.authorRibelles, Nuria
dc.contributor.authorAlba, Emilio
dc.contributor.authorVeredas, Francisco J.
dc.contributor.authoraffiliation[López-García,G; Jerez,JM; Veredas,FJ] Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain. [Ribelles,N; Alba,E] Unidad de Gestión Clínica Intercentros de Oncología, Instituto de Investigación Biomédica de Málaga (IBIMA), Hospitales Universitarios Regional y Virgen de la Victoria, Málaga, Spain.
dc.contributor.funderThis work was supported in part by the Ministerio de Economía y Empresa (MINECO), Plan Nacional de I+D+I, under Project TIN2017-88728-C2-1-R, in part by the Andalucía TECH, under Project UMA-CEIATECH-01, in part by the Universidad de Málaga and the Consorcio de Bibliotecas Universitarias de Andalucía (CBUA), and in part by the Plan Andaluz de Investigación, Desarrollo e Innovación (PAIDI), Junta de Andalucía.
dc.date.accessioned2022-07-28T07:06:11Z
dc.date.available2022-07-28T07:06:11Z
dc.date.issued2021-05-13
dc.description.abstractAutomatic clinical coding is an essential task in the process of extracting relevant information from unstructured documents contained in electronic health records (EHRs). However, most research on computer-based methods for clinical coding focuses on texts written in English, owing to the limited availability of medical linguistic resources in other languages. With nearly 500 million native speakers of Spanish, there is worldwide interest in processing healthcare texts in that language. In this study, we systematically analyzed transformer-based models for automatic clinical coding in Spanish. Using a transfer learning-based approach, the three existing transformer architectures that support the Spanish language, namely, multilingual BERT (mBERT), BETO and XLM-RoBERTa (XLM-R), were first pretrained on a corpus of real-world oncology clinical cases with the goal of adapting transformers to the particularities of Spanish medical texts. The resulting models were fine-tuned on three distinct clinical coding tasks, following a multilabel sentence classification strategy. For each analyzed transformer, the domain-specific version outperformed the original general-domain model across those tasks. Moreover, combining the developed strategy with an ensemble approach that leverages the predictive capacities of the three distinct transformers yielded the best results, with MAP scores of 0.662, 0.544 and 0.884 on the CodiEsp-D, CodiEsp-P and Cantemist-Coding shared tasks, remarkably improving the previous state-of-the-art performance by 11.6%, 10.3% and 4.4%, respectively. We publicly release the mBERT, BETO and XLM-R transformers adapted to the Spanish clinical domain at https://github.com/guilopgar/ClinicalCodingTransformerES, providing the clinical natural language processing community with advanced deep learning methods for performing medical coding and other tasks in the Spanish clinical domain.es_ES
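The abstract describes an ensemble that combines the predictions of the three fine-tuned transformers for multilabel code assignment. As a minimal illustrative sketch only, the following assumes a soft-voting scheme that averages per-label probabilities; the function name, the averaging rule and the 0.5 cutoff are assumptions for illustration, not the authors' exact implementation:

```python
import numpy as np

def ensemble_label_probs(model_probs, threshold=0.5):
    """Average per-label probabilities from several multilabel
    classifiers and return the averaged scores plus the indices
    of the predicted labels (codes) for one sentence."""
    avg = np.mean(np.stack(model_probs), axis=0)   # soft voting across models
    predicted = np.flatnonzero(avg >= threshold)   # labels passing the cutoff
    return avg, predicted

# Hypothetical per-label probabilities from mBERT, BETO and XLM-R
# for one sentence and four candidate clinical codes:
mbert = np.array([0.9, 0.2, 0.6, 0.1])
beto  = np.array([0.8, 0.3, 0.4, 0.2])
xlmr  = np.array([0.7, 0.1, 0.5, 0.3])

avg, labels = ensemble_label_probs([mbert, beto, xlmr])
# averaged probabilities [0.8, 0.2, 0.5, 0.2] -> codes 0 and 2 predicted
```

Soft voting of this kind lets models with complementary strengths (e.g. multilingual vs. Spanish-only pretraining) compensate for each other's low-confidence labels.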
dc.description.versionYeses_ES
dc.identifier.citationLópez-García G, Jerez JM, Ribelles N, Alba E, Veredas FJ. Transformers for Clinical Coding in Spanish. IEEE Access. 2021;9:72387-72397es_ES
dc.identifier.doi10.1109/ACCESS.2021.3080085es_ES
dc.identifier.essn2169-3536
dc.identifier.urihttp://hdl.handle.net/10668/3836
dc.journal.titleIEEE Access
dc.language.isoen
dc.page.number11 p.
dc.publisherInstitute of Electrical and Electronics Engineers (IEEE)es_ES
dc.relation.publisherversionhttps://ieeexplore.ieee.org/document/9430499es_ES
dc.rightsAttribution 4.0 International*
dc.rights.accessRightsopen access
dc.rights.urihttp://creativecommons.org/licenses/by/4.0/*
dc.subjectClinical codinges_ES
dc.subjectDeep learninges_ES
dc.subjectNatural language processinges_ES
dc.subjectText classificationes_ES
dc.subjectTransformerses_ES
dc.subjectCodificación clínicaes_ES
dc.subjectAprendizaje profundoes_ES
dc.subjectProcesamiento de lenguaje naturales_ES
dc.subjectModelos epidemiológicoses_ES
dc.subjectAnálisis y desempeño de tareases_ES
dc.subject.meshMedical Subject Headings::Information Science::Information Science::Computing Methodologies::Artificial Intelligence::Natural Language Processinges_ES
dc.subject.meshMedical Subject Headings::Analytical, Diagnostic and Therapeutic Techniques and Equipment::Investigative Techniques::Epidemiologic Methods::Data Collection::Records as Topic::Medical Records::Medical Records Systems, Computerized::Electronic Health Recordses_ES
dc.subject.meshMedical Subject Headings::Health Care::Health Services Administration::Organization and Administration::Professional Practice::Practice Management::Office Management::Forms and Records Control::Clinical Codinges_ES
dc.subject.meshMedical Subject Headings::Information Science::Information Science::Communication::Language::Linguisticses_ES
dc.subject.meshMedical Subject Headings::Health Care::Health Care Quality, Access, and Evaluation::Delivery of Health Carees_ES
dc.subject.meshMedical Subject Headings::Information Science::Information Science::Computing Methodologies::Computer Systems::Computerses_ES
dc.subject.meshMedical Subject Headings::Psychiatry and Psychology::Behavior and Behavior Mechanisms::Motivation::Goalses_ES
dc.subject.meshMedical Subject Headings::Psychiatry and Psychology::Psychological Phenomena and Processes::Psychology, Applied::Human Engineering::Task Performance and Analysises_ES
dc.titleTransformers for Clinical Coding in Spanishes_ES
dc.typeresearch article
dc.type.hasVersionVoR
dspace.entity.typePublication

Files

Original bundle

Name:
LopezGarcia_TransformersFor.pdf
Size:
1.5 MB
Format:
Adobe Portable Document Format
Description:
Original article
Name:
LopezGarcia_TransformersFor_MaterialSuplementario.pdf
Size:
109.8 KB
Format:
Adobe Portable Document Format
Description:
Supplementary material