RT Journal Article
T1 Transformers for Clinical Coding in Spanish
A1 López-García, Guillermo
A1 Jerez, José M.
A1 Ribelles, Nuria
A1 Alba, Emilio
A1 Veredas, Francisco J.
K1 Clinical coding
K1 Deep learning
K1 Natural language processing
K1 Text classification
K1 Transformers
K1 Epidemiological models
K1 Task performance and analysis
AB Automatic clinical coding is an essential task in the process of extracting relevant information from unstructured documents contained in electronic health records (EHRs). However, most research in the development of computer-based methods for clinical coding focuses on texts written in English due to the limited availability of medical linguistic resources in languages other than English. With nearly 500 million native speakers, there is a worldwide interest in processing healthcare texts in Spanish. In this study, we systematically analyzed transformer-based models for automatic clinical coding in Spanish. Using a transfer learning-based approach, the three existing transformer architectures that support the Spanish language, namely, multilingual BERT (mBERT), BETO and XLM-RoBERTa (XLM-R), were first pretrained on a corpus of real-world oncology clinical cases with the goal of adapting transformers to the particularities of Spanish medical texts. The resulting models were fine-tuned on three distinct clinical coding tasks, following a multilabel sentence classification strategy. For each analyzed transformer, the domain-specific version outperformed the original general domain model across those tasks. Moreover, the combination of the developed strategy with an ensemble approach leveraging the predictive capacities of the three distinct transformers yielded the best obtained results, with MAP scores of 0.662, 0.544 and 0.884 on CodiEsp-D, CodiEsp-P and Cantemist-Coding shared tasks, which remarkably improved the previous state-of-the-art performance by 11.6%, 10.3% and 4.4%, respectively. We publicly release the mBERT, BETO and XLM-R transformers adapted to the Spanish clinical domain at https://github.com/guilopgar/ClinicalCodingTransformerES, providing the clinical natural language processing community with advanced deep learning methods for performing medical coding and other tasks in the Spanish clinical domain.
PB Institute of Electrical and Electronics Engineers (IEEE)
YR 2021
FD 2021-05-13
LK http://hdl.handle.net/10668/3836
UL http://hdl.handle.net/10668/3836
LA en
NO López-García G, Jerez JM, Ribelles N, Alba E, Veredas FJ. Transformers for Clinical Coding in Spanish. IEEE Access. 2021;9:72387-72397
DS RISalud
RD Aug 7, 2025