Master's thesis

Named entity recognition in mammography radiology reports using a multilingual transfer learning approach

Abstract

Breast cancer remains a major challenge to human health worldwide, and early detection and accurate diagnosis are essential to patient welfare. Mammography, a major tool in this area, has produced an enormous number of radiology reports that carry important diagnostic information. This large body of data often goes unused because of the complexity of the text and the subtlety of its linguistic cues. This thesis aims to use BERT-based models to increase the precision of entity recognition in these reports, keeping in mind the dual challenge of fine-tuning medical text analysis and improving the diagnostic process. It examines the performance of Bidirectional Encoder Representations from Transformers (BERT) based models in entity recognition on mammography radiology reports, measured by precision, recall, and F1-score. It involves various unsupervised and supervised algorithms, exploring their advantages and limitations in handling medical text. The study first aims to maximize learning capacity by unfreezing all layers of the models, which is beneficial in theory. However, the early and rapid convergence of the metrics suggested possible overfitting. To address this, a more balanced approach of partial layer freezing was tested, fine-tuning only the last two layers; this emerged as the best method, significantly reducing overfitting while improving generalization to unseen data. A more detailed comparative analysis shows that entities referring to distinct, well-characterized features in the text, such as "ASINTOMATICA" and "BIRADS", consistently approach perfect scores.
However, entities such as CONDUCTO DILATADO and ENTERAMENTE ADIPOSO show lower performance, since they occur infrequently and are syntactically similar to other terms. The small performance differences between the models may even suggest that current BERT-based approaches are close to the performance ceiling in this specific application. The best-performing models (Austin-MeDeBeRTa) have an edge in handling complex entities thanks to specific architectural features and training regimes. The results indicate that, while BERT-based models have considerable potential for the automatic recognition of entities in clinical reports, their deployment must be tuned to the specifics of the medical domain to remain robust, especially against syntactic and several semantic issues. Future work should further refine these models to increase their applicability and accuracy in clinical settings.
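
The partial-freezing strategy mentioned in the abstract (fine-tuning only the last two layers) might look roughly like the sketch below. It is an illustration under stated assumptions, not the thesis's actual code: it assumes parameter names follow the common Hugging Face BERT-style layout ("...encoder.layer.N..." for encoder blocks and "classifier." for the token-classification head), and the helper name select_trainable and the example parameter names are hypothetical.

```python
import re

# Assumption: parameter names look like "bert.encoder.layer.11.output.dense.weight"
# for encoder blocks and "classifier.weight" for the task head.
_LAYER_RE = re.compile(r"\bencoder\.layer\.(\d+)\.")


def select_trainable(param_names, num_unfrozen=2, head_prefixes=("classifier.",)):
    """Return the set of parameter names that should stay trainable.

    Keeps the last `num_unfrozen` encoder layers plus the task head;
    everything else (embeddings, lower layers) is meant to be frozen.
    """
    names = list(param_names)
    head_prefixes = tuple(head_prefixes)

    # Collect the encoder layer indices that actually occur in the names.
    layer_ids = sorted(
        {int(m.group(1)) for m in map(_LAYER_RE.search, names) if m}
    )
    # Guard against num_unfrozen == 0, where a [-0:] slice would keep every layer.
    top_layers = set(layer_ids[-num_unfrozen:]) if num_unfrozen > 0 else set()

    trainable = set()
    for name in names:
        match = _LAYER_RE.search(name)
        if match and int(match.group(1)) in top_layers:
            trainable.add(name)
        elif name.startswith(head_prefixes):
            trainable.add(name)
    return trainable


# Hypothetical parameter names for a 4-layer encoder, for illustration only.
example_names = (
    ["bert.embeddings.word_embeddings.weight"]
    + [f"bert.encoder.layer.{i}.output.dense.weight" for i in range(4)]
    + ["classifier.weight", "classifier.bias"]
)
trainable = select_trainable(example_names, num_unfrozen=2)
```

With a PyTorch model, the returned set could then be applied by looping over model.named_parameters() and setting requires_grad to True only for names in the set. Selecting by name keeps the freezing rule independent of any particular model class, at the cost of relying on the naming assumption above.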
