Improved Diagnosis of Breast Cancer via NLP Analysis of Radiological Reports

dc.audience.educationlevelMaestros/Teachers
dc.audience.educationlevelEstudiantes/Students
dc.audience.educationlevelInvestigadores/Researchers
dc.audience.educationlevelOtros/Other
dc.contributor.advisorTamez Peña, José Gerardo
dc.contributor.authorSosa Silva, Patricia Angelli
dc.contributor.catalogeremimmayorquin
dc.contributor.committeememberMartínez Ledezma, Emmanuel
dc.contributor.committeememberAvendaño Davalos, Betzabeth
dc.contributor.departmentSchool of Engineering and Sciences
dc.contributor.institutionCampus Monterrey
dc.contributor.mentorSantos Díaz, Alejandro
dc.date.accepted2024-11
dc.date.accessioned2025-01-16T16:52:33Z
dc.date.issued2024-11
dc.description.abstractThe main objective of this thesis was to evaluate the use of natural language processing (NLP) techniques and machine learning models to improve the specificity of breast cancer diagnosis and reduce false-positive rates using a dataset of radiological reports from Mexican hospitals. The methodology involved text preprocessing, feature extraction using NLP techniques, and classification of the radiological reports using machine learning models. The preprocessing consisted of lemmatization, stop-word removal, and tokenization. Various NLP techniques were then applied, including bag-of-words, TF-IDF, Word2Vec embeddings, and ClinicalBERT embeddings. These were used as input features for classical machine learning models (Logistic Regression, Random Forest, Extreme Gradient Boosting, Naive Bayes, k-Nearest Neighbors, Support Vector Machine, and their ensemble) as well as a deep learning LSTM model. The models were trained, calibrated, and evaluated using the following metrics: AUC, accuracy, precision, recall, specificity, and F1-score. The key findings showed that the ensemble model with bag-of-words features and the SVM using TF-IDF vectorized reports achieved the best performance, with an AUC of 0.79 and specificity of 0.27, and an AUC of 0.80 and specificity of 0.26, respectively. These models were able to identify all true positive cases while reducing the number of unnecessary biopsies by 19.49% and 15.08%, respectively. Feature importance analysis revealed that terms like "speculated", "irregular", and "4a category" were critical for breast cancer classification. In contrast, the deep learning LSTM model performed poorly, with an AUC of only 0.52 and specificity of 0. These results demonstrate the potential of NLP and machine learning techniques to enhance the reliability of breast cancer diagnosis and management, reducing the burden of unnecessary medical procedures on patients and the healthcare system.
The theoretical implications include the importance of effective feature engineering and the limitations of deep learning models for this specific task.
dc.description.degreeMaster of Science in Computer Science
dc.format.mediumTexto
dc.identificator120318
dc.identifier.citationPatricia Angelli, S. S. (2024). Improved Diagnosis of Breast Cancer via NLP Analysis of Radiological Reports. [Tesis maestria]. Instituto Tecnológico y de Estudios Superiores de Monterrey.
dc.identifier.urihttps://hdl.handle.net/11285/703058
dc.language.isoeng
dc.publisherInstituto Tecnológico y de Estudios Superiores de Monterrey
dc.relationInstituto Tecnológico y de Estudios Superiores de Monterrey
dc.relationCONAHCYT
dc.rightsopenAccess
dc.rights.urihttp://creativecommons.org/licenses/by/4.0
dc.subject.classificationINGENIERÍA Y TECNOLOGÍA::CIENCIAS TECNOLÓGICAS::TECNOLOGÍA DE LOS ORDENADORES::SISTEMAS DE INFORMACIÓN, DISEÑO Y COMPONENTES
dc.subject.keywordNatural language processing (NLP)
dc.subject.keywordLearning models
dc.subject.keywordBreast cancer
dc.subject.lcshTechnology
dc.subject.lcshMedicine
dc.titleImproved Diagnosis of Breast Cancer via NLP Analysis of Radiological Reports

Files

Original bundle

Name:
Patricia Angelli Sosa Silva Tesis.pdf
Size:
23.6 MB
Format:
Adobe Portable Document Format
Description:
Thesis
Name:
Patricia Angelli Sosa Silva Carta Autorización.pdf
Size:
158.26 KB
Format:
Adobe Portable Document Format
Description:
Authorization letter
Name:
Patricia Angelli Sosa Silva Acta de Grado.pdf
Size:
498.89 KB
Format:
Adobe Portable Document Format
Description:
Degree certificate
Name:
Sosa SilvaPatricia Angelli Firma de carta autoridad.pdf
Size:
249.25 KB
Format:
Adobe Portable Document Format
Description:
Authorization letter signature

License bundle

Name:
license.txt
Size:
1.28 KB
Format:
Item-specific license agreed upon to submission
Description:
logo
