Ciencias Exactas y Ciencias de la Salud
Permanent URI for this collection: https://hdl.handle.net/11285/551039
This collection contains theses and graduate projects from the master's programs of the Schools of Engineering and Sciences and of Medicine and Health Sciences.
Search Results
- Improved Diagnosis of Breast Cancer via NLP Analysis of Radiological Reports (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-11) Sosa Silva, Patricia Angelli; Tamez Peña, José Gerardo; emimmayorquin; Martínez Ledezma, Emmanuel; Avendaño Davalos, Betzabeth; School of Engineering and Sciences; Campus Monterrey; Santos Díaz, Alejandro. The main objective of this thesis was to evaluate the use of natural language processing (NLP) techniques and machine learning models to improve the specificity of breast cancer diagnosis and reduce false-positive rates, using a dataset of radiological reports from Mexican hospitals. The methodology involved text preprocessing, feature extraction using NLP techniques, and classification of the radiological reports using machine learning models. The preprocessing consisted of lemmatization, stop-word removal, and tokenization. Various NLP techniques were then applied, including bag-of-words, TF-IDF, Word2Vec embeddings, and ClinicalBERT embeddings. These were used as input features for classical machine learning models (Logistic Regression, Random Forest, Extreme Gradient Boosting, Naive Bayes, k-Nearest Neighbors, Support Vector Machine, and their ensemble) as well as a deep learning LSTM model. The models were trained, calibrated, and evaluated using the following metrics: AUC, accuracy, precision, recall, specificity, and F1-score. The key findings showed that the ensemble model with bag-of-words features and the SVM with TF-IDF vectorized reports achieved the best performance, with an AUC of 0.79 and specificity of 0.27, and an AUC of 0.80 and specificity of 0.26, respectively. These models were able to identify all true positive cases while reducing the number of unnecessary biopsies by 19.49% and 15.08%, respectively. Feature importance analysis revealed that terms like "speculated", "irregular", and "4a category" were critical for breast cancer classification. 
In contrast, the deep learning LSTM model performed poorly, with an AUC of only 0.52 and specificity of 0. These results demonstrate the potential of NLP and machine learning techniques to enhance the reliability of breast cancer diagnosis and management, reducing the burden of unnecessary medical procedures on patients and the healthcare system. The theoretical implications include the importance of effective feature engineering and the limitations of deep learning models for this specific task.
- Automated radiology report generation using radiomics and natural language processing techniques (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-01) Bosques Palomo, Beatriz Alejandra; Tamez Peña, José Gerardo; emipsanchez; Santos Díaz, Alejandro; Avendaño Avalos, Daly Betzabeth; Helguera, Maria; Escuela de Ingeniería y Ciencias; Campus Monterrey. This thesis addresses the significant challenges in breast cancer diagnosis in developing countries, where delayed follow-ups due to resource constraints can impede timely and accurate detection, affecting patient outcomes. It proposes a novel approach that integrates radiomic features with transformer models to automate mammography report generation, focusing specifically on report conclusions. The primary goal is to assess whether these AI-driven models can replicate the diagnostic accuracy of expert radiologists in assigning BI-RADS categories and recommending follow-ups or biopsies. The study begins with meticulous image preprocessing, including a customized histogram matching scheme to standardize input data and reduce variability among images from different vendors. Radiomic features were then extracted and validated through a classification task, obtaining an AUC of 0.81, which supported their use as inputs for the transformer architecture. The transformer models used both radiomic features and deep learning features extracted via a pretrained CNN. This approach allowed a direct comparison of model performance between the hand-crafted radiomic inputs and the more complex deep learning features, both measured against expert evaluations. Results showed that the models reached high agreement with radiologists' evaluations, with kappa values reaching up to 0.93 for the simpler BI-RADS categorization task (categories 1 and 5) using deep learning features. 
However, performance declined in more complex cases, with kappa values dropping to 0.23 for radiomic features across all BI-RADS categories (1, 2, 3, 4 & 5), indicating only fair agreement. In contrast, deep learning features maintained a moderate agreement with a kappa of 0.41. Despite these promising results, the study acknowledges certain limitations, including the inability to fine-tune feature extraction due to the hand-crafted nature of radiomic features, as well as the potential subjectivity in the data, given that radiologist evaluations are susceptible to human error. Nonetheless, this research lays crucial groundwork for future AI advancements in radiological diagnostics, aiming to enhance the efficiency, accuracy, and comprehensiveness of medical image analysis in resource-limited settings.
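
The kappa values quoted in the second abstract (0.93, 0.41, 0.23) measure agreement beyond chance between model-assigned and radiologist-assigned BI-RADS categories. The following is a minimal sketch of how Cohen's kappa can be computed, not the thesis author's code. The function names `cohen_kappa` and `agreement_label` and the example label lists are hypothetical, and the verbal bands follow the commonly used Landis and Koch scale, which is assumed here because it matches the abstract's "fair" and "moderate" wording.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of category labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters need the same non-zero number of labels")
    n = len(rater_a)
    # Observed agreement: fraction of cases where both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequency per label.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def agreement_label(kappa):
    """Verbal band on the Landis and Koch scale (an assumed convention)."""
    if kappa < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical BI-RADS labels (categories 1 and 5 only), not data from the thesis.
radiologist = [1, 1, 5, 5, 1, 5, 1, 5]
model = [1, 1, 5, 5, 1, 5, 5, 5]
kappa = cohen_kappa(radiologist, model)
print(round(kappa, 2), agreement_label(kappa))
```

On this scale, the reported 0.23 falls in the "fair" band, 0.41 in "moderate", and 0.93 in "almost perfect".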

