Ciencias Exactas y Ciencias de la Salud

Permanent URI for this collection: https://hdl.handle.net/11285/551039

This collection contains theses and degree projects from the master's programs of the Schools of Engineering and Sciences and of Medicine and Health Sciences.

Search Results

Now showing 1 - 2 of 2
  • Item
    Improved Diagnosis of Breast Cancer via NLP Analysis of Radiological Reports
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-11) Sosa Silva, Patricia Angelli; Tamez Peña, José Gerardo; emimmayorquin; Martínez Ledezma, Emmanuel; Avendaño Davalos, Betzabeth; School of Engineering and Sciences; Campus Monterrey; Santos Díaz, Alejandro
    The main objective of this thesis was to evaluate the use of natural language processing (NLP) techniques and machine learning models to improve the specificity of breast cancer diagnosis and reduce false-positive rates using a dataset of radiological reports from Mexican hospitals. The methodology involved text preprocessing, feature extraction using NLP techniques, and classification of the radiological reports using machine learning models. The preprocessing consisted of lemmatization, stop-word removal, and tokenization. Various NLP techniques were then applied, including bag-of-words, TF-IDF, Word2Vec embeddings, and ClinicalBERT embeddings. These were used as input features for classical machine learning models (Logistic Regression, Random Forest, Extreme Gradient Boosting, Naive Bayes, k-Nearest Neighbors, Support Vector Machine, and their ensemble) as well as a deep learning LSTM model. The models were trained, calibrated, and evaluated using the following metrics: AUC, accuracy, precision, recall, specificity, and F1-score. The key findings showed that the ensemble model with bag-of-words features and the SVM with TF-IDF-vectorized reports achieved the best performance, with an AUC of 0.79 and specificity of 0.27, and an AUC of 0.80 and specificity of 0.26, respectively. These models were able to identify all true positive cases while reducing the number of unnecessary biopsies by 19.49% and 15.08%, respectively. Feature importance analysis revealed that terms like "speculated", "irregular", and "4a category" were critical for breast cancer classification. In contrast, the deep learning LSTM model performed poorly, with an AUC of only 0.52 and specificity of 0. These results demonstrate the potential of NLP and machine learning techniques to enhance the reliability of breast cancer diagnosis and management, reducing the burden of unnecessary medical procedures on patients and the healthcare system. The theoretical implications include the importance of effective feature engineering and the limitations of deep learning models for this specific task.
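The evaluation metrics named in the abstract above can be illustrated with a minimal sketch. The function name `binary_metrics` and the toy labels are hypothetical and are not taken from the thesis; the example only shows how a classifier can reach a recall of 1.0 (every true positive found) while still having low specificity.

```python
def binary_metrics(y_true, y_pred):
    """Compute standard binary classification metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of raising when a class is absent.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)          # also called sensitivity
    specificity = ratio(tn, tn + fp)     # share of negatives correctly cleared
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Toy labels (hypothetical): 1 = malignant, 0 = benign.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy case no malignant report is missed, and one of the four benign reports is correctly cleared, which is the kind of trade-off the abstract describes in terms of avoided biopsies.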
  • Master's thesis
    ECG-based heartbeat classification for arrhythmia detection: a step-by-step AI exploratory process
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2022-12) Silva Mendez, Adrian; TAMEZ PEÑA, JOSE GERARDO; 67337; Tamez Peña, José Gerardo; emipsanchez; Gutiérrez Ruiz, Dania; Santos Díaz, Alejandro; Martínez Ledesma, Juan Emmanuel; School of Engineering and Sciences; Campus Monterrey
    This document presents the thesis “ECG-based heartbeat classification for arrhythmia detection: A step-by-step AI Exploratory Process” for the degree of Master in Computer Science at Tecnológico de Monterrey. Cardiovascular diseases are among the leading causes of death around the world (in both third- and first-world countries). Arrhythmia is one of those diseases, in which the heart beats at an inconsistent and abnormal rhythm due to a malfunction in the electrical system of the heart. Detection, diagnosis, and classification are very challenging tasks for doctors because time is a crucial factor: if they are not done in time, the patient’s life can be at risk. This proposal explores different data pre-processing and feature generation techniques to create an efficient and accurate binary classification model capable of distinguishing normal from abnormal heartbeats, with accuracy and sensitivity in the 80-90% range, a 10% increase compared to a raw feature vector. One of the most important ideas discussed throughout this thesis is decomposing the ECG signal in the frequency and time domains using the Dual-Tree Complex Wavelet Transform to create a feature vector. Another important highlight of this thesis is database manipulation, including the exclusion and the correct distribution of subjects across the training and testing sets. The approach tests the feature vectors by training different supervised learning models, including K Nearest Neighbours, Random Forest, and X-Gradient Boosting. The experiments use the MIT-BIH Arrhythmia Database.
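The point about distributing subjects correctly across the training and testing sets can be sketched as a subject-wise split, in which all heartbeats from one subject land on the same side. This is a minimal illustration under stated assumptions, not the thesis's actual procedure: the function `split_by_subject`, the record layout, and the subject identifiers are all hypothetical.

```python
import random


def split_by_subject(beats, test_fraction=0.25, seed=0):
    """Split beats so that no subject appears in both train and test."""
    subjects = sorted({b["subject"] for b in beats})
    rng = random.Random(seed)            # fixed seed for a repeatable split
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [b for b in beats if b["subject"] not in test_subjects]
    test = [b for b in beats if b["subject"] in test_subjects]
    return train, test


# Toy data (hypothetical): 4 subjects with 3 beats each, label 1 = abnormal.
beats = [
    {"subject": f"s{i}", "beat": j, "label": (i + j) % 2}
    for i in range(4)
    for j in range(3)
]
train, test = split_by_subject(beats)
train_ids = {b["subject"] for b in train}
test_ids = {b["subject"] for b in test}
print(sorted(train_ids), sorted(test_ids))
```

Splitting by subject rather than by individual beat avoids evaluating on heartbeats from a person the model has already seen during training, which would tend to inflate the reported accuracy.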
Unless otherwise specified, these materials are shared under the following terms: Attribution-NonCommercial-NoDerivatives CC BY-NC-ND http://www.creativecommons.mx/#licencias
The user is obliged to use the services and content provided by the University, in particular printed and electronic resources, in accordance with current legislation and the principles of good faith and generally accepted practice, without disturbing public order, especially where, for the proper performance of their activity, they need to reproduce, distribute, communicate and/or make available fragments of printed works or of works that may exist in analog or digital format, whether on paper or in electronic form. Law 23/2006, of July 7, amending the revised text of the Intellectual Property Law, approved

DSpace software copyright © 2002-2025

License