Exact Sciences and Health Sciences
Permanent URI for this collection: https://hdl.handle.net/11285/551039
This collection contains Master's theses and degree projects from the Schools of Engineering and Sciences, as well as Medicine and Health Sciences.
Search Results
- Improving deep neural networks to identify depression using neural architecture search(Instituto Tecnológico y de Estudios Superiores de Monterrey, 2025-06) Hernández Silva, Erick; Trejo Rodríguez, Luis Ángel; emipsanchez; Cantoral Ceballos, José Antonio; González Mendoza, Miguel; School of Engineering and Sciences; Campus Estado de México; Sosa Hernández, Víctor Adrián. A Neural Architecture Search (NAS) framework utilizing Evolutionary Algorithms (EAs) and a regressor model is proposed to improve the classification performance of Deep Neural Networks (DNNs) for the early detection of Major Depressive Disorder (MDD) from speech data represented by Mel-Spectrograms. The framework automates the design of neural network architectures by systematically exploring a well-defined search space that integrates convolutional layers, batch normalization, dropout, max pooling, and self-attention mechanisms, aiming to capture both spatial and temporal features inherent in vocal signals. By optimizing for the F1-score, the framework addresses challenges related to data imbalance, ensuring robust generalization across both depressed and non-depressed samples. The proposed approach employs an integer-based encoding scheme to represent candidate architectures, coupled with repair and validation processes that ensure all architectures meet specific design constraints. A self-adaptive mechanism dynamically adjusts the mutation factor based on evolutionary feedback, improving the balance between exploration and exploitation during the search process. Furthermore, a surrogate model, built using Principal Component Analysis (PCA) and an XGBoost regressor, predicts architecture performance, significantly reducing computational costs by avoiding full model training for all candidates.
Experimental validation, conducted on publicly available speech datasets, demonstrates that NAS-generated architectures may outperform manually designed state-of-the-art models in terms of F1-score, accuracy, precision, recall, and specificity. The results confirm the effectiveness of integrating self-attention mechanisms with convolutional operations for extracting relevant depression-related vocal biomarkers. This research underlines the potential of NAS in advancing non-invasive, scalable, and interpretable AI-driven tools for mental health assessment, contributing to early intervention strategies and improving clinical outcomes in depression diagnosis.
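The surrogate-assisted evolutionary loop described in this abstract can be sketched roughly as follows. The layer vocabulary, genome length, pooling constraint, and the `surrogate_f1` stand-in for the thesis's PCA + XGBoost regressor are all illustrative assumptions, not the actual implementation:

```python
import random

# Illustrative layer vocabulary and constraints (assumptions, not the thesis's search space)
LAYERS = ["conv", "batchnorm", "dropout", "maxpool", "self_attention"]
GENOME_LEN = 8
MAX_POOLS = 3  # example design constraint enforced by the repair step

def repair(genome):
    """Replace surplus max-pooling genes so the design constraint holds."""
    pools = [i for i, g in enumerate(genome) if LAYERS[g] == "maxpool"]
    for i in pools[MAX_POOLS:]:
        genome[i] = LAYERS.index("conv")
    return genome

def surrogate_f1(genome):
    """Stand-in for the PCA + XGBoost surrogate: a cheap deterministic toy score."""
    return sum(g * (i + 1) for i, g in enumerate(genome)) % 100 / 100

def mutate(genome, rate):
    """Resample each gene with probability `rate`, then repair the child."""
    return repair([random.randrange(len(LAYERS)) if random.random() < rate else g
                   for g in genome])

def evolve(generations=30, pop_size=10, seed=0):
    random.seed(seed)
    rate = 0.3
    pop = [repair([random.randrange(len(LAYERS)) for _ in range(GENOME_LEN)])
           for _ in range(pop_size)]
    best = max(pop, key=surrogate_f1)
    for _ in range(generations):
        children = [mutate(best, rate) for _ in range(pop_size)]
        champion = max(children, key=surrogate_f1)
        # Self-adaptive mutation: shrink the rate on improvement, grow it on stagnation
        if surrogate_f1(champion) > surrogate_f1(best):
            best, rate = champion, max(0.05, rate * 0.9)
        else:
            rate = min(0.8, rate * 1.1)
    return best

best = evolve()
assert len(best) == GENOME_LEN
assert sum(LAYERS[g] == "maxpool" for g in best) <= MAX_POOLS
```

In the real framework, candidates that survive the surrogate filter would be fully trained and evaluated on the F1-score; the sketch only shows the encoding, repair, and self-adaptive mutation mechanics.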
- Multimodal neuroimaging and explainable deep learning for characterizing brain aging: insights into biomarkers of healthy and pathological aging(Instituto Tecnológico y de Estudios Superiores de Monterrey, 2025-05) Cárdenas Castro, Héctor Manuel; Cantoral Ceballos, José Antonio; emipsanchez; Trejo Rodríguez, Luis Ángel; Castañeda Miranda, Alejandro; School of Engineering and Sciences; Campus Monterrey; Caraza Camacho, Ricardo. The aging brain undergoes complex structural and functional transformations that differentiate healthy aging from pathological trajectories such as dementia. This study pioneers a multimodal neuroimaging and explainable deep learning framework to characterize brain aging, identify biomarkers of neurodegeneration, and elucidate the interplay between local anatomical changes and global network reorganization. Leveraging structural MRI-derived volumetrics and graph theory-based connectivity metrics extracted from resting-state fMRI from a heterogeneous cohort of cognitively healthy individuals and patients with dementia attributed to Alzheimer’s and non-Alzheimer’s disease, two predictive models were developed: (1) a brain-age regression model to quantify deviations from normative aging patterns and (2) a dementia classification model to distinguish pathological from healthy aging. Both models achieved robust performance (mean absolute error = 0.68 years for controls in regression; F1-score = 0.93 for classification). SHAP (SHapley Additive exPlanations) analyses revealed interpretable feature contributions and non-linear feature interactions, and highlighted established and novel neuroanatomical correlates of brain aging and dementia.
By synthesizing computational innovation with clinical neuroimaging, this research provides actionable biomarkers for aging research, refines the conceptual framework of compensatory brain reorganization, and makes a new contribution to AI-driven precision diagnostics in neurodegenerative disorders.
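The brain-age deviation metric underlying the regression model above reduces to simple arithmetic: the gap between predicted and chronological age, summarized by the mean absolute error the abstract reports. A minimal sketch, with hypothetical ages chosen purely for illustration:

```python
def brain_age_gap(predicted, chronological):
    """Per-subject gap: positive suggests accelerated aging, negative preserved structure."""
    return [p - c for p, c in zip(predicted, chronological)]

def mean_absolute_error(predicted, chronological):
    """The summary metric reported for the regression model (0.68 years for controls)."""
    return sum(abs(p - c) for p, c in zip(predicted, chronological)) / len(predicted)

# Hypothetical chronological and model-predicted ages (illustration only)
chron = [65.0, 72.0, 80.0]
pred = [65.5, 71.0, 81.0]
assert brain_age_gap(pred, chron) == [0.5, -1.0, 1.0]
assert round(mean_absolute_error(pred, chron), 2) == 0.83
```

The thesis's models operate on MRI volumetrics and fMRI connectivity features; this fragment only illustrates how the reported error metric is computed from model outputs.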
- Voice fraud mitigation: developing a deep learning system for detecting cloned voices in telephonic communications(Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-12-03) Villicaña Ibargüengoyti, José Rubén; Montesinos Silva, Luis Arturo; emimmayorquin; Santos Díaz, Alejandro; Mantilla Caeiros, Alfredo Víctor; School of Engineering and Sciences; Campus Ciudad de México. This study addresses the growing threat of voice fraud committed with cloned voices in phone calls, a problem that can compromise personal security in many respects. The primary goal of this work is to develop a deep learning-based detection system for distinguishing between real and cloned voices in Spanish, focusing on calls made over telephone lines. To achieve this, a dataset was generated from real and cloned audio samples in Spanish, with the captured audio processed to simulate various telephone codecs and noise levels. Two deep learning models, a convolutional neural network (named Vanilla CNN in this project) and a transfer learning approach (MobileNetV2), were trained using spectrograms derived from the audio data. The results indicate high accuracy in identifying real and cloned voices, reaching up to 99.97%. Validations were also performed under the different types of noise and codecs included in the dataset. These findings highlight the effectiveness of the proposed architectures. Additionally, an ESP32 audio kit was integrated with Amazon Web Services to implement voice detection during phone calls. This study contributes to voice fraud detection research focused on the Spanish language.
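The spectrogram features this system trains on are built from short-time magnitude spectra of audio frames. A minimal stdlib sketch of one such frame, using a naive DFT rather than the FFT and mel filterbanks a real pipeline would use; the frame length and tone frequency are arbitrary assumptions:

```python
import cmath
import math

def dft_magnitudes(frame):
    """Naive DFT magnitude spectrum of one frame (real pipelines use FFT + mel scaling)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

# Synthetic frame: a pure tone completing 4 cycles over a 32-sample frame
n = 32
frame = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
mags = dft_magnitudes(frame)

# Energy concentrates in bin 4, matching the tone's frequency
assert max(range(len(mags)), key=mags.__getitem__) == 4
assert abs(mags[4] - n / 2) < 1e-6
```

Stacking such spectra over successive overlapping frames yields the time-frequency image the CNN and MobileNetV2 models classify; codec and noise simulation would be applied to the waveform before this step.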
- Object detection-based surgical instrument tracking in laparoscopy videos(Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-12) Guerrero Ramírez, Cuauhtemoc Alonso; Ochoa Ruiz, Gilberto; emipsanchez; González Mendoza, Miguel; Hinojosa Cervantes, Salvador Miguel; Falcón Morales, Luis Eduardo; School of Engineering and Sciences; Campus Monterrey; Medina Pérez, Miguel Ángel. Minimally invasive surgery (MIS) has transformed surgery by offering numerous advantages over traditional open surgery, such as reduced pain, minimized trauma, and faster recovery times. However, endoscopic MIS procedures remain highly operator-dependent, demanding significant skill from the surgical team to ensure a positive postoperative outcome for the patient. The implementation of computer vision techniques such as reliable surgical instrument detection and tracking can be leveraged for applications such as intraoperative decision support, surgical navigation assistance, and surgical skill assessment, which can significantly improve patient safety. The aim of this work is to implement a Multiple Object Tracking (MOT) benchmark model for the task of surgical instrument tracking in laparoscopic videos. To this end, a new dataset is introduced, m2cai16-tool-tracking, based on the m2cai16-tool-locations dataset, specifically designed for surgical instrument tracking. This dataset includes both bounding box annotations for instrument detection and unique tracking ID annotations for multi-object tracking. This work employs ByteTrack, a state-of-the-art multiple-object tracking algorithm that follows the tracking-by-detection paradigm. ByteTrack predicts tool positions and associates object detections across frames, allowing consistent tracking of each instrument. The object detection step is performed using YOLOv4, a state-of-the-art object detection model known for real-time performance.
YOLOv4 is first trained on the m2cai16-tool-locations dataset to establish a baseline performance and then on the custom m2cai16-tool-tracking dataset, allowing the detection performance on the custom dataset to be compared against an existing object detection dataset. YOLOv4 generates bounding box predictions for each frame in the laparoscopic videos. The bounding box detections serve as input for the ByteTrack algorithm, which assigns unique tracking IDs to each instrument to maintain their trajectories across frames. YOLOv4 achieves robust object detection performance on the m2cai16-tool-locations dataset, obtaining a mAP50 of 0.949, a mAP75 of 0.537, and a mAP50:95 of 0.526, with a real-time inference speed of 125 fps. However, detection performance on the m2cai16-tool-tracking dataset is slightly lower, with a mAP50 of 0.839, mAP75 of 0.420, and mAP50:95 of 0.439, suggesting that differences in data partitioning impact detection accuracy. This lower detection accuracy for the tracking dataset likely affects the tracking performance of ByteTrack, reflected in a MOTP of 76.4, MOTA of 56.6, IDF1 score of 22.8, and HOTA score of 23.0. Future work could focus on improving the object detection performance to enhance tracking quality. Additionally, including appearance-based features in the tracking step could improve the association accuracy of detections across frames and help maintain consistent tracking even in challenging scenarios like occlusions. Such improvements could enhance tracking reliability to better support surgical tasks.
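The core of the tracking-by-detection paradigm is associating each frame's detections with existing tracks by spatial overlap. A simplified greedy IoU matcher sketches the idea; ByteTrack itself additionally uses Kalman-filter motion prediction and a second association pass over low-confidence detections, neither of which is shown here, and the boxes below are invented for illustration:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def associate(tracks, detections, thresh=0.3):
    """Greedily match each track ID to the unused detection with highest IoU."""
    matches, used = {}, set()
    for tid, tbox in tracks.items():
        best, best_iou = None, thresh
        for j, dbox in enumerate(detections):
            overlap = iou(tbox, dbox)
            if j not in used and overlap > best_iou:
                best, best_iou = j, overlap
        if best is not None:
            matches[tid] = best
            used.add(best)
    return matches

# Two instrument tracks from the previous frame, two detections in the current frame
tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
dets = [(21, 21, 31, 31), (1, 1, 11, 11)]
assert associate(tracks, dets) == {1: 1, 2: 0}
```

Unmatched detections would spawn new track IDs and unmatched tracks would eventually be terminated; appearance features, as the abstract suggests for future work, would supplement this purely geometric matching cost.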
- Machine translation for suicide detection: validating spanish datasets using machine and deep learning models(Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-11) Arenas Enciso, Francisco Ariel; Zareel, Mahdi; emipsanchez; García Ceja, Enrique Alejandro; Roshan Biswal, Rajesh; School of Engineering and Sciences; Sede EGADE Monterrey. Suicide is a complex health concern that affects not only individuals but society as a whole. The application of traditional strategies to prevent, assess, and treat this condition has proven inefficient in a modern world in which interactions are mainly made online. Thus, in recent years, multidisciplinary efforts have explored how computational techniques could be applied to automatically detect, from textual input, individuals who express a desire to end their lives. Such methodologies rely on two main technical approaches: text-based classification and deep learning. Further, these methods rely on datasets labeled with relevant information, often sourced from clinically-curated social media posts or healthcare records; more recently, public social media data has proven especially valuable for this purpose. Nonetheless, research focused on the application of computational algorithms for detecting suicide or its ideation is still an emerging field of study. In particular, investigations on this topic have recently considered specific factors, like language or socio-cultural contexts, that affect the causality, rationality, and intentionality of an individual’s manifestation, to improve the assessment made on textual data. Consequently, the lack of data in non-Anglo-Saxon contexts suitable for computational detection of suicidal ideation remains a pending endeavor. Thus, this thesis addresses the limited availability of suicide ideation datasets in non-Anglo-Saxon contexts, particularly for Spanish, despite its global significance as a widely spoken language.
The research hypothesizes that Machine-Translated Spanish datasets can yield comparable results (within a ±5% performance range) to English datasets when training machine learning and deep learning models for suicide ideation detection. To test this, multiple machine translation models were evaluated, and the two best-performing models were selected to translate an English dataset of social media posts into Spanish. The English and translated Spanish datasets were then processed through a binary classification task using SVM, Logistic Regression, CNN, and LSTM models. Results demonstrated that the translated Spanish datasets achieved performance metrics close to the original English set across all classifiers, with variations in accuracy, precision, recall, F1-score, ROC AUC, and MCC remaining within the hypothesized ±5% range. For example, the SVM classifier on the translated Spanish sets achieved an accuracy of 90%, closely matching the 91% achieved on the original English set. These findings confirm that machine-translated datasets can serve as effective resources for training ML and DL models for suicide ideation detection in Spanish, thereby supporting the viability of extending suicide detection models to non-English-speaking populations. This contribution provides a methodological foundation for expanding suicide prevention tools to diverse linguistic and cultural contexts, potentially benefiting health organizations and academic institutions interested in psychological computation.
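The ±5% equivalence criterion used to test the hypothesis above is a simple band check on paired metrics. A minimal sketch, using the SVM accuracy figures the abstract reports (0.90 translated Spanish vs. 0.91 English) and one invented failing pair for contrast:

```python
def within_margin(metric_translated, metric_original, margin=0.05):
    """True if the translated-set metric stays within the hypothesized band
    around the original English-set metric."""
    return abs(metric_translated - metric_original) <= margin

# Reported SVM accuracies: 90% on translated Spanish vs. 91% on English
assert within_margin(0.90, 0.91)

# A hypothetical 11-point drop would fall outside the band
assert not within_margin(0.80, 0.91)
```

In the thesis this check is applied across all six metrics (accuracy, precision, recall, F1-score, ROC AUC, MCC) and all four classifiers, not just the single pair shown.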
- Smart camera FPGA hardware implementation for semantic segmentation of wildfire imagery(Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-06-13) Garduño Martínez, Eduardo; Rodriguez Hernández, Gerardo; mtyahinojosa, emipsanchez; Gonzalez Mendoza, Miguel; Hinojosa Cervantes, Salvador Miguel; School of Engineering and Sciences; Campus Monterrey; Ochoa Ruiz, Gilberto. In the past few years, increasingly frequent wildfires, driven by climate change, have devastated society and the environment. Researchers have explored various technologies to address this issue, including deep learning and computer vision solutions. These techniques have yielded promising results in semantic segmentation for detecting fire using visible and infrared images. However, implementing deep learning neural network models can be challenging, as it often requires energy-intensive hardware such as a GPU or a CPU with large cooling systems to achieve high image processing speeds, making it difficult to use in mobile applications such as drone surveillance. Therefore, to solve the portability problem, an FPGA hardware implementation is proposed to satisfy low power consumption requirements, achieve high accuracy, and enable fast image segmentation using convolutional neural network models for fire detection. This thesis employs a modified UNET model as the base model for fire segmentation. Subsequently, compression techniques reduce the number of operations performed by the model by removing filters from the convolutional layers and reducing the arithmetic precision of the CNN, decreasing inference time and storage requirements and allowing the Vitis AI framework to map the model architecture and parameters onto the FPGA. Finally, the model was evaluated using metrics utilized in prior studies to assess the performance of fire detection segmentation models.
Additionally, two fire datasets are used to compare different data types for fire segmentation models, including visible images, a fusion of visible and infrared images generated by a GAN model, fine-tuning of the fusion GAN weights, and the use of visible and infrared images independently to evaluate the impact of visible-infrared information on segmentation performance.
- Neutrino classification through deep learning amid the Hyper-Kamiokande project development(Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-06-10) Romo Fuentes, María Fernanda; Falcón Morales, Luis Eduardo; emipsanchez; Cuen Rochin, Saul; De la Fuente Acosta, Eduardo; School of Engineering and Sciences; Campus Estado de México. Neutrinos are elementary particles characterized by an extremely small mass, no electric charge, and a special behavior called oscillation, whereby a neutrino can be measured as a kind different from the one it actually is. These characteristics make neutrinos one of the most studied particles today, across many researchers and facilities, since their study can help solve some of the Universe’s greatest mysteries. One of the projects where neutrinos are studied is Hyper-Kamiokande, which refers both to the international collaboration of researchers, to which Mexico belongs, and to the grand-scale neutrino detector based on Cherenkov radiation currently being built in Japan. In this detector, the data of a neutrino event are collected by a special kind of sensor located in its walls, called Photomultiplier Tubes (PMTs), and then analyzed. This analysis usually starts with the identification of the particles involved in an event, which is where this project comes in: an appropriate method is needed to classify neutrinos based on the radiation pattern they leave as they pass through the detector.
Hence, in this project, submitted to obtain the Master in Computer Science degree, we implement and test four deep learning architectures for the classification of neutrinos: VGG19, ResNet50, PointNet, and Vision Transformer. These are state-of-the-art methods, commonly used as starting points for classification tasks, and they can be tuned and combined with techniques such as regularization to obtain the best possible performance while reducing overfitting. Using these architectures, we process a dataset of neutrino events simulated in 2021 with a software package called WCSim. These events are of single-ring type, correspond to the IWCD tank (a smaller tank being built to aid the tasks of the Hyper-Kamiokande), and range from 9 thousand to 8 million events for each of the three particle types considered in the project: muon neutrinos, electron neutrinos, and gamma particles. The results show that ResNet50 gave the best results while also minimizing the computational resources needed; although VGG19 and PointNet achieved similar performance, they require more time to process any dataset, whereas Vision Transformer provided the poorest results. All results improved when the largest datasets were processed. Compared with a state-of-the-art custom CNN, our highest average accuracy is within the same range as theirs. Compared with the ResNet50 model currently used in the HK collaboration, the AUC obtained for the TPR signal (electron) vs. FPR background (gamma) curve of our best model is 0.71, while the collaboration’s value is 0.77; however, the collaboration does not analyze the full results but applies cuts, so our results can be considered close.
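The AUC figure quoted for the electron-vs-gamma curve can be computed directly from classifier scores via the rank-sum identity, without tracing the full ROC curve. A stdlib sketch with invented scores (the thesis's actual score distributions are not reproduced here):

```python
def roc_auc(scores_signal, scores_background):
    """AUC via the Mann-Whitney identity: the probability that a random signal
    event outscores a random background event, counting ties as half."""
    wins = sum((s > b) + 0.5 * (s == b)
               for s in scores_signal for b in scores_background)
    return wins / (len(scores_signal) * len(scores_background))

# Hypothetical classifier scores for electron (signal) vs. gamma (background) events
signal = [0.9, 0.8, 0.6, 0.55]
background = [0.7, 0.5, 0.4, 0.3]
assert roc_auc(signal, background) == 0.875

# A perfect separator reaches 1.0
assert roc_auc([1.0], [0.0]) == 1.0
```

The equivalence with the area under the TPR-vs-FPR curve holds exactly for finite samples, which makes this formulation convenient for comparing models such as the project's 0.71 against the collaboration's 0.77.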
- Deep learning applied to the detection of traffic signs in embedded devices(Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-06) Rojas García, Javier; Fuentes Aguilar, Rita Quetziquel; emimmayorquin; Morales Vargas, Eduardo; Izquierdo Reyes, Javier; School of Engineering and Sciences; Campus Eugenio Garza Sada. Computer vision is an integral component of autonomous vehicle systems, enabling tasks such as obstacle detection, road infrastructure recognition, and pedestrian identification. Autonomous agents must perceive their environment to make informed decisions and plan and control actuators to achieve predefined goals, such as navigating from point A to B without incidents. In recent years, there has been growing interest in developing Advanced Driving Assistance Systems like lane-keeping assistants, emergency braking mechanisms, and traffic sign detection systems. This growth is driven by advancements in Deep Learning techniques for image processing, enhanced hardware capabilities for edge computing, and the numerous benefits promised by autonomous vehicles. This work investigates the performance of three recent and popular object detectors from the YOLO series (YOLOv7, YOLOv8, and YOLOv9) on a custom dataset to identify the optimal architecture for traffic sign detection (TSD). The objective is to optimize and embed the best-performing model on the Jetson Orin AGX platform to achieve real-time performance. The custom dataset is derived from the Mapillary Traffic Sign Detection dataset, a large-scale, diverse, and publicly available resource. The focus is on detecting traffic signs that could affect the longitudinal control of the vehicle. Results indicate that YOLOv7 offers the best balance between mean Average Precision and inference speed, with optimized versions running at over 55 frames per second on the embedded platform, surpassing by an ample margin what is often considered real-time (30 FPS).
Additionally, this work provides a working system for real-time traffic sign detection that could be used to alert inattentive drivers and contribute to reducing car accidents. Future work will explore further optimization techniques such as quantization-aware training, conduct more thorough real-life scenario testing, and investigate other architectures, including vision transformers and attention mechanisms, among other proposed improvements.
- Harnessing machine learning for short-to-long range weather forecasting: a Monterrey case study(Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-05) Machado Guillén, Gustavo de Jesús; Cruz Duarte, Jorge Mario; mtyahinojosa, emimmayorquin; Filus, Katarzyna; Falcón, Jesús Guillermo; Ibarra, Gerardo; Departamento de Ciencias Computacionales; Campus Monterrey; Conant, Santiago Enrique. Weather forecasting is crucial in adapting and integrating renewable energy sources, particularly in regions with complex climatic conditions like Monterrey. This study aims to provide reliable weather prediction methodologies by evaluating the performance of various traditional Machine Learning models, including Random Forest Regressor (RFR), Gradient Boosting Regressor (GBR), Support Vector Regressor (SVR), and Recurrent Neural Networks (RNN) such as SimpleRNN, Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Cascade LSTM, Bidirectional RNNs, and a novel Convolutional LSTM/LSTM architecture that handles spatial and temporal data. The research employs a dataset of historical weather data from Automatic Weather Stations and Advanced Baseline Imager Level 2 GOES-16 products, including key weather features like air temperature, solar radiation, wind speed, relative humidity, and precipitation. The models were trained and evaluated across different predictive ranges by combining distinct sampling and model output sizes. This study’s findings underscore the effectiveness of the Cascade LSTM models, achieving a Mean Absolute Error of 1.6 °C for 72-hour air temperature predictions and 85.79 W/m² for solar radiation forecasts. The ConvLSTM/LSTM model also significantly improves short-term predictions, particularly for solar radiation and humidity. The main contribution of this work is a comprehensive methodology that can be generalized to other regions and datasets, supporting the nationwide implementation of localized machine-learning forecasting models.
This methodology includes steps for data collection, preprocessing, creation of lagged features, and model implementation, as well as applying distinct approaches to forecasting by using autoregressive and fixed window models. This framework enables the development of accurate, region-specific forecasting models, facilitating better weather prediction and planning nationwide.
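The lagged-feature step in the methodology above turns a univariate weather series into supervised (features, target) pairs. A minimal sketch; the lag count, horizon, and the hypothetical temperature readings are illustrative assumptions, not the study's actual configuration:

```python
def make_lagged(series, n_lags, horizon=1):
    """Build supervised pairs: features are the previous n_lags values,
    the target is the value `horizon` steps ahead."""
    X, y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])
        y.append(series[t + horizon - 1])
    return X, y

# Hypothetical hourly air temperature readings (degrees Celsius)
temps = [21.0, 22.5, 24.0, 23.5, 22.0, 20.5]
X, y = make_lagged(temps, n_lags=3)

assert X[0] == [21.0, 22.5, 24.0] and y[0] == 23.5
assert len(X) == 3
```

An autoregressive forecaster would feed its own predictions back in as new lags to extend the horizon, whereas a fixed-window model predicts the whole output block at once, the two forecasting approaches the methodology contrasts.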

