Ciencias Exactas y Ciencias de la Salud

Permanent URI for this collection: https://hdl.handle.net/11285/551039

This collection contains theses and graduate works from the Master's programs of the School of Engineering and Sciences and the School of Medicine and Health Sciences.

Search Results

  • Tesis de maestría / master thesis
    Exploring Anchor-Free Object Detection for Surgical Tool Detection in Laparoscopic Videos: A Comparative Study of CenterNet++ and Anchor-Based Models
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-12) Aparicio Viveros, Carlos Alfredo; Ochoa Ruiz, Gilberto; emipsanchez; Hinojosa Cervantes, Salvador Miguel; Falcón Morales, Luis Eduardo; González Mendoza, Miguel; School of Engineering and Sciences; Campus Monterrey
Minimally Invasive Surgery (MIS) has transformed modern medicine, offering reduced recovery times, minimal scarring, and lower risks of infection. However, MIS procedures also present unique challenges, particularly in visualizing and manipulating surgical tools within a limited field of view. As a solution, this thesis investigates anchor-free deep learning models for real-time surgical tool detection in laparoscopic videos, proposing CenterNet++ as a potential improvement over traditional anchor-based methods. The hypothesis guiding this work is that anchor-free detectors, by avoiding predefined anchor boxes, can more effectively handle the diverse shapes, sizes, and positions of surgical tools. The primary objective of this thesis is to evaluate the performance of CenterNet++ in surgical tool detection compared to popular anchor-based models, specifically Faster R-CNN and YOLOv4, using the m2cai16-tool-locations dataset. CenterNet++ is examined in different configurations, including complete and real-time optimized (Fast-CenterNet++) versions, and tested against Faster R-CNN and YOLOv4 to assess trade-offs in accuracy and efficiency. Experimental results demonstrate that while CenterNet++ achieves high precision, particularly in scenarios requiring meticulous localization, its inference speed is significantly slower than YOLOv4, which attained real-time speeds at 128 FPS. CenterNet++’s unique keypoint refinement mechanism, though beneficial for localization, impacts its computational efficiency, highlighting areas for further optimization. To bridge this gap, several architectural improvements are proposed based on YOLOv4’s streamlined design. These include integrating modules like Spatial Pyramid Pooling (SPP) and Path Aggregation Network (PANet), along with reducing input resolution in the Fast-CenterNet++ configuration. 
Additionally, future work is suggested to explore CenterNet++ on larger, more complex datasets and to develop semi-supervised learning approaches that could mitigate the limitations of annotated surgical datasets. In conclusion, this thesis contributes a comprehensive evaluation of anchor-free models for surgical tool detection, providing a foundation for further advancements in real-time, high-precision object detection for surgical assistance. The findings underscore the potential of anchor-free models, such as CenterNet++, to meet the evolving demands of MIS with targeted architectural adaptations.
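The core anchor-free idea described above (predicting objects as keypoints on a heatmap instead of regressing against predefined anchor boxes) can be illustrated with a minimal decoding sketch. This is a generic stand-in for CenterNet-style peak extraction, not the thesis's actual CenterNet++ code:

```python
import numpy as np

def decode_center_heatmap(heatmap, threshold=0.5):
    """Return (row, col, score) peaks of a keypoint heatmap.

    A peak is a cell that exceeds `threshold` and is not smaller than any
    of its 8 neighbours, mimicking the max-pool NMS used by
    CenterNet-style anchor-free detectors.
    """
    H, W = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    centers = []
    for y in range(H):
        for x in range(W):
            score = heatmap[y, x]
            window = padded[y:y + 3, x:x + 3]  # 3x3 neighbourhood around (y, x)
            if score >= threshold and score == window.max():
                centers.append((y, x, float(score)))
    return centers
```

In a full detector, each peak would additionally index regression maps for box width, height, and sub-pixel offset; the sketch keeps only the keypoint step.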
  • Tesis de maestría / master thesis
    Smart camera FPGA hardware implementation for semantic segmentation of wildfire imagery
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-06-13) Garduño Martínez, Eduardo; Rodriguez Hernández, Gerardo; mtyahinojosa, emipsanchez; Gonzalez Mendoza, Miguel; Hinojosa Cervantes, Salvador Miguel; School of Engineering and Sciences; Campus Monterrey; Ochoa Ruiz, Gilberto
    In the past few years, the more frequent occurrence of wildfires, which are a result of climate change, has devastated society and the environment. Researchers have explored various technologies to address this issue, including deep learning and computer vision solutions. These techniques have yielded promising results in semantic segmentation for detecting fire using visible and infrared images. However, implementing deep learning neural network models can be challenging, as it often requires energy-intensive hardware such as a GPU or a CPU with large cooling systems to achieve high image processing speeds, making it difficult to use in mobile applications such as drone surveillance. Therefore, to solve the portability problem, an FPGA hardware implementation is proposed to satisfy low power consumption requirements, achieve high accuracy, and enable fast image segmentation using convolutional neural network models for fire detection. This thesis employs a modified UNET model as the base model for fire segmentation. Subsequently, compression techniques reduce the number of operations performed by the model by removing filters from the convolutional layers and reducing the arithmetic precision of the CNN, decreasing inference time and storage requirements and allowing the Vitis AI framework to map the model architecture and parameters onto the FPGA. Finally, the model was evaluated using metrics utilized in prior studies to assess the performance of fire detection segmentation models. Additionally, two fire datasets are used to compare different data types for fire segmentation models, including visible images, a fusion of visible and infrared images generated by a GAN model, fine-tuning of the fusion GAN weights, and the use of visible and infrared images independently to evaluate the impact of visible-infrared information on segmentation performance.
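The compression steps the abstract mentions (removing filters from convolutional layers and reducing arithmetic precision) can be sketched in NumPy. This is a generic magnitude-based pruning and symmetric int8 quantization illustration with hypothetical tensor shapes, not the Vitis AI toolchain itself:

```python
import numpy as np

def prune_filters(weights, keep_ratio=0.5):
    """Keep the fraction of conv filters with the largest L1 norms.

    weights: array of shape (out_channels, in_channels, k, k).
    Returns the pruned weight tensor with fewer output channels.
    """
    norms = np.abs(weights).sum(axis=(1, 2, 3))        # one L1 norm per filter
    n_keep = max(1, int(round(keep_ratio * weights.shape[0])))
    keep = np.sort(np.argsort(norms)[::-1][:n_keep])   # strongest filters, original order
    return weights[keep]

def quantize_int8(x):
    """Symmetric linear quantization of a float tensor to int8."""
    max_abs = np.abs(x).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale
```

Dequantizing with `q * scale` recovers the weights to within half a quantization step, which is the trade-off that shrinks storage and inference cost on the FPGA.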
  • Tesis de maestría / master thesis
    Deep learning applied to the detection of traffic signs in embedded devices
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-06) Rojas García, Javier; Fuentes Aguilar, Rita Quetziquel; emimmayorquin; Morales Vargas, Eduardo; Izquierdo Reyes, Javier; School of Engineering and Sciences; Campus Eugenio Garza Sada
Computer vision is an integral component of autonomous vehicle systems, enabling tasks such as obstacle detection, road infrastructure recognition, and pedestrian identification. Autonomous agents must perceive their environment to make informed decisions and plan and control actuators to achieve predefined goals, such as navigating from point A to B without incidents. In recent years, there has been growing interest in developing Advanced Driving Assistance Systems like lane-keeping assistants, emergency braking mechanisms, and traffic sign detection (TSD) systems. This growth is driven by advancements in Deep Learning techniques for image processing, enhanced hardware capabilities for edge computing, and the numerous benefits promised by autonomous vehicles. This work investigates the performance of three recent and popular object detectors from the YOLO series (YOLOv7, YOLOv8, and YOLOv9) on a custom dataset to identify the optimal architecture for TSD. The objective is to optimize and embed the best-performing model on the Jetson Orin AGX platform to achieve real-time performance. The custom dataset is derived from the Mapillary Traffic Sign Detection dataset, a large-scale, diverse, and publicly available resource. The focus is on detecting traffic signs that could potentially affect the longitudinal control of the vehicle. Results indicate that YOLOv7 offers the best balance between mean Average Precision and inference speed, with optimized versions running at over 55 frames per second on the embedded platform, surpassing by an ample margin what is often considered real-time (30 FPS). Additionally, this work provides a working system for real-time traffic sign detection that could be used to alert inattentive drivers and contribute to reducing car accidents. 
Future work will explore further optimization techniques such as quantization-aware training, conduct more thorough real-life scenario testing, and investigate other architectures, including vision transformers and attention mechanisms, among other proposed improvements.
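Real-time claims like the 55 FPS figure above are typically established with a simple throughput benchmark: a warm-up pass followed by a timed loop over frames. The sketch below shows that general pattern with a placeholder `infer` callable standing in for the actual optimized detector:

```python
import time

def measure_fps(infer, frames, warmup=2):
    """Average throughput of `infer` over `frames`, after a short warm-up."""
    for frame in frames[:warmup]:
        infer(frame)                      # let caches / lazy initialization settle
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed
```

The warm-up matters on embedded platforms, where the first inferences often pay one-time engine-build or memory-allocation costs that would skew the average.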
  • Tesis de maestría / master thesis
    Automated U-Net Hippocampal Segmentation and Volumetric Classification for Major Depressive Disorder Stage Differentiation
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-06) Salazar Zozaya, Andrea del Carmen; Cantoral Ceballos, José Antonio; emimmayorquin; Castañeda Miranda, Alejandro; Trejo Rodriguez, Luis Angél; School of Engineering and Sciences; Campus Monterrey; Caraza Camacho, Ricardo
Major Depressive Disorder (MDD) is the leading cause of disability in the world, affecting approximately 280 million people. Hippocampal volumetric changes have been proposed as a potential biomarker for depression. Despite advancements in neuroimaging studies related to psychiatric disorders, there remains a gap in the utilization of neuroimaging techniques for clinical diagnosis and monitoring of such disorders. This study presents a comprehensive investigation of MDD stage differentiation using MRI data and a U-Net architecture for hippocampal segmentation across axial, coronal, and sagittal orientations. This approach presents a U-Net architecture to segment 2D slices of the hippocampus from MRI data in axial, sagittal, and coronal orientations. These segmented 2D slices are then concatenated to create a hippocampus volume representation, which is then used to obtain the hippocampal volume. These volumetric values are subsequently used to differentiate among the four stages of MDD according to the Beck Depression Inventory-II (minimal/no MDD, mild, moderate, and severe) using a Neural Network and three machine learning classifiers. The experimental results demonstrated that the U-Net model effectively segments the hippocampus, while the volumetric analysis accurately differentiates MDD stages. The Gradient Boost classifier outperformed the other classifiers with a 98.5% accuracy, 99.90% precision, 98.85% recall, and an F1-score of 98.6%. This study advances the field of neuroimaging and mental disorder assessment by introducing a reliable and automated method for hippocampus segmentation and MDD stage categorization. Future directions include incorporating more brain regions such as the amygdala and habenula, creating a neural network classifier based on 3D hippocampus images, and using larger, more diverse datasets to increase model performance and generalizability.
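The volumetric step described above (concatenating segmented 2D slices into a volume representation and deriving a volume from it) can be sketched as a voxel count scaled by voxel size. The voxel size here is a hypothetical parameter, not a value from the study:

```python
import numpy as np

def hippocampal_volume(slice_masks, voxel_volume_mm3=1.0):
    """Stack per-slice binary segmentation masks into a 3-D array and
    estimate the structure volume as voxel count times voxel size."""
    volume = np.stack(slice_masks, axis=0)   # shape: (n_slices, H, W)
    return float(volume.sum()) * voxel_volume_mm3
```

The resulting scalar per subject is the kind of feature the study then feeds to the stage classifiers.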
  • Tesis de maestría / master thesis
    3-D Detection & tracking for semi-deformable objects
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024) De Los Rios Alatorre, Gustavo; Muñoz Ubando, Luis Alberto; emimmayorquin; Hernández Gress, Neil; Ceballos Cancino, Héctor Gibrán; Raygosa Barahona, Rubén Renan; Maestro en Ciencias de la Computación; Campus Monterrey
This thesis introduces a computer vision system designed for real-time detection and pose estimation of semi-deformable objects in 3-D space, leveraging edge computing devices. The primary motivation for this research stems from the need to enhance the capabilities of vision-based systems, which in turn can aid in improving the efficiency and effectiveness of robotics systems in a variety of fields. For the context of this thesis, the chosen field was agriculture, focusing on the recognition, tracking, and pose estimation of bell peppers by harvesting robots, an application where traditional methods often fall short due to the nature of semi-deformable objects like fruits. A Jetson Nano was used as the main component, while an Intel DE10-Nano was considered as a complementary part of the system for performing image preprocessing tasks, with the Azure Kinect serving as the main camera sensor. The algorithm was successfully deployed on the Jetson Nano, tracking and estimating the pose of a bell pepper in 3-D by performing the necessary rotations and deformations to a canonical model used by the system as a general means to estimate the pose of the pepper in the real-world scene. The algorithm was also tested in a ROS 2 Gazebo simulation where an x-arm robot was used to simulate the vision part of a pick and place operation with a simulated bell pepper, using the proposed method to accurately identify and estimate the pose of the pepper in the simulation. Lastly, a set of different segmentation techniques using both deep learning and traditional methods are presented as a means to explore how these could improve the system's current segmentation capacity.
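The pose-estimation idea of rotating a canonical model onto the observed object can be illustrated with the classic Kabsch rigid-alignment algorithm over corresponding 3-D points. This is a generic sketch of that technique, not the deployed pipeline, which additionally handles deformation:

```python
import numpy as np

def kabsch(canonical, observed):
    """Best-fit rotation R and translation t with observed ~= canonical @ R.T + t.

    canonical, observed: (N, 3) arrays of corresponding 3-D points.
    """
    cc, co = canonical.mean(axis=0), observed.mean(axis=0)
    H = (canonical - cc).T @ (observed - co)       # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = co - R @ cc
    return R, t
```

Given point correspondences between the canonical pepper model and the sensed point cloud, the recovered (R, t) is exactly the object pose in the camera frame.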
  • Tesis de maestría / master thesis
    Deep Learning Approach for Alzheimer’s Disease Classification: Integrating Multimodal MRI and FDG- PET Imaging Through Dual Feature Extractors and Shared Neural Network Processing
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024) Vega Guzmán, Sergio Eduardo; Alfaro Ponce, Mariel; emimmayorquin; Ochoa Ruíz, Gilberto; Chairez Oria, Jorge Isaac; Hernandez Sanchez, Alejandra; School of Engineering and Sciences; Campus Monterrey; Ramírez Nava, Gerardo Julián
Alzheimer’s disease (AD) is a progressive neurodegenerative disorder whose incidence is expected to grow in the coming years. Traditional diagnostic methods, such as MRI and FDG-PET, each provide valuable but limited insights into the disease’s pathology. This thesis investigates the potential of a multimodal deep learning classifier to improve the diagnostic accuracy of AD by integrating MRI and FDG-PET imaging data, in comparison to single-modality implementations. The study proposes a lightweight neural architecture that uses the strengths of both imaging modalities, aiming to reduce computational costs while maintaining state-of-the-art diagnostic performance. The proposed model utilizes two pre-trained feature extractors, one for each imaging modality, fine-tuned to capture the relevant features from the dataset. The outputs of these extractors are fused into a single vector to form an enriched feature map that better describes the brain. Experimental results demonstrate that the multimodal classifier outperforms single-modality classifiers, achieving an overall accuracy of 90% on the test dataset. The VGG19 model was the best feature extractor for both MRI and PET data since it showed superior performance when compared to the other experimental models, with an accuracy of 71.9% for MRI and 80.3% for PET images. The multimodal implementation also exhibited higher precision, recall, and F1 scores than the single-modality implementations. For instance, it achieved a precision of 0.90, recall of 0.94, and F1-score of 0.92 for the AD class and a precision of 0.89, recall of 0.82, and F1-score of 0.86 for the CN class. Furthermore, explainable AI techniques provided insights into the model’s decision-making process, revealing that it effectively utilizes both structural and metabolic information to distinguish between AD and cognitively normal (CN) subjects. 
This research provides supporting evidence of the potential of multimodal imaging and machine learning to enhance early detection and diagnosis of Alzheimer’s disease, offering a cost-effective solution suitable for widespread clinical applications.
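The dual-extractor fusion described above (per-modality features concatenated into one enriched vector for a shared head) can be sketched with stand-in linear extractors. The shapes and weights here are hypothetical placeholders, not the fine-tuned VGG19 backbones:

```python
import numpy as np

def extract(x, W):
    """Stand-in per-modality feature extractor: one linear layer + ReLU."""
    return np.maximum(0.0, x @ W)

def fuse_and_score(mri, pet, W_mri, W_pet, W_head):
    """Concatenate per-modality features into one enriched vector and
    score it with a shared classification head."""
    fused = np.concatenate([extract(mri, W_mri), extract(pet, W_pet)], axis=-1)
    return fused @ W_head                          # class logits (AD vs. CN)
```

The key design point is that the head sees both structural (MRI) and metabolic (PET) features jointly, which is what lets the multimodal model beat either single-modality classifier.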
  • Tesis de maestría / master thesis
    Automatic detection and segmentation of prostate cancer using deep learning techniques
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2023-05-20) Quihui Rubio, Pablo César; González Mendoza, Miguel; puemcuervo, emimayorquin; Alfaro Ponce, Mariel; Mata Miquel, Christian; Hinojosa Cervantes, Salvador Miguel; School of Engineering and Sciences; Campus Monterrey; Ochoa Ruiz, Gilberto
Prostate cancer is a major cause of death among men worldwide, and detecting it usually involves invasive procedures. Magnetic resonance imaging (MRI) has become a common research area for detecting this cancer because it represents a less invasive option. However, segmenting the prostate gland from MRI images can be a complicated task that requires expert opinions, which is both time-consuming and inconsistent. This thesis proposes a novel deep-learning architecture to automate and obtain accurate and reliable segmentation of the prostate gland in MRI scans. Precise segmentation is crucial for radiotherapy planning, as it determines the tumor’s location and size, which affects treatment effectiveness and reduces radiation exposure to surrounding healthy tissues. Therefore, a thorough comparison between architectures from the state-of-the-art is also performed. Convolutional neural networks have shown great potential in medical image segmentation, but the uncertainty associated with their predictions is often overlooked. Therefore, this work proposes a novel approach incorporating uncertainty quantification to ensure reliable and trustworthy results. The models were evaluated on a dataset of prostate T2-MRI scans obtained in collaboration with the Centre Hospitalarie Dijon and Universitat Politecnica de Catalunya. The results showed that the proposed architecture FAU-Net outperforms most existing models in the literature, with an improvement of 5% in the Dice Similarity Coefficient (DSC) and Intersection over Union (IoU). However, the best model overall was R2U-Net, which achieved segmentation accuracy and uncertainty estimation values of 85% and 76% for DSC and IoU, respectively, with an uncertainty score lower than 0.05. In addition to the proposed model and comparison between models for prostate segmentation and uncertainty quantification, a web application was presented for easier access to the trained models in a clinical setting. 
This web app would allow medical professionals to upload MRI scans of prostate cancer patients and obtain accurate and reliable segmentation quickly and easily. This would reduce the time and effort required for manual segmentation and improve patient outcomes by facilitating better treatment planning. Overall, this work presents a novel strategy for prostate segmentation using deep learning models and uncertainty quantification. The proposed method provides a reliable and trustworthy segmentation while quantifying the uncertainty associated with the predictions. This research can benefit prostate cancer patients by improving treatment planning and outcomes.
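The DSC and IoU figures quoted above follow the standard overlap definitions for binary masks; a minimal NumPy implementation:

```python
import numpy as np

def dice_iou(pred, target, eps=1e-7):
    """Dice Similarity Coefficient and Intersection over Union for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (union + eps)
    return dice, iou
```

Dice weights the intersection twice against the sum of mask sizes, so it is always at least as large as IoU for the same prediction, which matches the 85% DSC vs. 76% IoU pattern reported above.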
  • Tesis de maestría
Contextual information for Person Re-Identification in outdoor environments
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2021-06) Garnica López, Luis Alberto; Chang Fernández, Leonardo; 345979; Chang Fernández, Leonardo; emipsanchez; Pérez Suárez, Airel; Gutiérrez Rodríguez, Andrés Eduardo; School of Engineering and Sciences; Campus Monterrey; González Mendoza, Miguel
Person Re-Identification (ReID) is obtaining good results and is getting closer and closer to being ready for implementation in production scenarios. However, there are still improvements to be made, and when the task is implemented in outdoor environments its performance is often affected by illumination or by natural elements, such as fog or dust, that can distort the images. In this work, we introduce a novel proposal for the inclusion of contextual information in a ReID re-ranking approach, to help improve the effectiveness of this task in surveillance systems. Most of the previous research in this field makes use only of the visual data contained in the images processed by ReID. Even the approaches that use some sort of context normally rely on context annotated within the scope of the image itself, or on the relationships between the different images where the IDs are found. We understand that there is a lot of contextual information available in these scenarios that is not being included and that might help to reduce the impact of these situations on the performance of the task. In the present document, we perform a complete analysis of the effect of combining this contextual information with the embeddings normally produced by several ReID models, processing it through an architecture inspired by Siamese neural networks, but with a triplet loss. The neural network was trained using a novel dataset developed specifically for this task, which is annotated with this extra information. The dataset is composed of 34,156 images from 3 different cameras of 501 labeled identities. Along with this data, each image includes 12 extra features with its specific contextual information. 
This dataset of images was previously processed using three different ReID models to ensure that the results obtained when the information is included are independent of the ReID approach taken as the base: Triplet Network (TriNet), Multiple Granularity Network (MGN), and Multi-Level Factorization Net (MLFN). Each one produced 2048-dimensional embeddings. All of our proposed experiments achieved an improvement with respect to the original mAP generated by these three networks, going from 86.53 to 94.9, from 84.94 to 93.11, and from 95.35 to 95.93, respectively, on our dataset.
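The idea of appending contextual features to visual embeddings and training with a triplet objective can be sketched as follows. The feature sizes are illustrative, not the 2048-dimensional embeddings or the 12 contextual attributes of the actual models:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Margin-based triplet loss on Euclidean distances between embeddings:
    pulls same-identity pairs together, pushes different identities apart."""
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)

def with_context(embedding, context):
    """Append L2-normalized contextual features to a visual ReID embedding."""
    c = np.asarray(context, dtype=float)
    c = c / (np.linalg.norm(c) + 1e-12)
    return np.concatenate([embedding, c])
```

Normalizing the contextual block keeps it from dominating the distance computation when its raw scale differs from the visual embedding's.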
  • Tesis de maestría
    TYolov5: A Temporal Yolov5 detector based on quasi-recurrent neural networks for real-time handgun detection in video
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2020-12-01) Duran Vega, Mario Alberto; GONZALEZ MENDOZA, MIGUEL; 123361; González Mendoza, Miguel; puemcuervo; Ochoa Ruiz, Gilberto; Morales González Quevedo, Annette; Sánchez Castellanos, Héctor Manuel; School of Engineering and Science; Campus Monterrey; Chang Fernández, Leonardo
Timely handgun detection is a crucial problem for improving public safety; nevertheless, the effectiveness of many surveillance systems still depends on finite human attention. Much of the previous research on handgun detection is based on static image detectors, leaving aside valuable temporal information that could be used to improve object detection in videos. To improve the performance of surveillance systems, a real-time temporal handgun detection system should be built. Using Temporal Yolov5, an architecture based on Quasi-Recurrent Neural Networks, temporal information is extracted from video to improve the results of handgun detection. Moreover, two publicly available datasets are proposed, labeled with hands, guns, and phones: one containing 2199 static images to train static detectors, and another with 5960 frames of videos to train temporal modules. Additionally, we explore two temporal data augmentation techniques based on Mosaic and Mixup. The resulting systems are three real-time architectures: one focused on reducing inference time with a mAP(50:95) of 56.1, another on striking a good balance between inference time and accuracy with a mAP(50:95) of 59.4, and a last one specialized in accuracy with a mAP(50:95) of 60.6. Temporal Yolov5 achieves real-time detection and takes advantage of the temporal features contained in videos to perform better than Yolov5 on our temporal dataset, making TYolov5 suitable for real-world applications.
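The quasi-recurrent mechanism that lets a detector carry information across frames can be illustrated with the standard QRNN f-pooling recurrence, h_t = f_t * h_(t-1) + (1 - f_t) * z_t. This is a generic NumPy illustration of that pooling step, not the TYolov5 implementation:

```python
import numpy as np

def qrnn_f_pool(z, f):
    """QRNN f-pooling: h_t = f_t * h_{t-1} + (1 - f_t) * z_t.

    z: (T, D) per-frame candidate features; f: (T, D) forget gates in (0, 1),
    both assumed to come from convolutions over the frame sequence.
    """
    h = np.zeros_like(z[0])
    out = []
    for z_t, f_t in zip(z, f):
        h = f_t * h + (1.0 - f_t) * z_t   # blend new frame with running state
        out.append(h.copy())
    return np.stack(out)
```

Because the gates z and f are computed by convolutions rather than by the recurrence itself, only this cheap element-wise blend is sequential, which is what keeps QRNN-based temporal modules fast enough for real-time video.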
Unless otherwise specified, these materials are shared under the following terms: Attribution-NonCommercial-NoDerivatives CC BY-NC-ND http://www.creativecommons.mx/#licencias