Ciencias Exactas y Ciencias de la Salud
Permanent URI for this collection: https://hdl.handle.net/11285/551039
This collection contains theses and degree projects from the Master's programs of the School of Engineering and Sciences and the School of Medicine and Health Sciences.
Search Results
- A prompt assisted image enhancement model using BERT classifier and modified LMSPEC and STTN techniques for endoscopic images (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-12). Cerriteño Magaña, Javier; Ochoa Ruiz, Gilberto; Sánchez Ante, Gildardo; Alfaro Ponce, Mariel; School of Engineering and Sciences; Campus Monterrey.
  This document presents a research thesis for the Master in Computer Science (MCCi) degree at Tecnologico de Monterrey. The field of medical imaging, particularly in endoscopy, has seen significant advancements in image enhancement techniques aimed at improving the clarity and interpretability of captured images. Numerous models and methodologies have been developed to enhance medical images, ranging from traditional algorithms to complex deep learning frameworks. However, effectively implementing these techniques often requires substantial expertise in computer science and image processing, which can be a barrier for medical professionals who primarily focus on clinical practice. This thesis presents a novel prompt-assisted image enhancement model that integrates the LMSPEC and STTN techniques, augmented by BERT models equipped with added attention blocks. This approach enables medical practitioners to specify desired image enhancements through natural language prompts, significantly simplifying the enhancement process. By interpreting and acting upon user-defined requests, the proposed model not only empowers clinicians with limited technical backgrounds to effectively enhance endoscopic images but also streamlines diagnostic workflows. To the best of our knowledge, this is the first dedicated prompt-assisted image enhancement model specifically tailored for medical imaging applications. Moreover, the architecture of the proposed model is designed with flexibility in mind, allowing the seamless incorporation of future image enhancement models and techniques as they emerge. This adaptability ensures that the model remains relevant and effective as the field of medical imaging continues to evolve. The results of this research contribute to the ongoing effort to make advanced image processing technologies more accessible to medical professionals, thereby enhancing the quality of care provided to patients through improved diagnostic capabilities.
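As a rough illustration of the prompt-routing idea described in this abstract, the sketch below uses an off-the-shelf BERT sequence classifier to map a clinician's free-text request to an enhancement category and dispatch it to a backend. The label set, the untuned "bert-base-uncased" checkpoint, and the stub backend functions are assumptions for illustration only; they are not the thesis implementation, which additionally modifies BERT with extra attention blocks and uses the actual LMSPEC and STTN models.

```python
# Minimal sketch (not the thesis code) of routing a natural-language prompt
# to an enhancement backend via a BERT sequence classifier.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

LABELS = ["exposure_correction", "temporal_restoration"]  # assumed categories

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
classifier = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS)
)
classifier.eval()

def enhance_exposure(frame):   # stand-in for an LMSPEC-style exposure model
    return frame

def restore_temporal(frame):   # stand-in for an STTN-style restoration model
    return frame

BACKENDS = {"exposure_correction": enhance_exposure,
            "temporal_restoration": restore_temporal}

def route_prompt(prompt: str) -> str:
    """Classify a free-text enhancement request into an enhancement category."""
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = classifier(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

# Example: the predicted label decides which enhancement model handles the frame.
frame = torch.zeros(3, 256, 256)  # dummy endoscopic frame
label = route_prompt("Please brighten the underexposed regions of this frame")
enhanced = BACKENDS[label](frame)
```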
- Object detection-based surgical instrument tracking in laparoscopy videos (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-12). Guerrero Ramírez, Cuauhtemoc Alonso; Ochoa Ruiz, Gilberto; González Mendoza, Miguel; Hinojosa Cervantes, Salvador Miguel; Falcón Morales, Luis Eduardo; Medina Pérez, Miguel Ángel; School of Engineering and Sciences; Campus Monterrey.
  Minimally invasive surgery (MIS) has transformed surgery by offering numerous advantages over traditional open surgery, such as reduced pain, minimized trauma, and faster recovery times. However, endoscopic MIS procedures remain highly operator-dependent, demanding significant skill from the surgical team to ensure a positive postoperative outcome for the patient. Computer vision techniques such as reliable surgical instrument detection and tracking can be leveraged for applications such as intraoperative decision support, surgical navigation assistance, and surgical skill assessment, which can significantly improve patient safety. The aim of this work is to implement a Multiple Object Tracking (MOT) benchmark model for surgical instrument tracking in laparoscopic videos. To this end, a new dataset, m2cai16-tool-tracking, based on the m2cai16-tool-locations dataset, is introduced specifically for surgical instrument tracking. This dataset includes both bounding box annotations for instrument detection and unique tracking ID annotations for multi-object tracking. This work employs ByteTrack, a state-of-the-art multiple object tracking algorithm that follows the tracking-by-detection paradigm: ByteTrack predicts tool positions and associates object detections across frames, allowing consistent tracking of each instrument. The object detection step is performed with YOLOv4, a state-of-the-art object detection model known for real-time performance. YOLOv4 is first trained on the m2cai16-tool-locations dataset to establish a baseline and then on the custom m2cai16-tool-tracking dataset, allowing the detection performance on the custom dataset to be compared with an existing object detection dataset. YOLOv4 generates bounding box predictions for each frame of the laparoscopic videos, and these detections serve as input to the ByteTrack algorithm, which assigns unique tracking IDs to each instrument to maintain their trajectories across frames. YOLOv4 achieves robust object detection performance on the m2cai16-tool-locations dataset, obtaining a mAP50 of 0.949, a mAP75 of 0.537, and a mAP50:95 of 0.526, with a real-time inference speed of 125 fps. However, detection performance on the m2cai16-tool-tracking dataset is slightly lower, with a mAP50 of 0.839, mAP75 of 0.420, and mAP50:95 of 0.439, suggesting that differences in data partitioning impact detection accuracy. This lower detection accuracy likely affects the tracking performance of ByteTrack, reflected in a MOTP of 76.4, MOTA of 56.6, IDF1 score of 22.8, and HOTA score of 23.0. Future work could focus on improving object detection performance to enhance tracking quality. Additionally, incorporating appearance-based features into the tracking step could improve the association accuracy of detections across frames and help maintain consistent tracking even in challenging scenarios such as occlusions. Such improvements could enhance tracking reliability to better support surgical tasks.
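To make the tracking-by-detection step concrete, here is a minimal sketch of ByteTrack's two-stage association: high-confidence detections are matched to existing tracks first, and low-confidence detections are then used to recover tracks that would otherwise be lost. The thresholds and data structures are illustrative assumptions; the thesis uses the reference ByteTrack implementation, which also includes Kalman-filter motion prediction and new-track initialization, omitted here for brevity.

```python
# Simplified two-stage association in the spirit of ByteTrack (illustrative only).
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, dets, iou_thr=0.3):
    """Hungarian matching of tracks to detections on an IoU cost matrix."""
    if not tracks or not dets:
        return [], list(range(len(tracks))), list(range(len(dets)))
    cost = np.array([[1.0 - iou(t["box"], d[:4]) for d in dets] for t in tracks])
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= 1.0 - iou_thr]
    matched_t = {r for r, _ in matches}
    matched_d = {c for _, c in matches}
    unmatched_t = [i for i in range(len(tracks)) if i not in matched_t]
    unmatched_d = [j for j in range(len(dets)) if j not in matched_d]
    return matches, unmatched_t, unmatched_d

def bytetrack_step(tracks, detections, high_thr=0.6, low_thr=0.1):
    """One frame: match high-score detections first, then recover with low-score ones."""
    high = [d for d in detections if d[4] >= high_thr]
    low = [d for d in detections if low_thr <= d[4] < high_thr]
    matches, unmatched, _ = associate(tracks, high)       # stage 1
    for r, c in matches:
        tracks[r]["box"] = list(high[c][:4])
    leftover = [tracks[i] for i in unmatched]
    matches2, _, _ = associate(leftover, low)             # stage 2: low-score recovery
    for r, c in matches2:
        leftover[r]["box"] = list(low[c][:4])
    # Kalman motion prediction and creation of new tracks from unmatched
    # high-score detections are omitted for brevity.
    return tracks

# Example: two existing instrument tracks updated with one frame of detections.
tracks = [{"id": 1, "box": [10, 10, 60, 60]}, {"id": 2, "box": [100, 100, 160, 170]}]
detections = [(12, 11, 62, 58, 0.92), (98, 103, 158, 168, 0.35)]
tracks = bytetrack_step(tracks, detections)
```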
- Exploring Anchor-Free Object Detection for Surgical Tool Detection in Laparoscopic Videos: A Comparative Study of CenterNet++ and Anchor-Based Models (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-12). Aparicio Viveros, Carlos Alfredo; Ochoa Ruiz, Gilberto; Hinojosa Cervantes, Salvador Miguel; Falcón Morales, Luis Eduardo; González Mendoza, Miguel; School of Engineering and Sciences; Campus Monterrey.
  Minimally Invasive Surgery (MIS) has transformed modern medicine, offering reduced recovery times, minimal scarring, and lower risks of infection. However, MIS procedures also present unique challenges, particularly in visualizing and manipulating surgical tools within a limited field of view. As a solution, this thesis investigates anchor-free deep learning models for real-time surgical tool detection in laparoscopic videos, proposing CenterNet++ as a potential improvement over traditional anchor-based methods. The hypothesis guiding this work is that anchor-free detectors, by avoiding predefined anchor boxes, can more effectively handle the diverse shapes, sizes, and positions of surgical tools. The primary objective of this thesis is to evaluate the performance of CenterNet++ in surgical tool detection against popular anchor-based models, specifically Faster R-CNN and YOLOv4, using the m2cai16-tool-locations dataset. CenterNet++ is examined in different configurations, including a complete and a real-time optimized (Fast-CenterNet++) version, and tested against Faster R-CNN and YOLOv4 to assess trade-offs in accuracy and efficiency. Experimental results demonstrate that while CenterNet++ achieves high precision, particularly in scenarios requiring meticulous localization, its inference speed is significantly slower than that of YOLOv4, which attained real-time speeds at 128 FPS. CenterNet++'s keypoint refinement mechanism, though beneficial for localization, impacts its computational efficiency, highlighting areas for further optimization. To bridge this gap, several architectural improvements are proposed based on YOLOv4's streamlined design, including integrating modules such as Spatial Pyramid Pooling (SPP) and Path Aggregation Network (PANet) and reducing the input resolution in the Fast-CenterNet++ configuration. Additionally, future work is suggested to explore CenterNet++ on larger, more complex datasets and to develop semi-supervised learning approaches that could mitigate the limitations of annotated surgical datasets. In conclusion, this thesis contributes a comprehensive evaluation of anchor-free models for surgical tool detection, providing a foundation for further advancements in real-time, high-precision object detection for surgical assistance. The findings underscore the potential of anchor-free models such as CenterNet++ to meet the evolving demands of MIS with targeted architectural adaptations.
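As a brief illustration of what "anchor-free" means in the CenterNet family, the sketch below decodes detections from a per-class center heatmap and a size-regression head, using 3x3 max-pooling as a cheap peak-NMS. It is a generic CenterNet-style decoder under assumed tensor shapes, not the CenterNet++ code, which additionally refines keypoints.

```python
# Generic center-heatmap decoding in the CenterNet family (illustrative sketch).
import torch
import torch.nn.functional as F

def decode_centers(heatmap, wh, top_k=50):
    """heatmap: (C, H, W) sigmoid center scores; wh: (2, H, W) predicted box sizes."""
    # Keep only local maxima: 3x3 max-pooling acts as a simple NMS on the heatmap.
    pooled = F.max_pool2d(heatmap[None], kernel_size=3, stride=1, padding=1)[0]
    heatmap = heatmap * (pooled == heatmap).float()
    C, H, W = heatmap.shape
    scores, idx = heatmap.view(-1).topk(top_k)
    classes = torch.div(idx, H * W, rounding_mode="floor")
    ys = torch.div(idx % (H * W), W, rounding_mode="floor").float()
    xs = (idx % W).float()
    w, h = wh[0, ys.long(), xs.long()], wh[1, ys.long(), xs.long()]
    boxes = torch.stack([xs - w / 2, ys - h / 2, xs + w / 2, ys + h / 2], dim=1)
    return boxes, scores, classes

# Example with assumed shapes: 7 tool classes on a 128x128 output grid.
heatmap = torch.rand(7, 128, 128)
wh = torch.rand(2, 128, 128) * 40
boxes, scores, classes = decode_centers(heatmap, wh)
```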
- Detection and classification of gastrointestinal diseases using deep learning techniques (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2022-11-30). Chavarrias Solano, Pedro Esteban; Ochoa Ruiz, Gilberto; Sánchez Ante, Gildardo; Hinojosa Cervantes, Salvador Miguel; Ali, Sharib; School of Engineering and Sciences; Campus Monterrey.
  This document presents a research thesis for the Master in Computer Science (MCCi) degree at Tecnologico de Monterrey. Cancer is a pathological condition in which old or abnormal cells do not die when they should. Although there are different cancer types, the incidence of colorectal cancer positions it as the third most common cancer worldwide. Endoscopy is the primary diagnostic tool used to manage gastrointestinal (GI) tract malignancies; however, it is a time-consuming and subjective process that depends on the experience of the clinician. Previous work has leveraged artificial intelligence methods for polyp detection, instrument tracking, and segmentation of gastric ulcers. This work focuses on the detection and classification of gastrointestinal diseases. The thesis proposes a knowledge distillation framework with a class-aware loss for endoscopic disease detection in the upper and lower gastrointestinal tract. Relevant features are extracted from endoscopic images to feed and train a deep learning-based object detection model. The method is evaluated using standard computer vision metrics: IoU and mAP25, mAP50, mAP75, and mAP25:75. The proposed approach outperforms state-of-the-art methods and its vanilla version, which means it has the potential to serve as an auxiliary quantitative tool to reduce the high missed-detection rates in endoscopic procedures.
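As a rough sketch of the knowledge-distillation idea mentioned in this abstract, the loss below combines a class-weighted cross-entropy term on the ground-truth labels with a temperature-scaled KL term toward the teacher's soft predictions. The weighting scheme, shapes, and hyperparameters are illustrative assumptions; the thesis defines its own class-aware loss for the detection setting.

```python
# Illustrative knowledge-distillation loss with per-class weighting (not the thesis loss).
import torch
import torch.nn.functional as F

def class_aware_distillation_loss(student_logits, teacher_logits, targets,
                                  class_weights, temperature=4.0, alpha=0.5):
    """Weighted cross-entropy on ground truth plus KL toward the teacher's soft labels."""
    ce = F.cross_entropy(student_logits, targets, weight=class_weights)
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * ce + (1.0 - alpha) * kd

# Example with assumed shapes: 8 image regions, 5 disease classes.
student = torch.randn(8, 5, requires_grad=True)
teacher = torch.randn(8, 5)
labels = torch.randint(0, 5, (8,))
weights = torch.tensor([0.5, 1.5, 1.0, 2.0, 1.0])  # assumed per-class weights
loss = class_aware_distillation_loss(student, teacher, labels, weights)
```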
- Novel metric-learning methods for generalizable and discriminative few-shot image classification (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2021-12-09). Méndez Ruiz, Mauricio; Ochoa Ruiz, Gilberto; Chang Fernández, Leonardo; Méndez Vázquez, Andrés; School of Engineering and Sciences; Campus Monterrey.
  Few-shot learning (FSL) is a challenging and relatively new technique that specializes in problems with a small amount of data. The goal of these methods is to classify categories that have not been seen before with just a handful of labeled samples. Recent works based on metric-learning approaches benefit from a meta-learning process with episodic tasks composed of a support set (training) and a query set (test), where the objective is to learn a similarity comparison metric between those sets. Metric-learning methods have demonstrated that simple models can achieve good performance. However, the feature space learned by a given metric-learning approach may not exploit the information given by a specific few-shot task. Due to the lack of data, the learning process of the embedding network becomes an important factor in taking better advantage of the similarity metric on a few-shot task. The contributions of this thesis are threefold. First, we explore the use of dimension reduction techniques as a way to find significant features in the few-shot task, which allows better classification. We measure the performance of the reduced features by assigning a score based on intra-class and inter-class distance, and select the feature reduction method in which instances of different classes are far apart and instances of the same class are close. This method outperforms the metric-learning baselines on the miniImageNet dataset by around 2% in accuracy. Second, we propose two distance-based loss functions for few-shot classification: one is inspired by the triplet loss, while the other evaluates the embedding vectors of a task using the concepts of intra-class and inter-class distance among the few samples. Extensive experimental results on the miniImageNet dataset show an increase in accuracy compared with other metric-based FSL methods by a margin of 2%. Lastly, we evaluate the generalization capabilities of meta-learning-based FSL on two real-life medical datasets with limited data availability. It has been repeatedly shown that deep learning (DL) methods trained on one dataset do not generalize well to datasets from other domains, or even to similar datasets, due to data distribution shifts. We propose the use of a meta-learning-based FSL approach to alleviate these problems by demonstrating, using two datasets of kidney stone samples acquired with different endoscopes and under different acquisition conditions, that such methods are indeed capable of handling domain shifts. Whereas deep-learning-based methods fail to generalize to instances of the same class drawn from different data distributions, we show that FSL generalizes without a large decrease in performance. The method performs remarkably well even under these very limited data conditions, attaining an accuracy of 74.38% and 88.52% in the 5-way 5-shot and 5-way 20-shot settings respectively, while traditional DL methods attained an accuracy of 45% on the same data.
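To illustrate the intra-class/inter-class distance intuition behind the proposed losses, the sketch below builds class prototypes from an episode's embeddings, measures how tightly samples cluster around their own prototype (intra-class) and how far prototypes lie from each other (inter-class), and penalizes episodes where the former is not smaller than the latter by a margin. This is an assumed, simplified variant; the exact formulations in the thesis may differ.

```python
# Simplified intra-/inter-class distance loss for a few-shot episode (illustrative).
import torch

def intra_inter_loss(embeddings, labels, margin=1.0):
    """Pull samples toward their class prototype; push prototypes apart (>= 2 classes)."""
    classes = labels.unique()
    protos = torch.stack([embeddings[labels == c].mean(dim=0) for c in classes])
    # Intra-class: mean distance from each sample to its own class prototype.
    intra = torch.stack([
        torch.cdist(embeddings[labels == c], protos[i:i + 1]).mean()
        for i, c in enumerate(classes)
    ]).mean()
    # Inter-class: mean pairwise distance between class prototypes.
    inter = torch.pdist(protos).mean()
    return torch.relu(intra - inter + margin)

# Example on a toy 5-way episode with 5 support embeddings per class.
embeddings = torch.randn(25, 64)
labels = torch.arange(5).repeat_interleave(5)
loss = intra_inter_loss(embeddings, labels)
```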
- Characterization of jet fire flame temperature zones using a deep learning-based segmentation approach (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2021-12-02). Pérez Guerrero, Carmina; Ochoa Ruiz, Gilberto; González Mendoza, Miguel; Mata Miquel, Christian; Palacios Rosas, Adriana; School of Engineering and Sciences; Campus Monterrey.
  Jet fires are relatively small and have the least severe effects among the diverse fire accidents that can occur in industrial plants; however, they are usually involved in a process known as the domino effect, which leads to more severe events, such as explosions or the initiation of another fire, making the analysis of such fires an important part of risk analysis. One such analysis is the segmentation of the different radiation zones within the flame, so this thesis presents exploratory research on several traditional computer vision and deep learning segmentation approaches for this specific problem. A dataset of propane jet fires is used to train and evaluate the different approaches. Different metrics are correlated with a manual ranking performed by experts to produce an evaluation that closely resembles the experts' criteria. Additionally, given the difference in the distribution of the zones and the background of the images, different loss functions that seek to alleviate data imbalance are explored. The Hausdorff distance and adjusted Rand index were the metrics with the highest correlation, and the best results were obtained by training with a weighted cross-entropy loss. The best performing models were the UNet architecture and its recent variations, Attention UNet and UNet++. These models are then used to segment a group of vertical jet flames of varying pipe outlet diameters to extract their main geometrical characteristics. Attention UNet obtained the best general performance in the approximation of both the height and area of the flames, while also showing a statistically significant difference with respect to UNet++. UNet obtained the best overall performance for the approximation of the lift-off distances; however, there is not enough data to prove a statistically significant difference between UNet and its two variations. The only instance where UNet++ outperformed the other models was in obtaining the lift-off distances of the jet flames with a 0.01275 m pipe outlet diameter. In general, the explored models show good agreement between the experimental and predicted values for relatively large turbulent propane jet flames released in sonic and subsonic regimes, making these radiation zone segmentation models a suitable approach for different jet flame risk management scenarios.
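As a minimal example of the class-imbalance handling described above, the snippet below derives inverse-frequency class weights from the segmentation masks and plugs them into a weighted cross-entropy criterion, as one would do when the radiation zones occupy far fewer pixels than the background. The number of classes and tensor shapes are assumptions for illustration, not the thesis configuration.

```python
# Weighted cross-entropy setup for imbalanced segmentation labels (illustrative).
import torch
import torch.nn as nn

def inverse_frequency_weights(masks, num_classes):
    """Per-class weights inversely proportional to pixel frequency in integer masks."""
    counts = torch.bincount(masks.flatten(), minlength=num_classes).float()
    return counts.sum() / (num_classes * counts.clamp(min=1.0))

# Example with assumed shapes: background plus three radiation zones (4 labels).
masks = torch.randint(0, 4, (8, 128, 128))           # ground-truth zone maps
weights = inverse_frequency_weights(masks, num_classes=4)
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 4, 128, 128)                 # e.g. the output of a UNet
loss = criterion(logits, masks)
```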