Ciencias Exactas y Ciencias de la Salud
Permanent URI for this collection: https://hdl.handle.net/11285/551039
This collection contains theses and graduate research projects from the Master's programs of the Schools of Engineering and Sciences and of Medicine and Health Sciences.
Search Results
- An explainable AI-based system for kidney stone classification using color and texture descriptors (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2025-06) De Anda García, Ilse Karena; Ochoa Ruiz, Gilberto; emipsanchez; González Mendoza, Miguel; School of Engineering and Sciences; Campus Monterrey; Hinojosa Cervantes, Salvador Miguel. Kidney stone disease affects nearly 10% of the global population and remains a significant clinical and economic burden. Accurate classification of stone subtypes is essential for guiding treatment decisions and preventing recurrence. This thesis presents the design, implementation, and evaluation of an explainable artificial intelligence (XAI)-based dual-output system that predicts both the texture and color subtype of kidney stones using image-based descriptors. The proposed system extracts features from stone images captured in Section and Surface views and processes them through parallel branches optimized for texture and color. Texture classification is performed using an ensemble of PCA-reduced deep descriptors from InceptionV3, AlexNet, and VGG16. For color, the most effective model combined handcrafted HSV descriptors with PCA-compressed deep CNN features. These were fused into a dual-output architecture using a MultiOutputClassifier framework. The models were evaluated using five-fold cross-validation. Texture classification reached 98.67% ± 1.82 accuracy in Section and 95.33% ± 1.83 in Surface. Color classification achieved 90.67% ± 9.25 and 85.34% ± 11.93, respectively. Exact match accuracy for joint prediction was 91.4% in Section and 84.2% in Surface, indicating high coherence between the two outputs. Explainability was addressed through FullGrad visualizations and Weight of Feature (WOF) analysis, both of which showed that the model relied on clinically meaningful image regions and that color features held slightly greater predictive influence.
Compared to state-of-the-art approaches, including multi-view fusion models, the proposed method achieved competitive performance while maintaining a modular and transparent structure. The findings validate the hypothesis that combining deep and handcrafted descriptors can enhance interpretability and, in some cases, performance. This work contributes a clinically aligned and interpretable framework for automated kidney stone classification and supports the integration of XAI into nephrological diagnostic workflows. Moreover, by providing interpretable dual predictions of color and texture, this system can support early preventive decisions aimed at reducing recurrence. Future work could explore advanced generative models to further expand the diversity and clinical utility of synthetic data.
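The dual-output pipeline described above (PCA-compressed descriptors feeding a MultiOutputClassifier that predicts texture and color jointly) can be sketched with scikit-learn. This is a minimal illustration on synthetic data, not the thesis's implementation: feature sizes, the base estimator, and the number of PCA components are all assumptions.

```python
# Sketch of a dual-output (texture, color) classifier in the spirit of the
# abstract: PCA-compressed descriptors feeding sklearn's MultiOutputClassifier.
# All data is synthetic; dimensions and the base estimator are assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 128))           # stand-in for deep + HSV descriptors
y_texture = rng.integers(0, 3, size=60)  # texture subtype labels
y_color = rng.integers(0, 4, size=60)    # color subtype labels
Y = np.column_stack([y_texture, y_color])

model = make_pipeline(
    PCA(n_components=16),  # compress the descriptors before classification
    MultiOutputClassifier(RandomForestClassifier(n_estimators=50, random_state=0)),
)
model.fit(X, Y)
pred = model.predict(X)  # shape (60, 2): one [texture, color] pair per image
exact_match = (pred == Y).all(axis=1).mean()  # joint "exact match" rate
print(pred.shape)
```

The "exact match" metric reported in the abstract corresponds to requiring both outputs to be correct for the same image, as computed on the last line.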
- A prompt assisted image enhancement model using BERT classifier and modified LMSPEC and STTN techniques for endoscopic images (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-12) Cerriteño Magaña, Javier; Ochoa Ruiz, Gilberto; emipsanchez; Sánchez Ante, Gildardo; Alfaro Ponce, Mariel; School of Engineering and Sciences; Campus Monterrey. This document presents a research thesis for the Master in Computer Science (MCCi) degree at Tecnologico de Monterrey. The field of medical imaging, particularly in endoscopy, has seen significant advancements in image enhancement techniques aimed at improving the clarity and interpretability of captured images. Numerous models and methodologies have been developed to enhance medical images, ranging from traditional algorithms to complex deep learning frameworks. However, the effective implementation of these techniques often requires substantial expertise in computer science and image processing, which may pose a barrier for medical professionals who primarily focus on clinical practice. This thesis presents a novel prompt-assisted image enhancement model that integrates the LMSPEC and STTN techniques, augmented by BERT models equipped with added attention blocks. This innovative approach enables medical practitioners to specify desired image enhancements through natural language prompts, significantly simplifying the enhancement process. By interpreting and acting upon user-defined requests, the proposed model not only empowers clinicians with limited technical backgrounds to effectively enhance endoscopic images but also streamlines diagnostic workflows. To the best of our knowledge, this is the first dedicated prompt-assisted image enhancement model specifically tailored for medical imaging applications. Moreover, the architecture of the proposed model is designed with flexibility in mind, allowing for the seamless incorporation of future image enhancement models and techniques as they emerge.
This adaptability ensures that the model remains relevant and effective as the field of medical imaging continues to evolve. The results of this research contribute to the ongoing effort to make advanced image processing technologies more accessible to medical professionals, thereby enhancing the quality of care provided to patients through improved diagnostic capabilities.
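The core control flow of such a prompt-assisted system (classify a free-text request, then dispatch to an enhancement module) can be illustrated with a toy routing sketch. A simple keyword matcher stands in for the thesis's BERT classifier, and the placeholder enhancement functions stand in for LMSPEC/STTN; every name below is hypothetical.

```python
# Toy stand-in for a prompt-assisted enhancement pipeline: a keyword matcher
# routes a natural-language request to an enhancement routine. The real system
# uses a BERT model with added attention blocks; all names here are hypothetical.
def classify_prompt(prompt: str) -> str:
    """Map a free-text request to an enhancement label."""
    rules = {
        "lighting": ("bright", "dark", "light", "exposure"),
        "smoke_removal": ("smoke", "haze", "fog"),
        "deblur": ("blur", "sharp", "focus"),
    }
    text = prompt.lower()
    for label, keywords in rules.items():
        if any(k in text for k in keywords):
            return label
    return "no_enhancement"

def enhance(image, label):
    """Dispatch to the module selected by the classifier (placeholders here)."""
    modules = {
        "lighting": lambda im: [min(255, p + 30) for p in im],  # brighten
        "smoke_removal": lambda im: im,                          # placeholder
        "deblur": lambda im: im,                                 # placeholder
        "no_enhancement": lambda im: im,
    }
    return modules[label](image)

label = classify_prompt("The image is too dark, can you fix the lighting?")
out = enhance([100, 200, 250], label)
print(label, out)
```

The design point the abstract emphasizes is the dispatch table: new enhancement models can be registered without changing the classifier, which is what makes the architecture extensible.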
- Object detection-based surgical instrument tracking in laparoscopy videos (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-12) Guerrero Ramírez, Cuauhtemoc Alonso; Ochoa Ruiz, Gilberto; emipsanchez; González Mendoza, Miguel; Hinojosa Cervantes, Salvador Miguel; Falcón Morales, Luis Eduardo; School of Engineering and Sciences; Campus Monterrey; Medina Pérez, Miguel Ángel. Minimally invasive surgery (MIS) has transformed surgery by offering numerous advantages over traditional open surgery, such as reduced pain, minimized trauma, and faster recovery times. However, endoscopic MIS procedures remain highly operator-dependent, demanding significant skill from the surgical team to ensure a positive postoperative outcome for the patient. Computer vision techniques such as reliable surgical instrument detection and tracking can be leveraged for applications such as intraoperative decision support, surgical navigation assistance, and surgical skill assessment, which can significantly improve patient safety. The aim of this work is to implement a Multiple Object Tracking (MOT) benchmark model for the task of surgical instrument tracking in laparoscopic videos. To this end, a new dataset is introduced, m2cai16-tool-tracking, based on the m2cai16-tool-locations dataset and specifically designed for surgical instrument tracking. This dataset includes both bounding box annotations for instrument detection and unique tracking ID annotations for multi-object tracking. This work employs ByteTrack, a state-of-the-art multiple-object tracking algorithm that follows the tracking-by-detection paradigm. ByteTrack predicts tool positions and associates object detections across frames, allowing consistent tracking of each instrument. The object detection step is performed using YOLOv4, a state-of-the-art object detection model known for real-time performance.
YOLOv4 is first trained on the m2cai16-tool-locations dataset to establish a baseline performance and then on the custom m2cai16-tool-tracking dataset, allowing a comparison of detection performance on the custom dataset against an existing object detection dataset. YOLOv4 generates bounding box predictions for each frame in the laparoscopic videos. The bounding box detections serve as input for the ByteTrack algorithm, which assigns unique tracking IDs to each instrument to maintain their trajectories across frames. YOLOv4 achieves robust object detection performance on the m2cai16-tool-locations dataset, obtaining a mAP50 of 0.949, a mAP75 of 0.537, and a mAP50:95 of 0.526, with a real-time inference speed of 125 fps. However, detection performance on the m2cai16-tool-tracking dataset is slightly lower, with a mAP50 of 0.839, mAP75 of 0.420, and mAP50:95 of 0.439, suggesting that differences in data partitioning impact detection accuracy. This lower detection accuracy for the tracking dataset likely affects the tracking performance of ByteTrack, reflected in a MOTP of 76.4, MOTA of 56.6, IDF1 score of 22.8, and HOTA score of 23.0. Future work could focus on improving the object detection performance to enhance tracking quality. Additionally, including appearance-based features in the tracking step could improve the association accuracy of detections across frames and help maintain consistent tracking even in challenging scenarios like occlusions. Such improvements could enhance tracking reliability to better support surgical tasks.
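The distinctive step of ByteTrack is its two-stage association: high-score detections are matched to existing tracks by IoU first, and the remaining tracks then get a second chance against low-score detections that other trackers would discard. A simplified sketch follows; greedy matching stands in for the Hungarian assignment used in the real tracker, and the thresholds are illustrative.

```python
# Simplified sketch of ByteTrack-style two-stage association. Greedy IoU
# matching replaces the Hungarian algorithm; thresholds are illustrative.
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def associate(tracks, detections, score_thr=0.6, iou_thr=0.3):
    """tracks: {track_id: box}; detections: [(box, score)]. Returns matches."""
    high = [d for d in detections if d[1] >= score_thr]
    low = [d for d in detections if d[1] < score_thr]
    unmatched = dict(tracks)
    matched = {}
    for pool in (high, low):  # stage 1: high scores; stage 2: low scores
        for box, _score in pool:
            best_id, best_iou = None, iou_thr
            for tid, tbox in unmatched.items():
                if iou(box, tbox) > best_iou:
                    best_id, best_iou = tid, iou(box, tbox)
            if best_id is not None:
                matched[best_id] = box
                del unmatched[best_id]
    return matched

tracks = {1: (0, 0, 10, 10), 2: (50, 50, 60, 60)}
dets = [((1, 1, 11, 11), 0.9),    # high score, overlaps track 1
        ((51, 51, 61, 61), 0.3)]  # low score, recovered in stage 2
print(associate(tracks, dets))
```

Note how the low-score detection still extends track 2: this recovery of weak detections (e.g. partially occluded instruments) is the mechanism behind ByteTrack's robustness.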
- Exploring Anchor-Free Object Detection for Surgical Tool Detection in Laparoscopic Videos: A Comparative Study of CenterNet++ and Anchor-Based Models (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-12) Aparicio Viveros, Carlos Alfredo; Ochoa Ruiz, Gilberto; emipsanchez; Hinojosa Cervantes, Salvador Miguel; Falcón Morales, Luis Eduardo; González Mendoza, Miguel; School of Engineering and Sciences; Campus Monterrey. Minimally Invasive Surgery (MIS) has transformed modern medicine, offering reduced recovery times, minimal scarring, and lower risks of infection. However, MIS procedures also present unique challenges, particularly in visualizing and manipulating surgical tools within a limited field of view. As a solution, this thesis investigates anchor-free deep learning models for real-time surgical tool detection in laparoscopic videos, proposing CenterNet++ as a potential improvement over traditional anchor-based methods. The hypothesis guiding this work is that anchor-free detectors, by avoiding predefined anchor boxes, can more effectively handle the diverse shapes, sizes, and positions of surgical tools. The primary objective of this thesis is to evaluate the performance of CenterNet++ in surgical tool detection compared to popular anchor-based models, specifically Faster R-CNN and YOLOv4, using the m2cai16-tool-locations dataset. CenterNet++ is examined in different configurations, including complete and real-time optimized (Fast-CenterNet++) versions, and tested against Faster R-CNN and YOLOv4 to assess trade-offs in accuracy and efficiency. Experimental results demonstrate that while CenterNet++ achieves high precision, particularly in scenarios requiring meticulous localization, its inference speed is significantly slower than YOLOv4, which attained real-time speeds at 128 FPS.
CenterNet++’s unique keypoint refinement mechanism, though beneficial for localization, impacts its computational efficiency, highlighting areas for further optimization. To bridge this gap, several architectural improvements are proposed based on YOLOv4’s streamlined design. These include integrating modules like Spatial Pyramid Pooling (SPP) and Path Aggregation Network (PANet), along with reducing input resolution in the Fast-CenterNet++ configuration. Additionally, future work is suggested to explore CenterNet++ on larger, more complex datasets and to develop semi-supervised learning approaches that could mitigate the limitations of annotated surgical datasets. In conclusion, this thesis contributes a comprehensive evaluation of anchor-free models for surgical tool detection, providing a foundation for further advancements in real-time, high-precision object detection for surgical assistance. The findings underscore the potential of anchor-free models, such as CenterNet++, to meet the evolving demands of MIS with targeted architectural adaptations.
- Generation and evaluation of synthetic kidney stones images generated by diffusion models using limited data (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024) González Pérez, Ruben; Ochoa Ruiz, Gilberto; emimmayorquin; Sánchez Ante, Gildardo; Daul, Christian; Hinojosa Cervantes, Salvador Miguel; School of Engineering and Sciences; Campus Monterrey; Falcón Morales, Luis Eduardo. Kidney stones are one of the diseases that affect men and women around the world. Kidney stones are the accumulation of minerals inside the kidneys and can cause severe pain or problems with the urinary system. There are many different types of kidney stones, and it is very important to identify and classify them to determine the best treatment and to avoid relapses. Currently, there are very few specialists who can perform this analysis, since it is a complicated process that requires a great deal of experience, and the methods currently used for it can take a long time. Numerous studies have shown that deep learning methods hold great promise in automating the classification of kidney stones. These advanced algorithms leverage neural networks to analyze and interpret complex medical imaging data with high precision. By training on large datasets of annotated kidney stone images, deep learning models can learn to identify and classify different types of stones, such as calcium oxalate, uric acid, and struvite, with remarkable accuracy. Research has demonstrated that these models can achieve performance levels comparable to, and sometimes exceeding, those of experienced radiologists. The ability of deep learning methods to process large amounts of data quickly and consistently makes them particularly valuable in clinical settings, where timely and accurate diagnosis is crucial.
However, data scarcity represents a major challenge in using deep learning methods for kidney stone classification. Deep learning algorithms require large and diverse datasets to train effectively, capturing the wide variability in stone appearances and characteristics seen across patients, but acquiring such extensive datasets in the medical field is difficult due to privacy concerns, the labor-intensive process of annotating medical images, and the relatively low prevalence of certain types of kidney stones. The objective of this study is to address this data scarcity through data augmentation using the SinDDM model, a diffusion model capable of generating synthetic images from a single training image. To evaluate the generated synthetic images, a case study was carried out to compare the performance of a classifier when using generated images and when using only real images. The results indicate an 11% improvement in classifier accuracy, demonstrating that the proposed method is effective in alleviating the data scarcity problem.
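Diffusion models such as SinDDM generate new samples by learning to invert a forward noising process, which can be written in closed form as x_t = sqrt(ᾱ_t)·x₀ + sqrt(1−ᾱ_t)·ε. The sketch below shows only that forward process on a toy "image"; the schedule length and beta range are generic illustrative values, not SinDDM's actual settings.

```python
# Sketch of the forward (noising) process that diffusion models like SinDDM
# learn to invert: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.
# Schedule length and beta range are illustrative, not SinDDM's settings.
import numpy as np

T = 100
betas = np.linspace(1e-4, 0.02, T)    # linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)  # cumulative product abar_t

def q_sample(x0, t, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

rng = np.random.default_rng(0)
x0 = rng.uniform(-1, 1, size=(8, 8))  # stand-in for a training image
x_early, x_late = q_sample(x0, 5, rng), q_sample(x0, T - 1, rng)

# later timesteps are closer to pure noise: correlation with x0 drops
corr = lambda a, b: float(np.corrcoef(a.ravel(), b.ravel())[0, 1])
print(round(corr(x0, x_early), 2), round(corr(x0, x_late), 2))
```

A trained model reverses this chain step by step, which is what lets a single-image model like SinDDM produce diverse synthetic variants for augmentation.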
- Improved Kidney Stone Recognition Through Attention and Feature Fusion Strategies (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2023-05) Villalvazo Avila, Elias Alejandro; Ochoa Ruiz, Gilberto; emimmayorquin; Gonzalez Mendoza, Miguel; Hinojosa Cervantes, Miguel Salvador; Daul, Christian; Campus Estado de México. Urolithiasis is the second most common kidney disease, and its incidence rate is expected to increase in the coming years. This disease refers to the formation of crystalline accretions from minerals dissolved in urine in the urinary tract (kidneys, ureters, and bladder) that cannot be expelled. Identifying the kidney stone type is considered crucial by many practitioners because it allows them to prescribe a proper treatment to eliminate kidney stones and, most importantly, to avoid future relapses. For diagnostic purposes, morpho-constitutional analysis (MCA) is the reference for ex-vivo stone characterisation. This analysis consists of two complementary examinations. First, visual examination of the stone under the microscope to obtain a description of the crystalline structure at different regions of the stone. Second, an FTIR analysis that provides the biochemical composition of the kidney stone. Current clinical practices for removing kidney stones make increasing use of laser techniques for fragmenting the stone, such as "dusting", which reduce intervention time and trauma for the patient at the expense of losing important information about the morphology of the stone, which could lead to an incomplete or incorrect diagnosis. To overcome this issue, a few experts visually identify the stone type on screen during the procedure. This visual kidney stone recognition by urologists is operator-dependent, and a great deal of experience is required due to the high similarity between classes. Therefore, AI techniques assessing endoscopic images could lead to automated and operator-independent in-vivo recognition.
It has been shown that, on ex-vivo data with very controlled scenes and image acquisition conditions, kidney stone classification is indeed feasible. The literature has also shown that in-vivo classification is feasible using deep-learning architectures. This thesis presents a deep learning method for the extraction and fusion of information relating to kidney stone fragments acquired from different viewpoints of the endoscope. Surface and section fragment images are jointly used during the training of the classifier to improve the discrimination power of the features by adding attention layers at the end of each convolutional block. This approach is specifically designed to mimic the morpho-constitutional analysis performed ex-vivo by biologists to visually identify kidney stones by inspecting both views. The addition of attention mechanisms to the backbone improved the results of single-view extraction backbones by 4% on average. Moreover, in comparison to the state-of-the-art, the fusion of the deep features improved the overall results by up to 11% in terms of kidney stone classification accuracy.
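One simple way to "add attention at the end of a convolutional block", as described above, is a squeeze-and-excitation style channel-attention layer: globally pool each channel, pass the result through a small bottleneck MLP, and rescale the channels by the resulting weights. The numpy sketch below illustrates the mechanism only; the thesis's actual attention layers may differ, and the weights here are random rather than learned.

```python
# Numpy sketch of squeeze-and-excitation style channel attention, one common
# way to append attention to a convolutional block. Weights are random here;
# in a real network they are learned end-to-end.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(features, w1, w2):
    """features: (C, H, W). Reweights channels by estimated importance."""
    squeeze = features.mean(axis=(1, 2))                 # global avg pool -> (C,)
    excite = sigmoid(w2 @ np.maximum(0, w1 @ squeeze))   # bottleneck MLP -> (C,)
    return features * excite[:, None, None]              # rescale each channel

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 4, 4))   # toy feature map with C=8 channels
w1 = rng.normal(size=(2, 8)) * 0.1   # reduction to C/4
w2 = rng.normal(size=(8, 2)) * 0.1   # expansion back to C
out = channel_attention(feats, w1, w2)
print(out.shape)  # (8, 4, 4), channels reweighted
```

In a fusion setting, such a layer lets the network emphasize the channels that are most discriminative for each view (surface or section) before the deep features are merged.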
- Lights, camera, and domain shift: using superpixels for domain generalization in image segmentation for multimodal endoscopies (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2023-05) Martínez García Peña, Rafael; Ochoa Ruiz, Gilberto; puemcuervo, emipsanchez; Falcón Morales, Luis Eduardo; González Mendoza, Miguel; School of Engineering and Sciences; Campus Monterrey; Ali, Sharib. Deep Learning models have made great advancements in image processing. Their ability to identify key parts of images and provide fast and accurate segmentation has been proven and used in many fields, such as city navigation and object recognition. However, there is one field that is both in need of the extra information that computers can provide and has proven elusive for the goals of robustness and accuracy: medicine. In the medical field, limitations in the amount of data and the variation introduced by factors such as differences in instrumentation pose a grave threat to model accuracy known as domain shift. Domain shift occurs when we train with data whose characteristics are not wholly representative of the entire set of data a task encompasses. When it is present, models that have no tools to deal with it can see their accuracy degrade to such a degree that they are transformed from usable to useless. To better explore this topic, we discuss two techniques: domain adaptation, where we find how to make a model better at predicting for a specific domain of data within a task, and domain generalization, where we find how to make a model better at predicting data for any domain within a task. In addition, we discuss several image segmentation models that have shown good results for medical tasks: U-Net, Attention U-Net, DeepLab, Efficient U-Net, and EndoUDA. Following this exploration, we propose a solution model based on a domain generalization technique: patch-based consistency.
We use a superpixel generator known as SLIC (Simple Linear Iterative Clustering) to provide low-level, domain-agnostic information to different models in order to encourage our networks to learn more global features. This framework, which we refer to as SUPRA (SUPeRpixel Augmented), is used in tandem with U-Net, Attention U-Net, and Efficient U-Net in an effort to improve results in endoscopies where light modalities are switched, something commonly seen in lesion detection tasks (particularly in Barrett's Esophagus and polyp detection). We find that the best of these models, SUPRA-UNet, shows significant qualities that make it a better choice than unaugmented networks for lesion detection: not only does it provide less noisy and smoother predictions, but it also outperforms other networks by over 20% IoU versus the best baseline (U-Net) in a target domain that presents significant lighting differences from the training set.
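The augmentation idea can be pictured as adding an extra input channel in which every pixel is replaced by the mean of its superpixel, exposing domain-agnostic region structure to the network. In the sketch below a fixed grid stands in for SLIC, since the point is only the mean-pooled auxiliary channel; the real SUPRA framework uses actual SLIC superpixels.

```python
# Sketch of the superpixel-consistency idea: append a channel in which each
# pixel holds the mean of its superpixel, giving the network low-level,
# domain-agnostic structure. A fixed grid stands in for SLIC here.
import numpy as np

def grid_superpixel_mean(img, cell=4):
    """img: (H, W). Average over non-overlapping cell x cell blocks."""
    out = np.empty_like(img, dtype=float)
    h, w = img.shape
    for y in range(0, h, cell):
        for x in range(0, w, cell):
            block = img[y:y + cell, x:x + cell]
            out[y:y + cell, x:x + cell] = block.mean()
    return out

rng = np.random.default_rng(0)
img = rng.uniform(0, 1, size=(8, 8))
sp = grid_superpixel_mean(img, cell=4)
augmented = np.stack([img, sp])  # 2-channel input: raw image + region means
print(augmented.shape)
```

Because the region-mean channel changes little when the light modality switches, a network trained on it is nudged toward features that survive the domain shift.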
- Detection and classification of gastrointestinal diseases using deep learning techniques (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2022-11-30) Chavarrias Solano, Pedro Esteban; OCHOA RUIZ, GILBERTO; 3016604; Ochoa Ruiz, Gilberto; puemcuervo, emipsanchez; Sanchez Ante, Gildardo; Hinojosa Cervantes, Salvador Miguel; School of Engineering and Sciences; Campus Monterrey; Ali, Sharib. This document presents a research thesis for the Master in Computer Science (MCCi) degree at Tecnologico de Monterrey. Cancer is a pathological situation in which old or abnormal cells do not die when they should. Even though there are different cancer types, the incidence of colorectal cancer positions it as the third most common one worldwide. Endoscopy is the primary diagnostic tool used to manage gastrointestinal (GI) tract malignancies; however, it is a time-consuming and subjective process based on the experience of the clinician. Previous work has leveraged artificial intelligence methods for polyp detection, instrument tracking, and segmentation of gastric ulcers. This work focuses on the detection and classification of gastrointestinal diseases. This thesis proposal seeks to implement a knowledge distillation framework with a class-aware loss for endoscopic disease detection in the upper and lower parts of the gastrointestinal tract. Relevant features are extracted from endoscopic images to feed and train a deep learning-based object detection model. The method is evaluated using standard computer vision metrics: IoU and mAP25, mAP50, mAP75, mAP25:75. This proposal outperforms state-of-the-art methods and its vanilla version, which means that it has the potential to be an auxiliary quantitative tool to reduce high missed-detection rates in endoscopic procedures.
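A knowledge-distillation objective with class weighting, in the spirit of the class-aware loss mentioned above, combines a weighted cross-entropy on the hard labels with a temperature-softened KL term between teacher and student. The numpy sketch below is a generic illustration; the temperature, weights, and mixing factor are illustrative values, not the thesis's settings.

```python
# Numpy sketch of a knowledge-distillation loss with per-class weighting:
# weighted cross-entropy on the label plus a temperature-softened KL term.
# All constants are illustrative, not the thesis's actual settings.
import numpy as np

def softmax(z, t=1.0):
    e = np.exp(z / t - np.max(z / t))
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, label, class_weights, t=2.0, alpha=0.5):
    p_student = softmax(student_logits)
    soft_s = softmax(student_logits, t)
    soft_t = softmax(teacher_logits, t)
    ce = -class_weights[label] * np.log(p_student[label])    # class-aware CE
    kl = np.sum(soft_t * (np.log(soft_t) - np.log(soft_s)))  # distillation term
    return alpha * ce + (1 - alpha) * (t ** 2) * kl          # t^2 rescales the KL

student = np.array([1.0, 2.0, 0.5])
teacher = np.array([1.2, 2.5, 0.3])
weights = np.array([1.0, 2.0, 1.0])  # up-weight a frequently missed class
loss = kd_loss(student, teacher, label=1, class_weights=weights)
print(round(float(loss), 3))
```

Up-weighting the classes that are most often missed is what targets the "high missed-detection rates" the abstract refers to, while the KL term transfers the teacher's softer inter-class structure to the student.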
- Novel metric-learning methods for generalizable and discriminative few-shot image classification (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2021-12-09) Méndez Ruiz, Mauricio; OCHOA RUIZ, GILBERTO; 352103; Ochoa Ruiz, Gilberto; puelquio/mscuervo; Chang Fernández, Leonardo; Méndez Vázquez, Andrés; School of Engineering and Sciences; Campus Monterrey. Few-shot learning (FSL) is a challenging and relatively new technique that specializes in problems where only a small amount of data is available. The goal of these methods is to classify categories that have not been seen before with just a handful of labeled samples. Recent works based on metric-learning approaches benefit from the meta-learning process, in which we have episodic tasks composed of a support set (training) and a query set (test), and the objective is to learn a similarity comparison metric between those sets. Metric learning methods have demonstrated that simple models can achieve good performance. However, the feature space learned by a given metric learning approach may not exploit the information given by a specific few-shot task. Due to the lack of data, the learning process of the embedding network becomes an important part of how these models take better advantage of the similarity metric on a few-shot task. The contributions of the present thesis are three-fold. First, we explore the use of dimension reduction techniques as a way to find significant features in the few-shot task, which allows better classification. We measure the performance of the reduced features by assigning a score based on intra-class and inter-class distance, and select the best feature reduction method, in which instances of different classes are far apart and instances of the same class are close. This method outperforms the metric learning baselines on the miniImageNet dataset by around 2% in accuracy. Furthermore, we propose two different distance-based loss functions for few-shot classification.
One is inspired by the triplet-loss function, while the other evaluates the embedding vectors from a task using the concepts of intra-class and inter-class distance among the few samples. Extensive experimental results on the miniImageNet dataset show an increase in accuracy compared with other metric-based FSL methods by a margin of 2%. Lastly, we evaluate the generalization capabilities of meta-learning based FSL on two real-life medical datasets with small availability of data. It has been repeatedly shown that deep learning (DL) methods trained on a dataset do not generalize well to datasets from other domains, or even to similar datasets, due to data distribution shifts. We propose the use of a meta-learning based FSL approach to alleviate these problems by demonstrating, using two datasets of kidney stone samples acquired with different endoscopes and different acquisition conditions, that such methods are indeed capable of handling domain shifts. Where deep learning-based methods fail to generalize to instances of the same class but from different data distributions, we show that FSL is capable of generalizing without a large decrease in performance. This method performs remarkably well even under very limited data conditions, attaining an accuracy of 74.38% and 88.52% in the 5-way 5-shot and 5-way 20-shot settings respectively, while traditional DL methods attained an accuracy of 45% on the same data.
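The episodic, metric-based setup described above can be sketched with a prototypical-style episode: build one prototype (mean embedding) per class from the support set, then label each query by its nearest prototype. Random vectors stand in for the embedding network's output here, and the episode shape mirrors the 5-way 5-shot setting reported in the abstract.

```python
# Numpy sketch of a metric-learning few-shot episode: one prototype (mean
# embedding) per class from the support set, queries labeled by nearest
# prototype. Embeddings are random stand-ins for a trained network.
import numpy as np

def prototypes(support, labels, n_way):
    """Mean embedding per class -> (n_way, dim)."""
    return np.stack([support[labels == c].mean(axis=0) for c in range(n_way)])

def classify(queries, protos):
    """Nearest prototype by squared Euclidean distance."""
    d = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

rng = np.random.default_rng(0)
n_way, k_shot, dim = 5, 5, 16  # a 5-way 5-shot episode
centers = rng.normal(size=(n_way, dim)) * 3
labels = np.repeat(np.arange(n_way), k_shot)
support = centers[labels] + rng.normal(size=(n_way * k_shot, dim))
queries = centers[labels] + rng.normal(size=(n_way * k_shot, dim))

protos = prototypes(support, labels, n_way)
acc = (classify(queries, protos) == labels).mean()
print(float(acc))
```

The intra-class/inter-class score the thesis uses to rank feature reductions follows the same geometry: good features pull the support points toward their prototype and push the prototypes apart.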
- Characterization of jet fire flame temperature zones using a deep learning-based segmentation approach (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2021-12-02) Pérez Guerrero, Carmina; OCHOA RUIZ, GILBERTO; 352103; Ochoa Ruiz, Gilberto; puemcuervo; González Mendoza, Miguel; Mata Miquel, Christian; School of Engineering and Sciences; Campus Monterrey; Palacios Rosas, Adriana. Jet fires are relatively small and have the least severe effects among the diverse fire accidents that can occur in industrial plants; however, they are usually involved in a process known as the domino effect, which leads to more severe events, such as explosions or the initiation of another fire, making the analysis of such fires an important part of risk analysis. One such analysis is the segmentation of different radiation zones within the flame; therefore, this thesis presents exploratory research on several traditional computer vision and Deep Learning segmentation approaches to solve this specific problem. A data set of propane jet fires is used to train and evaluate the different approaches. Different metrics are correlated with a manual ranking performed by experts to make an evaluation that closely resembles the experts' criteria. Additionally, given the difference in the distribution of the zones and background of the images, different loss functions that seek to alleviate data imbalance are explored. The Hausdorff Distance and Adjusted Rand Index were the metrics with the highest correlation, and the best results were obtained from training with a Weighted Cross-Entropy Loss. The best performing models were found to be the UNet architecture, along with its recent variations, Attention UNet and UNet++. These models are then used to segment a group of vertical jet flames of varying pipe outlet diameters to extract their main geometrical characteristics.
Attention UNet obtained the best general performance in the approximation of both the height and area of the flames, while also showing a statistically significant difference from UNet++. UNet obtained the best overall performance for the approximation of the lift-off distances; however, there is not enough data to prove a statistically significant difference between UNet and its two variations. The only instance in which UNet++ outperformed the other models was in obtaining the lift-off distances of the jet flames with a 0.01275 m pipe outlet diameter. In general, the explored models show good agreement between the experimental and predicted values for relatively large turbulent propane jet flames released in sonic and subsonic regimes, making these radiation-zone segmentation models a suitable approach for different jet flame risk management scenarios.
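The Weighted Cross-Entropy Loss that gave the best results above counters class imbalance by scaling each pixel's loss with a per-class weight, so the rare radiation zones are not drowned out by background. The numpy sketch below shows that computation on a 2x2 toy prediction; the weights are toy values, whereas in practice they typically reflect inverse class frequencies.

```python
# Numpy sketch of weighted cross-entropy for segmentation: each pixel's loss
# is scaled by its true class's weight, countering the zone/background
# imbalance. Weights here are toy values, not the thesis's settings.
import numpy as np

def weighted_cross_entropy(probs, target, class_weights):
    """probs: (C, H, W) softmax output; target: (H, W) integer zone labels."""
    h, w = target.shape
    ys, xs = np.indices((h, w))
    pix = probs[target, ys, xs]  # predicted probability of the true class
    return float((class_weights[target] * -np.log(pix + 1e-9)).mean())

# 2 classes over a 2x2 image: background (0) dominates, zone (1) is rare
probs = np.array([[[0.9, 0.8], [0.7, 0.4]],
                  [[0.1, 0.2], [0.3, 0.6]]])
target = np.array([[0, 0], [0, 1]])
weights = np.array([0.5, 2.0])  # up-weight the rare zone class
print(round(weighted_cross_entropy(probs, target, weights), 3))
```

The single zone pixel, despite being correctly favored (0.6), dominates the average because of its 2.0 weight, which is exactly the pressure that keeps the network from collapsing to the background class.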

