Ciencias Exactas y Ciencias de la Salud

Permanent URI for this collection: https://hdl.handle.net/11285/551039

This collection contains the theses and degree projects of the master's programs of the School of Engineering and Sciences and the School of Medicine and Health Sciences.


Search Results

Now showing 1 - 3 of 3
  • Tesis de maestría / master thesis
    Object detection-based surgical instrument tracking in laparoscopy videos
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-12) Guerrero Ramírez, Cuauhtemoc Alonso; Ochoa Ruiz, Gilberto; emipsanchez; González Mendoza, Miguel; Hinojosa Cervantes, Salvador Miguel; Falcón Morales, Luis Eduardo; School of Engineering and Sciences; Campus Monterrey; Medina Pérez, Miguel Ángel
    Minimally invasive surgery (MIS) has transformed surgery by offering numerous advantages over traditional open surgery, such as reduced pain, minimized trauma, and faster recovery times. However, endoscopic MIS procedures remain highly operator-dependent, demanding significant skill from the surgical team to ensure a positive postoperative outcome for the patient. Computer vision techniques such as reliable surgical instrument detection and tracking can be leveraged for applications such as intraoperative decision support, surgical navigation assistance, and surgical skill assessment, which can significantly improve patient safety. The aim of this work is to implement a Multiple Object Tracking (MOT) benchmark model for the task of surgical instrument tracking in laparoscopic videos. To this end, a new dataset, m2cai16-tool-tracking, is introduced, based on the m2cai16-tool-locations dataset and specifically designed for surgical instrument tracking. This dataset includes both bounding box annotations for instrument detection and unique tracking ID annotations for multi-object tracking. This work employs ByteTrack, a state-of-the-art multiple-object tracking algorithm that follows the tracking-by-detection paradigm. ByteTrack predicts tool positions and associates object detections across frames, allowing consistent tracking of each instrument. The object detection step is performed using YOLOv4, a state-of-the-art object detection model known for real-time performance. YOLOv4 is first trained on the m2cai16-tool-locations dataset to establish a baseline performance and then on the custom m2cai16-tool-tracking dataset, allowing the detection performance on the custom dataset to be compared against that on an existing object detection dataset. YOLOv4 generates bounding box predictions for each frame in the laparoscopic videos.
The bounding box detections serve as input for the ByteTrack algorithm, which assigns unique tracking IDs to each instrument to maintain their trajectories across frames. YOLOv4 achieves robust object detection performance on the m2cai16-tool-locations dataset, obtaining a mAP50 of 0.949, a mAP75 of 0.537, and a mAP50:95 of 0.526, with a real-time inference speed of 125 fps. However, detection performance on the m2cai16-tool-tracking dataset is slightly lower, with a mAP50 of 0.839, mAP75 of 0.420, and mAP50:95 of 0.439, suggesting that differences in data partitioning impact detection accuracy. This lower detection accuracy for the tracking dataset likely affects the tracking performance of ByteTrack, reflected in a MOTP of 76.4, MOTA of 56.6, IDF1 score of 22.8, and HOTA score of 23.0. Future work could focus on improving the object detection performance to enhance tracking quality. Additionally, incorporating appearance-based features into the tracking step could improve the association accuracy of detections across frames and help maintain consistent tracking even in challenging scenarios like occlusions. Such improvements could enhance tracking reliability to better support surgical tasks.
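The tracking-by-detection idea described above can be sketched in a few lines. This is a simplified, illustrative version of ByteTrack's two-stage association only: the real algorithm additionally uses Kalman-filter motion prediction and Hungarian matching, and all function names, data shapes, and thresholds below are chosen here for illustration, not taken from the thesis.

```python
# Illustrative sketch of ByteTrack-style two-stage association.
# Stage 1 matches high-confidence detections to existing tracks by IoU;
# stage 2 tries to recover tracks using the remaining low-score detections.

def iou(a, b):
    # Boxes as (x1, y1, x2, y2).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, detections, high_thresh=0.6, iou_thresh=0.3):
    """Greedy two-stage matching: confident detections first, then the rest."""
    high = [d for d in detections if d["score"] >= high_thresh]
    low = [d for d in detections if d["score"] < high_thresh]
    matches, unmatched = [], list(tracks)
    for pool in (high, low):  # stage 1: high-score; stage 2: low-score
        for det in pool:
            best, best_iou = None, iou_thresh
            for trk in unmatched:
                overlap = iou(trk["box"], det["box"])
                if overlap > best_iou:
                    best, best_iou = trk, overlap
            if best is not None:
                matches.append((best["id"], det))
                unmatched.remove(best)
    return matches, unmatched
```

Keeping the low-score detections in a second matching stage is what distinguishes ByteTrack from earlier trackers that simply discard them; occluded or blurred instruments often yield low detector scores but can still be matched to an existing track by overlap.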
  • Tesis de maestría / master thesis
    Improved Kidney Stone Recognition Through Attention and Feature Fusion Strategies
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2023-05) Villalvazo Avila, Elias Alejandro; Ochoa Ruiz, Gilberto; emimmayorquin; Gonzalez Mendoza, Miguel; Hinojosa Cervantes, Miguel Salvador; Daul, Christian; Campus Estado de México
    Urolithiasis is the second most common kidney disease, and its incidence rate is expected to increase in the coming years. This disease refers to the formation, in the urinary tract (kidneys, ureters, and bladder), of crystalline accretions from minerals dissolved in urine that cannot be expelled. Identifying the kidney stone type is considered crucial by many practitioners because it allows them to prescribe a proper treatment to eliminate kidney stones and, most importantly, to avoid future relapses. For diagnostic purposes, morpho-constitutional analysis (MCA) is the reference for ex-vivo stone characterisation. This analysis consists of two complementary examinations. First, a visual examination of the stone under the microscope to obtain a description of the crystalline structure at different regions of the stone. Second, a Fourier-transform infrared spectroscopy (FTIR) analysis that provides the biochemical composition of the kidney stone. Current clinical practices for removing kidney stones make increasing use of laser techniques for fragmenting the stone, such as "dusting", which reduces intervention time and the trauma for the patient, at the expense of losing important information about the morphology of the stone, which could lead to an incomplete or incorrect diagnosis. To overcome this issue, a few experts visually identify the stone type on screen during the procedure. This visual kidney stone recognition by urologists is operator-dependent, and a great deal of experience is required due to the high similarities between classes. Therefore, AI techniques assessing endoscopic images could lead to automated and operator-independent in-vivo recognition. It has been shown that on ex-vivo data, with very controlled scenes and image acquisition conditions, kidney stone classification is indeed feasible. The literature has also shown that in-vivo classification is feasible using deep-learning architectures.
This thesis presents a deep learning method for the extraction and fusion of information relating to kidney stone fragments acquired from different viewpoints of the endoscope. Surface and section fragment images are jointly used during the training of the classifier to improve the discrimination power of the features by adding attention layers at the end of each convolutional block. This approach is specifically designed to mimic the morpho-constitutional analysis performed ex-vivo by biologists to visually identify kidney stones by inspecting both views. The addition of attention mechanisms to the backbone improved the results of single-view extraction backbones by 4% on average. Moreover, in comparison to the state-of-the-art, the fusion of the deep features improved the overall results by up to 11% in terms of kidney stone classification accuracy.
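The idea of appending an attention layer at the end of a convolutional block can be sketched as a squeeze-and-excitation-style channel gate. This is only a generic illustration of channel attention, assuming a (C, H, W) feature map and two small learned projection matrices; the thesis's actual attention mechanism and backbone may differ.

```python
import numpy as np

# Illustrative channel-attention gate appended after a convolutional block.
# Each channel of the feature map is reweighted by a learned scalar in (0, 1).
def channel_attention(features, w1, w2):
    """features: (C, H, W) feature map; w1 (R, C), w2 (C, R): learned weights."""
    squeeze = features.mean(axis=(1, 2))             # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)           # ReLU bottleneck -> (R,)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # sigmoid -> per-channel gate (C,)
    return features * gates[:, None, None]           # reweight each channel map
```

Because the gate is computed from the block's own pooled activations, training can learn to emphasize channels that respond to discriminative texture in either the surface or the section view before the two streams are fused.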
  • Trabajo de grado, maestría / master degree work
    Deep learning for visible-infrared image fusion and semantic segmentation of wildfire imagery
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2021-11-23) Ciprián Sánchez, Jorge Francisco; Ochoa Ruiz, Gilberto; puemcuervo, emipsanchez; Martínez Carranza, José; Falcón Morales, Luis Eduardo; School of Engineering and Sciences; Campus Estado de México; Rossi, Lucile
    Wildfires stand among the most significant natural disasters worldwide, all the more so due to climate change and their impact at various societal and environmental levels. In this regard, a significant amount of research has been done to address this issue, deploying a wide variety of technologies and following a multi-disciplinary approach. Notably, computer vision has played a fundamental role: it can be used to extract and combine information from several imaging modalities for fire detection, characterization, and wildfire spread forecasting. In recent years there has been work on Deep Learning (DL)-based fire segmentation, showing promising results. However, it is currently unclear whether the architecture of a model, its loss function, or the image type employed (visible, infrared, or fused) has the most impact on the fire segmentation results. In the field of visible-infrared image fusion, there is a growing interest in DL-based image fusion techniques due to their reduced complexity; however, most DL-based image fusion methods have not been evaluated in the domain of fire imagery. In the present thesis, I select three state-of-the-art (SOTA) DL-based image fusion techniques, assess their performance for the specific task of fire image fusion, and compare the performance of these methods on selected metrics. I also present an extension to one of these methods, which I call FIRe-GAN, that improves the generation of artificial infrared and fused images. I then evaluate different combinations of SOTA DL architectures, loss functions, and types of images to identify the parameters most relevant to improving the segmentation results. I benchmark them to identify the top-performing ones and compare the best one to traditional fire segmentation techniques. Finally, I evaluate whether the addition of attention modules to the best-performing architecture can further improve the segmentation results.
To the best of my knowledge, this is the first work that evaluates the impact of the architecture, loss function, and image type on the performance of DL-based wildfire segmentation models and assesses the applicability of DL-based image fusion methods to fire images, proposing a DL model for visible-infrared image fusion optimized for fire imagery.
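Comparisons of segmentation architectures, loss functions, and image types like those above rest on standard overlap metrics between predicted and ground-truth fire masks. A minimal sketch of two common ones, Dice and IoU, follows; this is a generic illustration, not the thesis's exact evaluation protocol.

```python
import numpy as np

# Overlap metrics for binary segmentation masks.
def dice_iou(pred, target):
    """pred, target: boolean arrays of the same shape. Returns (Dice, IoU)."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = 2.0 * inter / (pred.sum() + target.sum() + 1e-9)  # 2|A∩B| / (|A|+|B|)
    iou = inter / (union + 1e-9)                              # |A∩B| / |A∪B|
    return float(dice), float(iou)
```

Both metrics are insensitive to the large non-fire background, which makes them better suited than pixel accuracy for sparse targets such as flame regions.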
Unless otherwise specified, these materials are shared under the following terms: Attribution-NonCommercial-NoDerivatives CC BY-NC-ND http://www.creativecommons.mx/#licencias


