Ciencias Exactas y Ciencias de la Salud
Permanent URI for this collection: https://hdl.handle.net/11285/551039
This collection contains theses and degree projects from the master's programs of the Schools of Engineering and Sciences and of Medicine and Health Sciences.
Search Results
- Object detection-based surgical instrument tracking in laparoscopy videos (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-12) Guerrero Ramírez, Cuauhtemoc Alonso; Ochoa Ruiz, Gilberto; emipsanchez; González Mendoza, Miguel; Hinojosa Cervantes, Salvador Miguel; Falcón Morales, Luis Eduardo; School of Engineering and Sciences; Campus Monterrey; Medina Pérez, Miguel Ángel
Minimally invasive surgery (MIS) has transformed surgery by offering numerous advantages over traditional open surgery, such as reduced pain, minimized trauma, and faster recovery times. However, endoscopic MIS procedures remain highly operator-dependent, demanding significant skill from the surgical team to ensure a positive postoperative outcome for the patient. Computer vision techniques such as reliable surgical instrument detection and tracking can support applications such as intraoperative decision support, surgical navigation assistance, and surgical skill assessment, which can significantly improve patient safety. The aim of this work is to implement a Multiple Object Tracking (MOT) benchmark model for the task of surgical instrument tracking in laparoscopic videos. To this end, a new dataset designed specifically for surgical instrument tracking, m2cai16-tool-tracking, is introduced, based on the m2cai16-tool-locations dataset. This dataset includes both bounding box annotations for instrument detection and unique tracking ID annotations for multi-object tracking. This work employs ByteTrack, a state-of-the-art multiple-object tracking algorithm that follows the tracking-by-detection paradigm. ByteTrack predicts tool positions and associates object detections across frames, allowing consistent tracking of each instrument. The object detection step is performed using YOLOv4, a state-of-the-art object detection model known for real-time performance.
YOLOv4 is first trained on the m2cai16-tool-locations dataset to establish a baseline performance and then on the custom m2cai16-tool-tracking dataset, which allows detection performance on the custom dataset to be compared with that on an existing object detection dataset. YOLOv4 generates bounding box predictions for each frame in the laparoscopic videos. The bounding box detections serve as input for the ByteTrack algorithm, which assigns a unique tracking ID to each instrument to maintain its trajectory across frames. YOLOv4 achieves robust object detection performance on the m2cai16-tool-locations dataset, obtaining a mAP50 of 0.949, a mAP75 of 0.537, and a mAP50:95 of 0.526, with a real-time inference speed of 125 fps. However, detection performance on the m2cai16-tool-tracking dataset is slightly lower, with a mAP50 of 0.839, a mAP75 of 0.420, and a mAP50:95 of 0.439, suggesting that differences in data partitioning impact detection accuracy. This lower detection accuracy for the tracking dataset likely affects the tracking performance of ByteTrack, reflected in a MOTP of 76.4, a MOTA of 56.6, an IDF1 score of 22.8, and a HOTA score of 23.0. Future work could focus on improving the object detection performance to enhance tracking quality. Additionally, incorporating appearance-based features into the tracking step could improve the association accuracy of detections across frames and help maintain consistent tracking even in challenging scenarios like occlusions. Such improvements could enhance tracking reliability to better support surgical tasks.
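The tracking-by-detection association described in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: greedy IoU matching stands in for the Kalman-filter prediction and Hungarian assignment that ByteTrack implementations typically use, and the function names and thresholds (`high_thresh`, `min_iou`) are hypothetical. What it keeps is the idea of matching confident detections first and then using low-score detections to recover the remaining tracks.

```python
# Illustrative sketch only: greedy IoU matching replaces the Kalman-filter
# prediction and Hungarian assignment of real ByteTrack implementations.

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, detections, min_iou):
    """Pair track IDs with detection indices, best IoU first.

    tracks: dict track_id -> box; detections: list of (box, score).
    Returns (dict track_id -> detection index, set of unmatched indices).
    """
    pairs = sorted(
        ((iou(tbox, det[0]), tid, di)
         for tid, tbox in tracks.items()
         for di, det in enumerate(detections)),
        key=lambda p: p[0], reverse=True,
    )
    matches, used = {}, set()
    for overlap, tid, di in pairs:
        if overlap < min_iou:
            break  # all remaining pairs overlap even less
        if tid not in matches and di not in used:
            matches[tid] = di
            used.add(di)
    return matches, set(range(len(detections))) - used


def associate(tracks, detections, high_thresh=0.5, min_iou=0.3):
    """One frame of two-stage association (thresholds are hypothetical).

    Returns the updated dict track_id -> box. Tracks matched in neither
    stage are simply dropped here; a real tracker would keep them as
    "lost" for a number of frames before removing them.
    """
    high = [d for d in detections if d[1] >= high_thresh]
    low = [d for d in detections if d[1] < high_thresh]

    # Stage 1: match existing tracks against confident detections.
    m1, unmatched_high = greedy_match(tracks, high, min_iou)
    updated = {tid: high[di][0] for tid, di in m1.items()}

    # Stage 2: try to recover the remaining tracks with low-score detections.
    remaining = {tid: box for tid, box in tracks.items() if tid not in m1}
    m2, _ = greedy_match(remaining, low, min_iou)
    updated.update({tid: low[di][0] for tid, di in m2.items()})

    # Unmatched confident detections start new tracks with fresh IDs.
    next_id = max(tracks, default=0) + 1
    for di in sorted(unmatched_high):
        updated[next_id] = high[di][0]
        next_id += 1
    return updated
```

In this sketch an instrument that is partly occluded, and therefore detected with a low score, can still keep its ID through the second stage, while low-score detections never start new tracks.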
- Exploring Anchor-Free Object Detection for Surgical Tool Detection in Laparoscopic Videos: A Comparative Study of CenterNet++ and Anchor-Based Models (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-12) Aparicio Viveros, Carlos Alfredo; Ochoa Ruiz, Gilberto; emipsanchez; Hinojosa Cervantes, Salvador Miguel; Falcón Morales, Luis Eduardo; González Mendoza, Miguel; School of Engineering and Sciences; Campus Monterrey
Minimally Invasive Surgery (MIS) has transformed modern medicine, offering reduced recovery times, minimal scarring, and lower risks of infection. However, MIS procedures also present unique challenges, particularly in visualizing and manipulating surgical tools within a limited field of view. As a solution, this thesis investigates anchor-free deep learning models for real-time surgical tool detection in laparoscopic videos, proposing CenterNet++ as a potential improvement over traditional anchor-based methods. The hypothesis guiding this work is that anchor-free detectors, by avoiding predefined anchor boxes, can more effectively handle the diverse shapes, sizes, and positions of surgical tools. The primary objective of this thesis is to evaluate the performance of CenterNet++ in surgical tool detection compared to popular anchor-based models, specifically Faster R-CNN and YOLOv4, using the m2cai16-tool-locations dataset. CenterNet++ is examined in different configurations, including complete and real-time optimized (Fast-CenterNet++) versions, and tested against Faster R-CNN and YOLOv4 to assess trade-offs in accuracy and efficiency. Experimental results demonstrate that while CenterNet++ achieves high precision, particularly in scenarios requiring meticulous localization, its inference is significantly slower than that of YOLOv4, which ran in real time at 128 FPS.
CenterNet++’s unique keypoint refinement mechanism, though beneficial for localization, reduces its computational efficiency, highlighting areas for further optimization. To bridge this gap, several architectural improvements are proposed based on YOLOv4’s streamlined design. These include integrating modules like Spatial Pyramid Pooling (SPP) and Path Aggregation Network (PANet), along with reducing input resolution in the Fast-CenterNet++ configuration. Additionally, future work is suggested to explore CenterNet++ on larger, more complex datasets and to develop semi-supervised learning approaches that could mitigate the limitations of annotated surgical datasets. In conclusion, this thesis contributes a comprehensive evaluation of anchor-free models for surgical tool detection, providing a foundation for further advancements in real-time, high-precision object detection for surgical assistance. The findings underscore the potential of anchor-free models, such as CenterNet++, to meet the evolving demands of MIS with targeted architectural adaptations.
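To make the anchor-free idea in this abstract concrete, the following Python sketch decodes boxes from a center heatmap without any predefined anchor boxes. It is a generic center-point decoder written for illustration and is not CenterNet++'s actual keypoint pipeline; the function name `decode_centers`, the `(width, height)` size map, and the threshold are all assumptions.

```python
def decode_centers(heatmap, sizes, threshold=0.5):
    """Turn a center heatmap into boxes without any anchor boxes.

    heatmap: 2D list of scores in [0, 1], indexed as heatmap[y][x].
    sizes:   2D list of (width, height) predictions, same shape as heatmap.
    Returns a list of (x1, y1, x2, y2, score), highest score first.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    boxes = []
    for y in range(rows):
        for x in range(cols):
            score = heatmap[y][x]
            if score < threshold:
                continue
            # Keep only local peaks in the 3x3 neighbourhood; this plays the
            # role that non-maximum suppression plays for anchor-based models.
            neighbours = [
                heatmap[ny][nx]
                for ny in range(max(0, y - 1), min(rows, y + 2))
                for nx in range(max(0, x - 1), min(cols, x + 2))
                if (ny, nx) != (y, x)
            ]
            if any(n > score for n in neighbours):
                continue
            # The box is built directly from the size predicted at the peak.
            w, h = sizes[y][x]
            boxes.append((x - w / 2, y - h / 2, x + w / 2, y + h / 2, score))
    return sorted(boxes, key=lambda b: b[4], reverse=True)
```

Because each box comes from a size regressed at its own peak, nothing constrains the detector to a fixed set of aspect ratios, which is the property the hypothesis above relies on for long, thin instruments.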
- EfficientDet and fuzzy logic for an emergency brake driver assistant system based on traffic lights using a Jetson TX2 and a ZED stereo camera (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2022-04) García Escalante, Andrés Ricardo; FUENTES AGUILAR, RITA QUETZIQUEL; 229297; Fuentes Aguilar, Rita Quetziquel; puelquio, emipsanchez; Terashima Marín, Hugo; Falcón Morales, Luis Eduardo; Álvarez González, Rodolfo Rubén; School of Engineering and Sciences; Campus Monterrey; Carbajal, Oscar Eleno Espinosa
A study by the University of West Virginia analyzed vehicle collisions, which occur due to the slow reaction time (RT) of humans. The study examined human RT under specific conditions and found that fully aware drivers have an estimated RT between 0.70 to 0.075 seconds, that the RT for unexpected but normal situations, such as a lead car's brake lights, is 1.25 seconds, and that the RT for surprising events is estimated to be around 1.50 seconds. Therefore, the presented work provides a solution to implement a level 1 Advanced Driver Assistant System (ADAS) called Emergency Brake Driver Assistant System based on Traffic Lights (EBDASTL). It uses a Jetson TX2 and a ZED stereo camera to detect Traffic Light States (TLSs), estimate the distance to a Traffic Light (TL), and make a brake decision based on the TLS and the distance to the TL, with a better response time than human RT in surprising events. The main contribution of this research project is the implementation of a single ADAS with three stages: the Traffic Light State Detection Model (TLSDM) stage, which uses EfficientDet D0; the Traffic Light Distance (TLD) stage, which uses a ZED stereo camera; and the Traffic Light Decision-Making (TLDM) stage, which uses fuzzy logic. To date, no related work combines all three stages. The second main contribution is the on-road test performed in Querétaro, Mexico, where all the components of the EBDASTL were mounted in a car and tested in a real-world scenario.
The experiment consisted of detecting red and green TLSs at six different positions (5, 7, 9, 11, 13, and 15 meters from the TL). The TLSDM achieved a mean Average Precision of 96% for distances shorter than 13 meters, and 89.50% for 15 meters. The TLD achieved an overall Root Mean Squared Error (RMSE) of 0.84 for all distances. The TLDM provided a smooth brake profile. Finally, the EBDASTL provided a response time of 0.23 seconds.
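To put the response times quoted in this abstract in perspective, the short Python sketch below converts them into the distance a car covers before any braking starts. The two times come from the abstract; the 50 km/h speed is a hypothetical value chosen for illustration, and the calculation assumes constant speed during the reaction interval.

```python
def reaction_distance_m(speed_kmh, reaction_time_s):
    """Distance covered at constant speed before any braking begins."""
    return speed_kmh / 3.6 * reaction_time_s


SPEED_KMH = 50.0    # hypothetical urban speed, not taken from the thesis
HUMAN_RT_S = 1.50   # surprising-event reaction time cited in the abstract
SYSTEM_RT_S = 0.23  # EBDASTL response time reported in the abstract

human = reaction_distance_m(SPEED_KMH, HUMAN_RT_S)
system = reaction_distance_m(SPEED_KMH, SYSTEM_RT_S)
print(f"human:  {human:.2f} m")
print(f"system: {system:.2f} m")
print(f"saved:  {human - system:.2f} m")
```

Under these assumptions the car travels about 20.83 m during the human reaction time and about 3.19 m during the system's response time, a difference of roughly 17.64 m.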