Ciencias Exactas y Ciencias de la Salud

Permanent URI for this collection: https://hdl.handle.net/11285/551039

This collection contains Master's theses and degree projects from the Schools of Engineering and Sciences and of Medicine and Health Sciences.


Now showing 1 - 5 of 5
  • Master's thesis
    Exploring Anchor-Free Object Detection for Surgical Tool Detection in Laparoscopic Videos: A Comparative Study of CenterNet++ and Anchor-Based Models
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-12) Aparicio Viveros, Carlos Alfredo; Ochoa Ruiz, Gilberto; emipsanchez; Hinojosa Cervantes, Salvador Miguel; Falcón Morales, Luis Eduardo; González Mendoza, Miguel; School of Engineering and Sciences; Campus Monterrey
    Minimally Invasive Surgery (MIS) has transformed modern medicine, offering reduced recovery times, minimal scarring, and lower risks of infection. However, MIS procedures also present unique challenges, particularly in visualizing and manipulating surgical tools within a limited field of view. As a solution, this thesis investigates anchor-free deep learning models for real-time surgical tool detection in laparoscopic videos, proposing CenterNet++ as a potential improvement over traditional anchor-based methods. The hypothesis guiding this work is that anchor-free detectors, by avoiding predefined anchor boxes, can more effectively handle the diverse shapes, sizes, and positions of surgical tools. The primary objective of this thesis is to evaluate the performance of CenterNet++ in surgical tool detection compared to popular anchor-based models, specifically Faster R-CNN and YOLOv4, using the m2cai16-tool-locations dataset. CenterNet++ is examined in different configurations, including complete and real-time optimized (Fast-CenterNet++) versions, and tested against Faster R-CNN and YOLOv4 to assess trade-offs in accuracy and efficiency. Experimental results demonstrate that while CenterNet++ achieves high precision, particularly in scenarios requiring meticulous localization, its inference speed is significantly slower than YOLOv4, which attained real-time speeds at 128 FPS. CenterNet++'s keypoint refinement mechanism, though beneficial for localization, impacts its computational efficiency, highlighting areas for further optimization. To bridge this gap, several architectural improvements are proposed based on YOLOv4's streamlined design. These include integrating modules such as Spatial Pyramid Pooling (SPP) and Path Aggregation Network (PANet), along with reducing the input resolution in the Fast-CenterNet++ configuration.
    Additionally, future work is suggested to explore CenterNet++ on larger, more complex datasets and to develop semi-supervised learning approaches that could mitigate the limitations of annotated surgical datasets. In conclusion, this thesis contributes a comprehensive evaluation of anchor-free models for surgical tool detection, providing a foundation for further advancements in real-time, high-precision object detection for surgical assistance. The findings underscore the potential of anchor-free models, such as CenterNet++, to meet the evolving demands of MIS with targeted architectural adaptations.
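The anchor-free idea behind CenterNet-style detectors can be illustrated with a minimal sketch: objects are found as peaks in a center heatmap, and box size is read from a regression map at each peak, with no predefined anchor boxes to match against. This is a simplified, hypothetical illustration in NumPy, not the thesis's actual CenterNet++ implementation (which adds keypoint refinement).

```python
import numpy as np

def decode_centers(heatmap, wh_map, score_thresh=0.5):
    """Decode an anchor-free center heatmap into bounding boxes.

    heatmap: (H, W) center-point scores in [0, 1] for one class
    wh_map:  (H, W, 2) predicted (width, height) at each location
    Returns a list of (x1, y1, x2, y2, score) boxes.
    """
    boxes = []
    ys, xs = np.where(heatmap >= score_thresh)
    for y, x in zip(ys, xs):
        w, h = wh_map[y, x]
        boxes.append((x - w / 2, y - h / 2,
                      x + w / 2, y + h / 2,
                      float(heatmap[y, x])))
    return boxes
```

In a real detector the heatmap would first be filtered to local maxima (a 3x3 max-pool acts as a cheap NMS), but the decoding step above is the essential anchor-free contrast with Faster R-CNN and YOLOv4.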
  • Master's thesis
    Contextual information for Person Re-Identification in outdoor environments
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2021-06) Garnica López, Luis Alberto; Chang Fernández, Leonardo; 345979; Chang Fernández, Leonardo; emipsanchez; Pérez Suárez, Airel; Gutiérrez Rodríguez, Andrés Eduardo; School of Engineering and Sciences; Campus Monterrey; González Mendoza, Miguel
    Person Re-Identification (ReID) is achieving good results and is getting closer to being ready for deployment in production scenarios. However, there is still room for improvement: when the task is implemented in outdoor environments, its performance is often degraded by illumination changes or by natural elements, such as fog or dust, that distort the images. In this work, we introduce a novel proposal for including contextual information in a ReID re-ranking approach, to help improve the effectiveness of this task in surveillance systems. Most previous research in this field uses only the visual data contained in the images processed by ReID. Even the approaches that do use some sort of context normally rely on context annotated within the scope of the image itself, or on the relationships between the different images where the identities are found. We argue that much of the contextual information available in these scenarios is not being exploited and might help reduce the impact of these conditions on the performance of the task. In this document, we perform a complete analysis of the effect of combining this contextual information with the embeddings produced by several ReID models, processing it through an architecture inspired by Siamese neural networks but trained with a triplet loss. The network was trained on a novel dataset developed specifically for this task and annotated with this extra information. The dataset consists of 34,156 images of 501 labeled identities from 3 different cameras; along with this data, each image includes 12 extra features with its specific contextual information.
    This dataset was processed beforehand with three different ReID models, to ensure that the results obtained when the contextual information is included are independent of the base ReID approach: Triplet Network (TriNet), Multiple Granularity Network (MGN), and Multi-Level Factorization Net (MLFN). Each one produces 2048-dimensional embeddings. All of our proposed experiments improved on the original mAP obtained from these three networks on our dataset, going from 86.53 to 94.9, from 84.94 to 93.11, and from 95.35 to 95.93, respectively.
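One simple way to fold contextual features into re-ranking is to mix a distance over the appearance embedding with a distance over the context vector before sorting the gallery. The sketch below is a hypothetical illustration of that idea (the thesis instead learns the fusion with a Siamese-style network and triplet loss); the function name and the weighting scheme are assumptions, and the dimensions match the abstract (2048-d appearance, 12-d context).

```python
import numpy as np

def rerank_with_context(query_app, query_ctx, gallery_app, gallery_ctx, alpha=0.8):
    """Rank gallery entries by a weighted mix of appearance and context
    distances; alpha weighs the appearance term.

    query_app:   (D,)  appearance embedding of the query (e.g. D = 2048)
    query_ctx:   (C,)  contextual feature vector (e.g. C = 12)
    gallery_app: (N, D), gallery_ctx: (N, C)
    Returns gallery indices sorted from best to worst match.
    """
    d_app = np.linalg.norm(gallery_app - query_app, axis=1)
    d_ctx = np.linalg.norm(gallery_ctx - query_ctx, axis=1)
    # Normalize each distance to [0, 1] so the two terms are comparable.
    d_app = d_app / (d_app.max() + 1e-12)
    d_ctx = d_ctx / (d_ctx.max() + 1e-12)
    combined = alpha * d_app + (1 - alpha) * d_ctx
    return np.argsort(combined)
```

A learned fusion, as in the thesis, can capture interactions between context and appearance that a fixed linear mix cannot, which is presumably why the mAP gains reported above are largest for the weaker base models.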
  • Master's thesis
    TYolov5: A Temporal Yolov5 detector based on quasi-recurrent neural networks for real-time handgun detection in video
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2020-12-01) Duran Vega, Mario Alberto; GONZALEZ MENDOZA, MIGUEL; 123361; González Mendoza, Miguel; puemcuervo; Ochoa Ruiz, Gilberto; Morales González Quevedo, Annette; Sánchez Castellanos, Héctor Manuel; School of Engineering and Science; Campus Monterrey; Chang Fernández, Leonardo
    Timely handgun detection is a crucial problem for improving public safety; nevertheless, the effectiveness of many surveillance systems still depends on finite human attention. Much of the previous research on handgun detection is based on static image detectors, leaving aside valuable temporal information that could be used to improve object detection in videos. To improve the performance of surveillance systems, a real-time temporal handgun detection system should be built. Using Temporal Yolov5 (TYolov5), an architecture based on quasi-recurrent neural networks, temporal information is extracted from video to improve handgun detection results. Moreover, two publicly available datasets are proposed, labeled with hands, guns, and phones: one containing 2,199 static images to train static detectors, and another with 5,960 video frames to train temporal modules. Additionally, we explore two temporal data augmentation techniques based on Mosaic and Mixup. The resulting systems are three real-time architectures: one focused on minimizing inference time, with a mAP(50:95) of 56.1; another balancing inference time and accuracy, with a mAP(50:95) of 59.4; and a last one specialized in accuracy, with a mAP(50:95) of 60.6. Temporal Yolov5 achieves real-time detection and takes advantage of the temporal features contained in videos to outperform Yolov5 on our temporal dataset, making TYolov5 suitable for real-world applications.
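The quasi-recurrent idea that TYolov5 builds on can be sketched in a few lines: instead of a full LSTM, each frame's candidate features are blended with the running state through a per-frame forget gate, so the recurrence is a cheap elementwise operation. This is a minimal NumPy illustration of QRNN-style "f-pooling" (h_t = f_t * h_{t-1} + (1 - f_t) * z_t); it is not the thesis's actual module, and the function name is an assumption.

```python
import numpy as np

def qrnn_f_pooling(z, f):
    """QRNN-style forget-gate pooling over a frame sequence.

    z: (T, D) candidate features, one row per frame
    f: (T, D) forget-gate activations in [0, 1]
    Returns h: (T, D) temporally smoothed features, where
        h_t = f_t * h_{t-1} + (1 - f_t) * z_t   (h_{-1} = 0)
    """
    h = np.zeros_like(z)
    prev = np.zeros(z.shape[1])
    for t in range(z.shape[0]):
        prev = f[t] * prev + (1 - f[t]) * z[t]
        h[t] = prev
    return h
```

Because the gates are computed by convolutions over the whole clip and only the pooling step is sequential, this kind of recurrence adds little latency, which is consistent with the real-time figures reported above.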
  • Master's thesis
    Histopathological image classification using deep learning
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2020-11) Arredondo Padilla, Braulio; Martínez Ledesma, Juan Emmanuel; emipsanchez; Tamez Peña, José Gerardo; Santos Díaz, Alejandro; Martínez Torteya, Antonio; School of Engineering and Sciences; Campus Monterrey
    This thesis presents a study of digital pathology classification using and combining several machine learning and deep learning techniques. Cancer is one of the most common causes of death around the world, and one of the main complications of the disease is late-stage diagnosis. Nowadays there are many different studies aimed at obtaining a correct diagnosis on time, among them tissue biopsies. These samples are analyzed by a pathologist, who must examine, pixel by pixel, a whole high-resolution image to give a diagnosis of the disease, including stage and class. This activity takes weeks, even for experts, because several samples are usually extracted from a single patient. To speed up and facilitate this process, several models have been developed for digital pathology classification. With these models it is easier to discard many patient slides than with the traditional method, so the pathologist's main activity becomes confirming the diagnosis on the most relevant or complicated samples. The downside of these models is that most of them are based on deep learning, a technique well known for its great performance, but also for its high requirements in graphics processors and memory. Consequently, we performed a complete analysis of several convolutional neural networks used in different ways to compare outcomes and efficiency. In addition, we included techniques such as recurrent neural networks and classical machine learning. Several deep learning and machine learning models are presented as alternatives to convolutional neural networks, including 5 computer vision techniques. The main objective of our project is to provide a real alternative capable of achieving outcomes similar to deep learning with limited resources. The experiments were successful, yielding a real alternative to deep learning for the classification of 3 different types of cancer with an area under the curve higher than 90%.
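The area-under-the-curve figure used to compare the classifiers above can be computed directly from classifier scores via the Mann-Whitney U statistic: the AUC is the probability that a randomly chosen positive sample is scored above a randomly chosen negative one. A minimal sketch (function name assumed, not from the thesis):

```python
import numpy as np

def auc_from_scores(scores, labels):
    """ROC AUC via the Mann-Whitney U statistic.

    scores: per-sample classifier scores (higher = more positive)
    labels: binary ground-truth labels (1 = positive class)
    Counts positive-over-negative score pairs; ties count as 0.5.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```

The pairwise formulation is O(P*N) but exactly matches the trapezoidal area under the ROC curve, which makes it a convenient check against library implementations.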
  • Master's thesis
    Detection of Violent Behavior in Open Environments Using Pose Estimation and Neural Networks
    (Instituto Tecnológico y de Estudios Superiores de Monterrey) Chong Loo, Kevin Brian Kwan; TERASHIMA MARIN, HUGO; 65879; Terashima Marín, Hugo; tolmquevedo, emipsanchez; Conant Pablos, Santiago Enrique; School of Engineering and Sciences; Campus Monterrey
    People's safety and security have always been an issue demanding attention. Technological advances have in part been used to improve safeguards, though other aspects, adopted without precautions, have made people even more vulnerable. People can have their sensitive data stolen or become victims of transaction fraud. These may be crimes committed without physical interaction, but felonies involving physical violence still exist. Some solutions for pedestrian safety are guards, patrolling police cars, sensors, and security cameras. Nonetheless, these methods only react when the crime is happening or, even more critically, when it has already occurred and the damage has been done. Therefore, numerous methods using Artificial Intelligence have been implemented to solve this problem. Many approaches to violent behavior detection and action recognition rely on 3D convolutional neural networks (3D CNNs), spatio-temporal models, long short-term memory networks, and pose estimation, among other implementations. However, in the current state of the art, these approaches do not work perfectly and are not adapted to uncontrolled environments. Therefore, a significant contribution of this work is the development of a new solution model that is able to detect violent behavior. This approach uses pedestrian detection, tracking, pose estimation, and neural networks to predict pedestrian behavior in video frames. The method uses a time window of frames to extract the joint angles given by the pose estimation algorithm as features for classifying behavior. At the moment of developing this thesis project, there were not many databases with violent behavior videos, and the ones that existed were low quality: cluttered scenes where pedestrians cannot be seen clearly, with unfixed camera angles. Consequently, another important contribution of this work was the creation of a new database, Kranok-NV, with a total of 3,683 normal and violent videos.
    This database was used to train and test the solution model. For the evaluation, a protocol was designed using 10-fold cross-validation. With the implemented solution model, an accuracy of more than 98% was achieved on the Kranok-NV database, surpassing the performance of state-of-the-art methods for violence detection and action recognition on the developed database. Though this new solution model detects violent and normal behavior, it can easily be extended to classify more types of behavior. Further work requires testing this approach on emerging video databases and optimizing specific areas of the solution model. Additionally, the contributions of this work can aid in the development of new approaches.
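The joint-angle features described above can be sketched concretely: the angle at a joint is the angle between the two limb vectors meeting there, and stacking those angles over a time window of frames yields the classifier's input. The function names and the keypoint layout below are hypothetical illustrations, not the thesis's actual pipeline.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle in degrees at joint b formed by keypoints a-b-c,
    e.g. shoulder-elbow-wrist gives the elbow angle."""
    v1 = np.asarray(a, float) - np.asarray(b, float)
    v2 = np.asarray(c, float) - np.asarray(b, float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def window_features(keypoint_window, triplets):
    """Flatten joint angles over a time window into one feature vector.

    keypoint_window: (T, K, 2) 2D keypoints per frame
    triplets: list of (a, b, c) keypoint indices, one per joint of interest
    Returns a (T * len(triplets),) feature vector for a behavior classifier.
    """
    return np.array([joint_angle(frame[a], frame[b], frame[c])
                     for frame in keypoint_window
                     for (a, b, c) in triplets])
```

Angles are invariant to the person's position and scale in the frame, which is one plausible reason such features generalize better than raw keypoint coordinates in uncontrolled outdoor scenes.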
Unless otherwise specified, these materials are shared under the following terms: Attribution-NonCommercial-NoDerivatives CC BY-NC-ND http://www.creativecommons.mx/#licencias

The user is obliged to use the services and content provided by the University, in particular its printed and electronic resources, in accordance with current legislation, the principles of good faith, and generally accepted uses, without contravening public order, especially when, for the adequate performance of their activity, they need to reproduce, distribute, communicate, and/or make available fragments of printed works, or of works susceptible of existing in analog or digital format, whether on paper or electronic media. Ley 23/2006, de 7 de julio, which amends the revised text of the Ley de Propiedad Intelectual, approved

DSpace software copyright © 2002-2025

License