Ciencias Exactas y Ciencias de la Salud

Permanent URI for this collection: https://hdl.handle.net/11285/551039

This collection contains theses and graduate research projects from the master's programs of the School of Engineering and Sciences and the School of Medicine and Health Sciences.


Now showing 1 - 7 of 7
  • Tesis de maestría / master thesis
    Analysis of the architecture of a remote-controlled vehicle and automatic label localization
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2025-12-03) Reyes Lizárraga, Omar Ernesto; Ahuett Garza, Horacio; emimmayorquin, emipsanchez; Castañeda Cuevas, Herman; Urbina Coronado, Pedro Daniel; Escuela de Ingeniería y Ciencias; Campus Monterrey
    This thesis presents the design and evaluation of a cost-constrained, remote-controlled vehicle architecture for automatic label localization and inventory registration in outdoor metallic yards. The proposed system combines high-resolution vision, RFID, GPS and ultrasonic sensing on a rocker-bogie mobile platform, using vision as the primary modality to detect and read deteriorated labels while RFID acts as an assistive channel to confirm presence and recover IDs when visual information is incomplete. A three-stage methodology is followed: (i) characterization of a baseline teleoperated system, (ii) redesign of sensing, power, and logging to obtain reliable multimodal records, and (iii) implementation of a semi-autonomous detection-search-tracking-registration loop coordinated by a lightweight state machine. Across these stages, the architecture evolves from manual data capture with fragmented logs to a unified registration model in which each event bundles image, EPC/ID, GPS-based yard zone and quality indicators, locally stored and streamed to a cloud dashboard via MQTT for real-time supervision. Field tests in realistic yard conditions show that the final system can reduce typical time-to-register per label, increase data completeness and GPS-based localization quality, and lower operator workload by shifting effort from manual scanning to high-level supervision. The results demonstrate that a low-cost, vision-first, RFID-assisted vehicle can significantly improve traceability and safety in outdoor inventory operations.
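The unified registration model described above can be sketched as a small data record that is serialized for local storage and cloud streaming. The field names below are illustrative assumptions, since the abstract only lists the kinds of data each event bundles (an image, an EPC/ID, a GPS-based yard zone, and quality indicators):

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class RegistrationEvent:
    """One label-registration record bundling every modality.

    Field names are hypothetical; the thesis describes each event as
    combining an image, an EPC/ID, a GPS-derived yard zone, and
    quality indicators.
    """
    image_path: str        # path to the captured label image
    epc_id: str            # ID read visually, or recovered via RFID
    yard_zone: str         # GPS-derived zone within the yard
    ocr_confidence: float  # quality indicator for the visual read
    timestamp: float

def to_mqtt_payload(event: RegistrationEvent) -> str:
    """Serialize an event for local logging or MQTT streaming."""
    return json.dumps(asdict(event))

event = RegistrationEvent("img/0001.jpg", "EPC-4F21", "zone-B3", 0.91, time.time())
payload = to_mqtt_payload(event)
# A real deployment would publish `payload` to a topic (topic name is
# an assumption here) with an MQTT client such as paho-mqtt.
```

Serializing each event as one JSON document matches the abstract's claim that records are both stored locally and streamed to a dashboard: the same payload serves both sinks.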
  • Tesis de maestría / master thesis
    Development of mobile crowd sensing based models for fire risk assessments in constrained devices
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2025-06) Low Castro, Jesús Antonio; Rodríguez Hernandez, Gerardo; emimmayorquin; Gonzalez Mendoza, Miguel; Sanchez Ante, Gildardo; School of Engineering and Sciences; Campus Monterrey; Ochoa Ruiz, Gilberto
Wildfires have become one of the most critical challenges to address due to their increasing frequency as a result of climate change, causing significant damage to ecosystems, lives, and property. Although various strategies have been explored for wildfire management, a promising approach focuses on wildfire risk assessment through fuel identification, where fuels are sources of stored potential energy that combust under specific environmental and physical conditions. Since fuels are key determinants of fire behavior, identifying fire-prone areas in advance can help reduce the severity, spread, and intensity of wildfires. Traditional fuel mapping techniques are commonly used for this purpose and rely primarily on satellite and aerial imagery, but face limitations in resolution, cost, and real-time accessibility, highlighting the need for complementary ground-based systems. This thesis explores a wildfire risk assessment approach based on ground-level fuel identification using computer vision models deployed on resource-constrained devices, specifically smartphones. To enable distributed data collection and inference, a mobile crowd sensing scheme is proposed. The methodology includes training, quantization, and deployment of object detection and semantic segmentation models for fuel identification on mobile devices. The research includes case studies on optimized object detection using the Edmonton Wildland-Urban Interface dataset, the deployment of lightweight semantic segmentation models using a custom dataset from the Arteaga Mountain Range in Mexico, and a semi-supervised labeling strategy that uses a robust semantic segmentation model to augment training data. The results demonstrate that the proposed models achieve high accuracy while meeting the computational and storage constraints of mobile devices, supporting the feasibility of using mobile crowd sensing and optimized vision models for low-cost, real-time assessment of wildfire risk.
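The quantization step mentioned above shrinks models to fit constrained devices by mapping floating-point weights to 8-bit integers. The following is a minimal sketch of per-tensor affine quantization, the scheme post-training quantization toolchains such as TensorFlow Lite commonly use; the thesis's exact pipeline is not reproduced here:

```python
def quantize_uint8(weights):
    """Affine (asymmetric) 8-bit quantization of a list of floats.

    Returns quantized integers plus the (scale, zero_point) pair
    needed to dequantize. Per-tensor affine quantization like this is
    what standard post-training quantization performs; it is shown
    here as an illustration, not as the thesis's exact method.
    """
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255.0 or 1.0  # guard against constant tensors
    zero_point = round(-lo / scale)
    q = [max(0, min(255, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]

w = [-0.8, -0.1, 0.0, 0.4, 1.2]
q, s, z = quantize_uint8(w)
w_hat = dequantize(q, s, z)
# reconstruction error is bounded by half a quantization step
assert all(abs(a - b) <= s / 2 + 1e-9 for a, b in zip(w, w_hat))
```

The storage win is 4x (8-bit integers versus 32-bit floats) at the cost of a bounded rounding error, which is the trade-off that makes on-device inference feasible.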
  • Tesis de maestría / master thesis
    Identification of species of plants of the Solanum (Solanaceae) genus native to Mexico using computational vision and convolutional neural networks on pictures of herbarium specimens
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2025-02) Hernández Rincón, Carlos Eduardo; Falcón Morales, Luis Eduardo; emipsanchez; Rodríguez Contreras, Aarón; Mendoza Montoya, Omar; Escuela de Ingeniería y Ciencias EIC; Campus Guadalajara
    The development of Deep Learning techniques such as Convolutional Neural Networks for automated image processing has made big strides in recent years, enabling practical applications in many scientific fields. One such field is botanical taxonomic analysis, which aims to accurately identify and classify new species of plants. This matters not only for scientific purposes but also for taking appropriate conservation actions, for economic reasons, and for sound environmental policy making. However, the work requires considerable technical skill and time, and the number of qualified people at herbaria and scientific institutions in Mexico is insufficient. Moreover, a significant number of new plant specimens have already been collected but are sitting unidentified in herbaria across the country. The Solanum genus encompasses species such as potatoes, eggplants, and tomatoes; it is one of the most diverse genera and among the most important worldwide for its economic, nutritional, and cultural value. Mexico is no exception: it is home to many Solanum species, both described and undescribed. A project at Universidad de Guadalajara is currently working to identify all species of the Solanum genus native to Mexico that have already been collected at different herbaria, and Convolutional Neural Networks could help with this huge task. The main purpose of this research is to show that a system to assist a human taxonomist in identifying these plants is feasible and indeed helpful.
  • Tesis de maestría / master thesis
    Maturity recognition and fruit counting for sweet peppers in greenhouses using deep learning neural networks
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-01-05) Viveros Escamilla, Luis David; Gómez Espinosa, Alfonso; mtyahinojosa, emipsanchez; Cantoral Ceballos, José Antonio; Escuela de Ingenieria y Ciencias; Campus Querétaro; Escobedo Cabello, Jesús Arturo
    This study presents an approach to address the challenges involved in recognizing the maturity stage and counting sweet peppers of varying colors (green, yellow, orange, and red) within greenhouse environments. The methodology leverages the YOLOv5 model for real-time object detection, classification, and localization, coupled with the DeepSORT algorithm for efficient tracking. The system was successfully implemented to monitor sweet pepper production, and some challenges related to this environment, namely occlusions and the presence of leaves and branches, were effectively overcome. The algorithm was evaluated using real-world data collected in a sweet pepper greenhouse. A dataset comprising 1863 images was meticulously compiled to enhance the study, incorporating diverse sweet pepper varieties and maturity levels. Additionally, the study emphasized the role of confidence levels in object recognition, achieving a confidence level of 0.973. Furthermore, the DeepSORT algorithm was successfully applied for counting sweet peppers, demonstrating an accuracy level of 85.7% in two simulated environments under challenging conditions, such as varied lighting and inaccuracies in maturity level assessment.
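The reason a tracker is needed for counting is that the same pepper appears in many frames; a DeepSORT-style tracker assigns each fruit a persistent track ID, so counting unique IDs per maturity class avoids double counting. A minimal sketch of that counting stage (the tuple layout and class names are illustrative assumptions, not the thesis's actual data format):

```python
from collections import defaultdict

def count_by_maturity(tracked_detections):
    """Count fruits per maturity class from tracker output.

    `tracked_detections` is a list of (frame, track_id, class_name)
    tuples, the kind of stream a DeepSORT-style tracker yields. A
    fruit seen across many frames keeps one track_id, so counting
    unique IDs per class avoids double counting.
    """
    seen = defaultdict(set)
    for _frame, track_id, class_name in tracked_detections:
        seen[class_name].add(track_id)
    return {cls: len(ids) for cls, ids in seen.items()}

stream = [
    (0, 1, "green"), (1, 1, "green"),  # same pepper across two frames
    (1, 2, "red"),   (2, 2, "red"),
    (3, 3, "red"),
]
counts = count_by_maturity(stream)
# → {"green": 1, "red": 2}
```

Counting errors in such a pipeline come from track fragmentation (one fruit gets two IDs) or maturity misclassification, consistent with the challenging conditions reported above.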
  • Tesis de maestría / master thesis
    Sign language recognition with tree structure skeleton images and densely connected convolutional neural networks
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2023-05) Laines Vázquez, David Alberto; González Mendoza, Miguel; puemcuervo; Sánchez Ante, Gildardo; Cantoral Ceballos, José Antonio; Méndez Vázquez, Andrés; School of Engineering and Sciences; Campus Monterrey; Ochoa Ruiz, Gilberto
    This thesis presents a novel approach to Isolated Sign Language Recognition (ISLR) using skeleton modality data and deep learning. The study proposes a method that employs an image-based spatio-temporal skeleton representation for sign gestures and a convolutional neural network (CNN) for classification. The advantages of the skeleton modality over RGB, such as reduced noise and smaller parameter requirements for processing, are taken into account. The aim is to achieve competitive performance with a low number of parameters compared to the existing state of the art in ISLR. Informed by the literature on skeleton-based human action recognition (HAR), this research adapts the Tree Structure Skeleton Image (TSSI) method to represent a sign gesture as an image. The process involves first extracting the skeleton sequences from sign videos using the MediaPipe framework, which offers fast inference performance across multiple devices. The TSSI representation is then processed using a DenseNet, chosen for its efficiency and fewer parameters. The proposed method, called SL-TSSI-DenseNet, is trained and evaluated on two challenging datasets: the Word-Level American Sign Language (WLASL) dataset and the Ankara University Turkish Sign Language (AUTSL) dataset. Specifically, the WLASL-100 subset of the WLASL dataset and the RGB track of the AUTSL dataset are selected for the experiments. The results demonstrate that SL-TSSI-DenseNet outperforms other skeleton-based and RGB-based models benchmarked on the WLASL-100 dataset, achieving an accuracy of 81.47% through the use of data augmentation and pre-training. On the AUTSL dataset, it achieves competitive performance with an accuracy of 93.13% without pre-training and data augmentation. Additionally, an augmentation ablation study is conducted to identify the most effective data augmentation technique for the model's performance on the WLASL-100 dataset, providing insights into the effectiveness of various data augmentation techniques.
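The core idea of TSSI is to order skeleton joints by a depth-first traversal of the joint tree that revisits each parent on the way back, so that adjacent image columns always correspond to physically connected joints; each video frame then becomes one image row. A toy sketch of that ordering (the five-joint skeleton below is an illustration, not MediaPipe's actual joint set):

```python
def tssi_column_order(tree, root):
    """Depth-first traversal that revisits each parent after returning
    from a child, so neighbouring columns are always connected joints —
    the core idea behind the Tree Structure Skeleton Image (TSSI)."""
    order = [root]
    for child in tree.get(root, []):
        order += tssi_column_order(tree, child)
        order.append(root)  # come back through the parent
    return order

def skeleton_sequence_to_image(frames, cols):
    """Each frame (a dict joint -> (x, y, z)) becomes one image row;
    the joint coordinates play the role of the colour channels."""
    return [[frame[j] for j in cols] for frame in frames]

# Toy skeleton: torso -> {head, l_hand, r_hand}
tree = {"torso": ["head", "l_hand", "r_hand"]}
cols = tssi_column_order(tree, "torso")
# → ['torso', 'head', 'torso', 'l_hand', 'torso', 'r_hand', 'torso']
```

Because the result is an ordinary image, an off-the-shelf 2-D CNN such as DenseNet can consume the whole gesture at once, which is what keeps the parameter count low relative to video models.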
  • Tesis de maestría / master thesis
    Lights, camera, and domain shift: using superpixels for domain generalization in image segmentation for multimodal endoscopies
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2023-05) Martínez García Peña, Rafael; Ochoa Ruiz, Gilberto; puemcuervo, emipsanchez; Falcón Morales, Luis Eduardo; González Mendoza, Miguel; School of Engineering and Sciences; Campus Monterrey; Ali, Sharib
    Deep Learning models have made great advances in image processing. Their ability to identify key parts of images and provide fast, accurate segmentation has been proven and used in many fields, such as city navigation and object recognition. However, one field both needs the extra information that computers can provide and has proven elusive for the goals of robustness and accuracy: medicine. In the medical field, limitations in the amount of data, together with the variation introduced by factors such as differences in instrumentation, pose a grave threat to model accuracy known as domain shift. Domain shift occurs when we train with data whose characteristics are not wholly representative of the entire set of data a task encompasses. When it is present, models with no tools to deal with it can see their accuracy degrade to such a degree that they go from usable to useless. To explore this topic, we discuss two techniques: domain adaptation, where we make a model better at predicting for a specific domain of data inside a task, and domain generalization, where we make a model better at predicting data for any domain inside a task. In addition, we discuss several image segmentation models that have shown good results on medical tasks: U-Net, Attention U-Net, DeepLab, Efficient U-Net, and EndoUDA. Following this exploration, we propose a solution model based on a domain generalization technique: patch-based consistency. We use a superpixel generator known as SLIC (Simple Linear Iterative Clustering) to provide low-level, domain-agnostic information to different models, encouraging our networks to learn more global features. This framework, which we refer to as SUPRA (SUPeRpixel Augmented), is used in tandem with U-Net, Attention U-Net, and Efficient U-Net in an effort to improve results in endoscopies where light modalities are switched, something commonly seen in lesion detection tasks (particularly in Barrett's Esophagus and polyp detection). We find that the best of these models, SUPRA-UNet, shows qualities that make it a better choice than unaugmented networks for lesion detection: not only does it provide less noisy and smoother predictions, but it also outperforms the best baseline (U-Net) by over 20% IoU in a target domain whose lighting differs significantly from the training set.
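The intuition behind the superpixel signal is that a superpixel's mean appearance is far less sensitive to lighting modality than individual pixels. A minimal sketch of deriving such a domain-agnostic layer from precomputed superpixel labels (in practice the labels would come from a SLIC implementation such as scikit-image's `segmentation.slic`; the tiny 2x2 arrays here are purely illustrative):

```python
def mean_per_superpixel(image, labels):
    """Replace every pixel with the mean of its superpixel.

    `image` and `labels` are equally-shaped 2-D lists; `labels` would
    come from a superpixel generator such as SLIC. The result is the
    kind of low-level, domain-agnostic layer a SUPRA-style pipeline
    can feed to a network alongside the raw image.
    """
    sums, counts = {}, {}
    for row_img, row_lab in zip(image, labels):
        for v, s in zip(row_img, row_lab):
            sums[s] = sums.get(s, 0.0) + v
            counts[s] = counts.get(s, 0) + 1
    means = {s: sums[s] / counts[s] for s in sums}
    return [[means[s] for s in row] for row in labels]

img = [[10, 12], [50, 52]]   # grayscale intensities
seg = [[0, 0], [1, 1]]       # two superpixels
out = mean_per_superpixel(img, seg)
# → [[11.0, 11.0], [51.0, 51.0]]
```

Averaging within superpixels discards pixel-level texture but preserves region structure, which is exactly the kind of global cue a network is encouraged to rely on under a consistency objective.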
  • Tesis de maestría / master thesis
    ANOSCAR: An image captioning model and dataset designed from OSCAR and the video dataset of ActivityNet
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2021-07-01) Byrd Suárez, Emmanuel; González Mendoza, Miguel; puemcuervo; Ochoa Ruiz, Gilberto; Marín Hernandez, Antonio; School of Engineering and Sciences; Campus Estado de México; Chang Fernández, Leonardo
    Activity Recognition and Classification in video sequences is an area of research that has received attention recently. However, video processing is computationally expensive, and its advances have not been as extraordinary as those of Image Captioning. This work uses a computationally limited environment and learns an Image Captioning transformation of the ActivityNet-Captions Video Dataset that can be used for either Video Captioning or Video Storytelling. Different Data Augmentation techniques for Natural Language Processing are explored and applied to the generated dataset in an effort to increase its validation scores. Our proposal includes an Image Captioning dataset obtained from ActivityNet, with its features generated by Bottom-Up attention, and a model to predict its captions, generated with OSCAR. Our captioning scores are slightly better than those of S2VT, but with a much simpler pipeline, providing a starting point for future research using our approach. Finally, we propose different lines of research on how this work can be further expanded and improved.
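Transforming a video-captioning dataset into an image-captioning one means mapping each timestamped caption segment to a representative still frame. The following is a minimal sketch of that mapping; ActivityNet-Captions annotates `(start, end)` segments in seconds, but the midpoint-frame rule below is an illustrative assumption, not the selection strategy the thesis actually used:

```python
def segments_to_image_captions(segments, fps):
    """Turn timestamped video captions into image-caption pairs.

    `segments` is a list of ((start_s, end_s), caption) entries, the
    shape of ActivityNet-Captions annotations. Each segment is reduced
    to the frame index at its temporal midpoint — one hypothetical way
    to pick a representative frame per caption.
    """
    pairs = []
    for (start, end), caption in segments:
        mid_frame = int(((start + end) / 2) * fps)
        pairs.append((mid_frame, caption))
    return pairs

anns = [((0.0, 4.0), "a man mixes batter"),
        ((4.0, 10.0), "he pours it into a pan")]
pairs = segments_to_image_captions(anns, fps=30)
# → [(60, 'a man mixes batter'), (210, 'he pours it into a pan')]
```

Once reduced to (frame, caption) pairs, the data fits a standard image-captioning pipeline (region features from Bottom-Up attention, caption generation with OSCAR), which is what makes the approach tractable in a computationally limited environment.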
Unless otherwise specified, these materials are shared under the following terms: Attribution-NonCommercial-NoDerivatives CC BY-NC-ND http://www.creativecommons.mx/#licencias


