Ciencias Exactas y Ciencias de la Salud
Permanent URI for this collection: https://hdl.handle.net/11285/551039
This collection holds Master's theses and degree projects from the Escuela de Ingeniería y Ciencias and the Escuela de Medicina y Ciencias de la Salud.
Search Results
- Transformer-based hand landmark prediction from superficial electromyography (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2025-06) Ramos García de Alba, Diego Armando; Chairez Oria, Jorge Isaac; Sánchez Ante, Gildardo; School of Engineering and Sciences; Campus Monterrey; Fuentes Aguilar, Rita Quetziquel. The development of human-robot systems has become increasingly prominent in recent years, particularly in domains that require seamless and intuitive interactions between humans and machines, such as healthcare, manufacturing, rehabilitation, and entertainment. Within these fields, upper-limb robotics and prosthetics have experienced significant growth, and control strategies play a central role in user experience, functionality, and long-term usability. Among the available strategies, myoelectric control, which utilizes the electrical activity generated by muscle contractions to drive robotic actuators, stands out for its potential to provide direct and responsive decoding of user intent. Despite its promise, current commercial myoelectric control systems suffer from notable limitations. Most commercially available upper-limb robotic devices rely on binary or proportional control paradigms. Binary control allows the user to issue simple on/off commands (e.g., open or close a prosthetic hand), whereas proportional control scales the degree of motion or force with the magnitude of the input signal. While these methods are relatively straightforward to implement and train, they inherently limit the functionality of robotic systems by constraining them to a narrow range of discrete, non-adaptive actions. As a result, users often experience frustration due to unnatural movements, lack of fluidity, and the inability to perform complex, multi-joint, or continuous tasks. Pattern recognition (PR)-based control has emerged as a more advanced alternative to binary or proportional schemes. PR control systems employ machine learning algorithms to classify muscle activity into predefined gesture categories. This approach improves intuitiveness by enabling the recognition of multiple movements and gestures, offering a more versatile control interface. However, PR control is also restricted in significant ways. Its effectiveness is typically bounded by the limited set of gestures used during the training phase, making the system inflexible to untrained motions or novel hand configurations. Additionally, the reliance on discrete classification does not accommodate continuous, dynamic control, which is crucial for achieving truly natural and precise robotic movement. To address these limitations, we propose a novel method for predicting continuous hand movement from surface electromyography (sEMG) signals through a multimodal transformer architecture. Unlike traditional PR systems that output gesture classes, our approach estimates continuous hand landmark positions, thereby enabling fluid and unrestricted movement trajectories. This method represents a paradigm shift in myoelectric control, moving from classification-based strategies to continuous, regression-based motion estimation. The proposed system employs a transformer model, a deep learning architecture originally designed for natural language processing that excels at capturing complex temporal and contextual relationships in sequential data. In the context of sEMG, transformer models offer several key advantages over traditional convolutional or recurrent neural networks. First, transformers eliminate the need for handcrafted feature engineering, which has historically been a challenging and subjective component of EMG signal processing; instead, the architecture learns relevant features from raw sEMG input through self-attention mechanisms. Second, transformers can simultaneously model spatial and temporal dependencies within the input sequence. This is crucial for decoding sEMG signals, which exhibit both spatial complexity across different muscle groups and temporal dynamics associated with movement initiation and execution. Finally, transformer models are parameter-efficient and scalable, making them adaptable to different limb configurations, electrode placements, and control environments. Our multimodal architecture takes sEMG signals as input and outputs the continuous positions of hand landmarks, the key spatial reference points on the hand that define its posture and motion. By focusing on hand landmark prediction rather than gesture classification, the system bypasses the inherent limitations of a finite gesture vocabulary. This allows users to perform an unlimited range of movements, including intermediate postures and transitions between gestures, without retraining or expanding the model's gesture set. The landmark-based output also facilitates integration with existing computer vision and robotic control systems, many of which use landmark-based representations for motion planning and kinematic modeling. The development and validation of our approach involved collecting synchronized sEMG and hand motion data from a cohort of participants performing a variety of hand movements. Hand landmarks were extracted using vision-based tracking systems and served as ground-truth labels for model training and evaluation. The transformer model was trained to map multichannel sEMG signals to the corresponding hand landmark coordinates over time. Extensive experimentation demonstrated that our model not only outperformed baseline architectures in accuracy and generalization but also required less training data, owing to the efficiency of the self-attention mechanism. Qualitative evaluations further confirmed that the predicted hand trajectories were smooth, natural, and closely aligned with actual user intent, indicating the system's potential for real-time application in robotic prosthetics and exoskeletons. In summary, this study introduces a transformative approach to myoelectric control by leveraging a multimodal transformer architecture for continuous hand movement prediction. By shifting the focus from discrete gesture classification to continuous motion estimation via landmark regression, our method addresses several long-standing challenges in the field, including limited gesture scalability, unnatural control behavior, and reliance on handcrafted features. The use of transformers for modeling spatiotemporal dependencies in sEMG data represents a significant advancement, opening the door to more intuitive, responsive, and user-centric human-robot interaction systems.
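For illustration, here is a minimal PyTorch sketch of the kind of sEMG-to-landmark transformer regressor this abstract describes; the channel count, window length, 21-landmark output, and layer sizes are assumptions for the example rather than the thesis's actual configuration, and positional encoding is omitted for brevity.

```python
# Hedged sketch (not the thesis code): a transformer encoder that regresses
# hand landmark coordinates from windows of multichannel sEMG.
import torch
import torch.nn as nn

class EMGToLandmarks(nn.Module):
    def __init__(self, n_channels=8, n_landmarks=21, d_model=64, n_layers=4):
        super().__init__()
        self.embed = nn.Linear(n_channels, d_model)   # per-timestep projection
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_landmarks * 3)  # x, y, z per landmark

    def forward(self, emg):                # emg: (batch, time, channels)
        h = self.encoder(self.embed(emg))  # self-attention over the window
        return self.head(h[:, -1])         # landmarks for the final timestep

model = EMGToLandmarks()
window = torch.randn(2, 200, 8)            # 2 windows, 200 samples, 8 channels
print(model(window).shape)                 # -> torch.Size([2, 63])
```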
- Beyond images: convnext vs. vision-language models for automated breast density classification in screening mammography (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2025-06) Molina Román, Yusdivia; Santos Díaz, Alejandro; Menasalvas Ruiz, Ernestina; Tamez Peña, José; Montesinos Silva, Luis Arturo; School of Engineering and Sciences; Campus Estado de México. This study evaluates and compares the effectiveness of different deep learning approaches for automated breast density classification according to the BI-RADS system. Specifically, the research examines two distinct architectures: ConvNeXt, a CNN-based model, and BioMedCLIP, a vision-language model that integrates textual information through token-based labels. Using mammographic images from TecSalud at Tecnológico de Monterrey, the study assesses these models across three distinct learning paradigms: zero-shot classification, linear probing with token-based descriptions, and fine-tuning with numerical class labels. The experimental results demonstrate that while vision-language models offer theoretical advantages in interpretability and zero-shot capability, CNN-based architectures with end-to-end fine-tuning currently deliver superior performance for this specialized medical imaging task. ConvNeXt achieves an accuracy of up to 0.71 and an F1 score of 0.67, compared to BioMedCLIP's best performance of 0.57 accuracy with linear probing. A comprehensive analysis of classification patterns revealed that all models encountered difficulties in distinguishing between adjacent breast density categories, particularly heterogeneously dense tissue. This challenge mirrors known difficulties in clinical practice, where even experienced radiologists exhibit inter-observer variability in density assessment. The performance discrepancy between models was further examined through detailed loss-curve analysis and confusion matrices, revealing specific strengths and limitations of each approach. A key limitation in BioMedCLIP's performance stemmed from insufficient semantic richness in the textual tokens representing each density class. When category distinctions relied on subtle linguistic differences, such as "extremely" versus "heterogeneously", the model struggled to form robust alignments between visual features and textual descriptions. The research contributes to the growing body of knowledge on AI applications in breast imaging by systematically comparing traditional and multimodal approaches under consistent experimental conditions. The findings highlight both the current limitations and the future potential of vision-language models in mammographic analysis, suggesting that richer textual descriptions and domain-specific adaptations could bridge the performance gap while preserving the interpretability benefits of multimodal approaches for clinical applications.
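As an illustration of the linear-probing paradigm compared in this study, the hedged sketch below fits a linear classifier on frozen image embeddings; the random feature array stands in for real BioMedCLIP embeddings, and the four labels stand in for BI-RADS classes A-D.

```python
# Linear probing sketch: freeze the encoder, train only a linear classifier
# on its embeddings. Features and labels here are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(0)
features = rng.normal(size=(400, 512))   # stand-in for frozen CLIP embeddings
labels = rng.integers(0, 4, size=400)    # BI-RADS A/B/C/D encoded as 0..3

X_tr, X_te, y_tr, y_te = train_test_split(features, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # the linear probe
pred = probe.predict(X_te)
print(accuracy_score(y_te, pred), f1_score(y_te, pred, average="macro"))
```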
- PassID: A Modular System for Pass Detection with Integrated Player Identification in Football (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-12) Gutiérrez Padilla, Benjamín; Monroy Borja, Raúl; Gutiérrez Rodríguez, Andrés Eduardo; School of Engineering and Sciences; Campus Monterrey; Conant Pablos, Santiago Enrique. The analysis of football passes plays a crucial role in understanding team tactics and improving performance. However, current methods for capturing and analyzing this data are often inaccessible due to high costs and reliance on proprietary datasets. This thesis presents the development of an automated system designed to detect passes in football matches using video as the source of information. The system integrates computer vision and machine learning techniques across multiple modules, including player and ball detection, object tracking, team identification, and pass detection. Using a hybrid approach with YOLOv9 for player detection, Faster R-CNN for the ball, and Norfair for tracking, the system assigns unique identifiers to players and detects passes based on proximity and changes in ball possession. Team identification is achieved through color histogram analysis, allowing the system to distinguish valid passes between players of the same team. The modular design enables independent improvements in each component, providing a flexible framework that can be adapted to different match conditions. This work represents a step forward in automating football pass detection, contributing to the growing field of sports analysis with a scalable and efficient solution.
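The possession-change rule described in this abstract can be illustrated with a short, self-contained sketch; the data structures and the 2-unit possession radius are hypothetical, not PassID's actual parameters.

```python
# Illustrative pass rule: a pass is recorded when ball possession (nearest
# player within a radius) changes between two players on the same team.
from dataclasses import dataclass

@dataclass
class Player:
    pid: int
    team: int
    x: float
    y: float

def possessor(ball, players, radius=2.0):
    """Return the player closest to the ball within `radius`, else None."""
    best, best_d = None, radius
    for p in players:
        d = ((p.x - ball[0]) ** 2 + (p.y - ball[1]) ** 2) ** 0.5
        if d < best_d:
            best, best_d = p, d
    return best

def detect_passes(frames):
    """frames: list of (ball_xy, players). Yields (from_pid, to_pid) passes."""
    last = None
    for ball, players in frames:
        cur = possessor(ball, players)
        if cur and last and cur.pid != last.pid and cur.team == last.team:
            yield (last.pid, cur.pid)   # same-team possession change = pass
        if cur:
            last = cur

frames = [
    ((0.0, 0.0), [Player(1, 0, 0.5, 0.0), Player(2, 0, 10.0, 0.0)]),
    ((10.0, 0.0), [Player(1, 0, 0.5, 0.0), Player(2, 0, 10.0, 0.0)]),
]
print(list(detect_passes(frames)))      # [(1, 2)]
```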
- Maturity recognition and fruit counting for sweet peppers in greenhouses using deep learning neural networks (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-01-05) Viveros Escamilla, Luis David; Gómez Espinosa, Alfonso; Cantoral Ceballos, José Antonio; Escuela de Ingeniería y Ciencias; Campus Querétaro; Escobedo Cabello, Jesús Arturo. This study presents an approach to address the challenges involved in recognizing the maturity stage and counting sweet peppers of varying colors (green, yellow, orange, and red) within greenhouse environments. The methodology leverages the YOLOv5 model for real-time object detection, classification, and localization, coupled with the DeepSORT algorithm for efficient tracking. The system was successfully implemented to monitor sweet pepper production, and challenges characteristic of this environment, namely occlusions and the presence of leaves and branches, were effectively overcome. The algorithm was evaluated using real-world data collected in a sweet pepper greenhouse. A dataset comprising 1863 images was meticulously compiled, incorporating diverse sweet pepper varieties and maturity levels. Additionally, the study emphasized the role of confidence levels in object recognition, achieving a confidence level of 0.973. Furthermore, the DeepSORT algorithm was successfully applied to counting sweet peppers, demonstrating an accuracy of 85.7% in two simulated environments under challenging conditions, such as varied lighting and inaccuracies in maturity level assessment.
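As a sketch of the counting step: once a tracker such as DeepSORT assigns persistent IDs, fruit counting reduces to counting unique track IDs per maturity class. The track tuples below are hypothetical stand-ins for tracker output.

```python
# Counting sketch: each persistent track ID corresponds to one physical
# fruit, so unique IDs per maturity class give the per-class fruit count.
from collections import defaultdict

def count_fruit(tracks):
    """tracks: iterable of (frame, track_id, maturity) tuples."""
    seen = defaultdict(set)
    for _frame, track_id, maturity in tracks:
        seen[maturity].add(track_id)       # re-detections share one ID
    return {m: len(ids) for m, ids in seen.items()}

tracks = [(0, 1, "red"), (1, 1, "red"), (1, 2, "green"), (2, 3, "red")]
print(count_fruit(tracks))                 # {'red': 2, 'green': 1}
```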
- Sign language recognition with tree structure skeleton images and densely connected convolutional neural networks (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2023-05) Laines Vázquez, David Alberto; González Mendoza, Miguel; Sánchez Ante, Gildardo; Cantoral Ceballos, José Antonio; Méndez Vázquez, Andrés; School of Engineering and Sciences; Campus Monterrey; Ochoa Ruiz, Gilberto. This thesis presents a novel approach to Isolated Sign Language Recognition (ISLR) using skeleton-modality data and deep learning. The study proposes a method that employs an image-based spatio-temporal skeleton representation for sign gestures and a convolutional neural network (CNN) for classification. The advantages of the skeleton modality over RGB, such as reduced noise and smaller parameter requirements for processing, are taken into account. The aim is to achieve competitive performance with a low number of parameters compared to the existing state of the art in ISLR. Informed by the literature on skeleton-based human action recognition (HAR), this research adapts the Tree Structure Skeleton Image (TSSI) method to represent a sign gesture as an image. The process involves first extracting the skeleton sequences from sign videos using the MediaPipe framework, which offers fast inference across multiple devices. The TSSI representation is then processed by a DenseNet, chosen for its efficiency and low parameter count. The proposed method, called SL-TSSI-DenseNet, is trained and evaluated on two challenging datasets: the Word-level American Sign Language (WLASL) dataset and the Ankara University Turkish Sign Language (AUTSL) dataset. Specifically, the WLASL-100 subset of the WLASL dataset and the RGB track of the AUTSL dataset are selected for the experiments. The results demonstrate that SL-TSSI-DenseNet outperforms other skeleton-based and RGB-based models benchmarked on the WLASL-100 dataset, achieving an accuracy of 81.47% through the use of data augmentation and pre-training. On the AUTSL dataset, it achieves competitive performance with an accuracy of 93.13% without pre-training or data augmentation. Additionally, an augmentation ablation study is conducted to identify the most effective data augmentation technique for the model's performance on the WLASL-100 dataset, providing insights into the effectiveness of various augmentation techniques.
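The TSSI idea, laying skeleton joints out along a depth-first traversal of the skeleton tree so that joints adjacent in the body stay adjacent in the image, can be sketched in a few lines; the 5-joint toy tree below is an assumption for illustration, whereas the thesis uses full MediaPipe landmark sets.

```python
# TSSI sketch: a skeleton sequence (frames x joints x xyz) becomes an image
# whose columns follow a depth-first walk of the skeleton tree, revisiting
# parent joints so that every pixel neighborhood spans connected joints.
import numpy as np

TREE_ORDER = [0, 1, 2, 1, 0, 3, 4, 3, 0]   # DFS walk of a toy 5-joint tree

def tssi_image(skeleton):                   # skeleton: (T, J, 3), values in [0,1]
    img = skeleton[:, TREE_ORDER, :]        # reorder joints along the walk
    return (img * 255).astype(np.uint8)     # (T, len(walk), 3) image

seq = np.random.rand(64, 5, 3)              # 64 frames, 5 joints
print(tssi_image(seq).shape)                # (64, 9, 3)
```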
- Lights, camera, and domain shift: using superpixels for domain generalization in image segmentation for multimodal endoscopies (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2023-05) Martínez García Peña, Rafael; Ochoa Ruiz, Gilberto; Falcón Morales, Luis Eduardo; González Mendoza, Miguel; School of Engineering and Sciences; Campus Monterrey; Ali, Sharib. Deep learning models have made great advances in image processing. Their ability to identify key parts of images and provide fast, accurate segmentation has been proven in many fields, such as city navigation and object recognition. However, one field both needs the extra information that computers can provide and has proven elusive for the goals of robustness and accuracy: medicine. In the medical field, limited data and the variation introduced by factors such as differences in instrumentation pose a grave threat to model accuracy known as domain shift. Domain shift occurs when a model is trained on data whose characteristics are not wholly representative of the full range of data a task encompasses. When it is present, models without tools to deal with it can see their accuracy degrade to such a degree that they go from usable to useless. To explore this topic, we discuss two techniques: domain adaptation, which makes a model better at predicting for a specific domain of data within a task, and domain generalization, which makes a model better at predicting data from any domain within a task. In addition, we discuss several image segmentation models that have shown good results on medical tasks: U-Net, Attention U-Net, DeepLab, Efficient U-Net, and EndoUDA. Following this exploration, we propose a solution based on a domain generalization technique: patch-based consistency. We use a superpixel generator known as SLIC (Simple Linear Iterative Clustering) to provide low-level, domain-agnostic information to different models, encouraging the networks to learn more global features. This framework, which we refer to as SUPRA (SUPeRpixel Augmented), is used in tandem with U-Net, Attention U-Net, and Efficient U-Net to improve results in endoscopies where light modalities are switched, a situation commonly seen in lesion detection tasks (particularly Barrett's esophagus and polyp detection). We find that the best of these models, SUPRA-UNet, shows significant qualities that make it a better choice than unaugmented networks for lesion detection: not only does it provide less noisy and smoother predictions, it also outperforms the best baseline (U-Net) by over 20% IoU in a target domain with significant lighting differences from the training set.
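For reference, the superpixel step itself is a single call to SLIC as implemented in scikit-image; in the sketch below the random array merely stands in for an endoscopic frame, and the segment count is illustrative.

```python
# SLIC superpixels: the low-level, domain-agnostic regions that SUPRA-style
# augmentation feeds to the segmentation networks.
import numpy as np
from skimage.segmentation import slic

frame = np.random.rand(128, 128, 3)          # stand-in endoscopic frame
segments = slic(frame, n_segments=100, compactness=10, start_label=0)
print(segments.shape, segments.max() + 1)    # label map, ~100 superpixels
```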
- Advanced deep learning approaches for maritime trajectory prediction leveraging automatic identification system data (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2023) Familsamavati, Sajad; Hajiaghaei Keshteli, Mostafa; Guadalupe Villarreal Marroquín, María; School of Engineering and Sciences; Campus Monterrey; Smith Cornejo, Neale Ricardo. This study investigates the efficacy of advanced deep learning models, specifically Bi-GRU, LSTM, and Bi-LSTM, for predicting maritime vessel trajectories from AIS data. It focuses on a comparative analysis of prediction accuracy in high-traffic maritime environments, particularly the Port of Manzanillo. Comprehensive AIS data preprocessing, feature engineering, and normalization were conducted to prepare the data for model training. The Bi-GRU model emerged as the most effective, demonstrating superior performance with the lowest test loss, MAE, and MSE, highlighting its ability to capture sequential dependencies in vessel trajectories. The research contributes to maritime traffic management by offering a predictive framework that enhances safety and efficiency in dynamic maritime operations. Future research directions include integrating additional data sources and extending the models to other maritime regions.
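A minimal PyTorch sketch of the best-performing architecture named above, a bidirectional GRU regressing the next vessel position from a window of AIS features; the four input features (e.g., latitude, longitude, speed, course) and the layer dimensions are assumptions for the example.

```python
# Hedged Bi-GRU sketch: map a window of AIS features to the next position.
import torch
import torch.nn as nn

class BiGRUPredictor(nn.Module):
    def __init__(self, n_features=4, hidden=64):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True,
                          bidirectional=True)
        self.head = nn.Linear(2 * hidden, 2)   # predict (lat, lon)

    def forward(self, x):                      # x: (batch, time, features)
        out, _ = self.gru(x)
        return self.head(out[:, -1])           # regression from the last step

model = BiGRUPredictor()
print(model(torch.randn(8, 30, 4)).shape)      # torch.Size([8, 2])
```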
- Component Detection based on Mask R-CNN (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2023) Charles Garza, Daniel; Morales, Rubén; Vallejo Guevara, Antonio; Guedea Elizalde, Federico; Escuela de Ingeniería y Ciencias; Campus Monterrey. This thesis examines the evolution and use of deep learning methods for object detection and segmentation in the manufacturing industry. It reviews several state-of-the-art object detection techniques, including YOLO, R-CNN, and Fast R-CNN, assessing their effectiveness and applicability in complex object identification and classification tasks. The study then focuses on Mask R-CNN, a method chosen for its outstanding performance in object segmentation and identification, especially in the cluttered and unstructured environments common in manufacturing settings.
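For context, an off-the-shelf Mask R-CNN is available in torchvision; the sketch below runs COCO-pretrained inference on a dummy image and is not the thesis's trained component-detection model.

```python
# Mask R-CNN inference sketch: the model returns, per image, a dict of
# predicted boxes, class labels, confidence scores, and instance masks.
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()
with torch.no_grad():
    out = model([torch.rand(3, 480, 640)])   # one RGB image in [0, 1]
print(out[0].keys())                          # boxes, labels, scores, masks
```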
- A novel dataset and deep learning method for automatic exposure correction in endoscopic imaging (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2022-12-01) García Vega, Carlos Axel; Falcón Morales, Luis Eduardo; Daul, Christian; González Mendoza, Miguel; Roshan Biswal, Rajesh; School of Engineering and Sciences; Campus Estado de México; Ochoa Ruiz, Gilberto. Endoscopy is a critical medical practice: through this imaging technique, clinicians can diagnose and treat cancerous lesions in hollow organs, including some of the most common and deadly cancers worldwide. Nonetheless, endoscopic images are often affected by sudden illumination changes that produce overexposed regions, underexposed regions, or both, depending on the light source pose and the lumen texture of the inner walls. These poor lighting conditions can have negative consequences for the examination itself and for the performance of computer-assisted diagnosis (CAD) or computer-aided surgery (CAS). However, little effort has been made to deploy endoscopic image enhancement methods that perform adequately (even when both errors appear simultaneously) and in real time. The present work aims to enhance the quality of the field of view (FoV) in endoscopic examinations and computer-assisted diagnosis through real-time deep learning techniques. To achieve this, we first built a reliable reference-based dataset, Endo4IE, evaluated and validated by experts, to serve as a standard dataset for image enhancement, given the lack of such a dataset in the literature. We then evaluated image enhancement methods on our dataset to identify a candidate for our case study, selecting LMSPEC, originally introduced to enhance images of natural scenes. We adapted the objective function of this method to obtain better structural fidelity and fewer artifacts in the enhanced frames. Finally, we tested on the Endo4IE dataset and evaluated with state-of-the-art metrics against the baseline method: the proposed implementation yields a significant improvement over LMSPEC, with SSIM increases of 4.40% and 4.21% for overexposed and underexposed images, respectively. Regarding PSNR, it achieves a 3.83% improvement for overexposed images and falls just 0.01% below LMSPEC for underexposed ones.
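The reported gains are SSIM and PSNR deltas against well-exposed reference frames; a hedged sketch of that evaluation step using scikit-image metrics follows, with random arrays standing in for real enhanced and reference frames.

```python
# Evaluation sketch: compute SSIM and PSNR between an enhanced frame and
# its reference. The arrays below are synthetic placeholders.
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

reference = np.random.rand(256, 256, 3)
enhanced = np.clip(reference + np.random.normal(0, 0.05, reference.shape), 0, 1)
ssim = structural_similarity(reference, enhanced, channel_axis=-1, data_range=1.0)
psnr = peak_signal_noise_ratio(reference, enhanced, data_range=1.0)
print(f"SSIM={ssim:.3f}  PSNR={psnr:.1f} dB")
```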
- Reinforcement learning for an attitude control algorithm for racing quadcopters (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2022-06-15) Nakasone Nakamurakari, Shun Mauricio; Bustamante Bello, Martín Rogelio; Navarro Durán, David; School of Engineering and Sciences; Campus Ciudad de México; Galuzzi Aguilera, Renato. From its first conception to its wide commercial distribution, unmanned aerial vehicles (UAVs) have always presented an interesting control problem, as their dynamics are not simple to model and exhibit nonlinear behavior. These vehicles have improved as their underlying technology has developed, reaching commercial and leisure use in everyday life. Among the many applications for these vehicles, one that has been rising in popularity is drone racing. As technology improves, racing quadcopters have reached capabilities never before seen in flying vehicles. Though hardware and performance have improved throughout the drone racing industry, control algorithms have lagged behind in robustness. In this thesis, a new control strategy based on reinforcement learning (RL) is presented to achieve better attitude control performance for racing quadcopters. Two different plants were developed: a) a simplified dynamics model to serve the training process and b) a higher-fidelity multibody model to validate the resulting controller. Using Proximal Policy Optimization (PPO), the agent is trained via a reward function and interaction with the environment. This dissertation presents a different approach to determining a reward function such that the trained agent learns more effectively and faster. The control algorithm obtained from training is simulated and tested against the most common attitude control algorithm used in drone racing, proportional-integral-derivative (PID) control, and evaluated on its ability to reject noise in the state signals and external disturbances from the environment. Results from agents trained with and without these disturbances are also presented. The resulting control policies were comparable to the PID controller and even outperformed it in noise rejection and robustness to external disturbances.
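The exact reward shaping is the dissertation's contribution and is not reproduced here; the sketch below only illustrates the general form of an attitude-tracking reward, penalizing angular error, body rates, and control effort, with hypothetical weights.

```python
# Hypothetical attitude-tracking reward for a PPO agent (not the thesis's
# actual reward): larger tracking error, faster rotation, and more
# aggressive motor commands all reduce the reward.
import numpy as np

def attitude_reward(angles, target, rates, action,
                    w_err=1.0, w_rate=0.1, w_act=0.01):
    """angles/target: roll-pitch-yaw (rad); rates: body rates; action: motor cmds."""
    err = np.linalg.norm(angles - target)        # attitude tracking error
    return -(w_err * err
             + w_rate * np.linalg.norm(rates)    # damp oscillations
             + w_act * np.linalg.norm(action))   # discourage aggressive inputs

print(attitude_reward(np.zeros(3), np.array([0.1, 0, 0]),
                      np.zeros(3), np.zeros(4)))
```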

