Exact Sciences and Health Sciences
Permanent URI for this collection: https://hdl.handle.net/11285/551039
This collection contains the theses and graduate works of the Master's programs of the School of Engineering and Sciences and the School of Medicine and Health Sciences.
Search Results
- Sign language recognition with tree structure skeleton images and densely connected convolutional neural networks (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2023-05) Laines Vázquez, David Alberto; González Mendoza, Miguel; Sánchez Ante, Gildardo; Cantoral Ceballos, José Antonio; Méndez Vázquez, Andrés; School of Engineering and Sciences; Campus Monterrey; Ochoa Ruiz, Gilberto

  This thesis presents a novel approach to Isolated Sign Language Recognition (ISLR) using skeleton modality data and deep learning. The study proposes a method that employs an image-based spatio-temporal skeleton representation of sign gestures and a convolutional neural network (CNN) for classification. The advantages of the skeleton modality over RGB, such as reduced noise and smaller parameter requirements for processing, are taken into account. The aim is to achieve competitive performance with a low number of parameters compared to the existing state of the art in ISLR. Informed by the literature on skeleton-based human action recognition (HAR), this research adapts the Tree Structure Skeleton Image (TSSI) method to represent a sign gesture as an image. The process involves first extracting the skeleton sequences from sign videos using the MediaPipe framework, which offers fast inference across multiple devices. The TSSI representation is then processed using a DenseNet, chosen for its efficiency and relatively small number of parameters. The proposed method, called SL-TSSI-DenseNet, is trained and evaluated on two challenging datasets: the Word-Level American Sign Language (WLASL) dataset and the Ankara University Turkish Sign Language (AUTSL) dataset. Specifically, the WLASL-100 subset of the WLASL dataset and the RGB Track of the AUTSL dataset are selected for the experiments. The results demonstrate that SL-TSSI-DenseNet outperforms other skeleton-based and RGB-based models benchmarked on the WLASL-100 dataset, achieving an accuracy of 81.47% through the use of data augmentation and pre-training. On the AUTSL dataset, it achieves competitive performance with an accuracy of 93.13% without pre-training or data augmentation. Additionally, an ablation study on the WLASL-100 dataset identifies the most effective data augmentation techniques and provides insight into how each contributes to the model's performance. (A minimal illustrative sketch of the skeleton-to-DenseNet pipeline described here appears after this list.)
- ANOSCAR: An image captioning model and dataset designed from OSCAR and the video dataset of ActivityNet (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2021-07-01) Byrd Suárez, Emmanuel; González Mendoza, Miguel; Ochoa Ruiz, Gilberto; Marín Hernandez, Antonio; School of Engineering and Sciences; Campus Estado de México; Chang Fernández, Leonardo

  Activity Recognition and Classification in video sequences is an area of research that has recently received attention. However, video processing is computationally expensive, and its advances have not been as remarkable as those of Image Captioning. This work uses a computationally limited environment and learns an Image Captioning transformation of the ActivityNet-Captions video dataset that can be used for either Video Captioning or Video Storytelling. Different data augmentation techniques for Natural Language Processing are explored and applied to the generated dataset in an effort to increase its validation scores. Our proposal includes an Image Captioning dataset obtained from ActivityNet, with its features generated by Bottom-Up attention, and a model to predict its captions, generated with OSCAR. Our captioning scores are slightly better than those of S2VT, but with a much simpler pipeline, providing a starting point for future research using our approach. Finally, we propose several lines of research along which this work can be further expanded and improved. (A minimal illustrative sketch of one caption-level augmentation technique appears after this list.)
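The first entry above describes a concrete pipeline: per-frame skeleton landmarks are extracted with MediaPipe, the sequence is arranged as an image-like array, and a DenseNet classifies the resulting image. The Python sketch below is a minimal, hypothetical illustration of that general idea, not the thesis code: the actual TSSI representation orders joint columns by a depth-first traversal of the skeleton tree so that neighboring columns correspond to connected joints, whereas this sketch simply concatenates MediaPipe's pose and hand landmarks, and the video path, resized input size, and class count are placeholder assumptions.

```python
# Hypothetical sketch (not the SL-TSSI-DenseNet code): MediaPipe Holistic
# landmarks per frame -> (frames x joints x 3) array -> DenseNet classifier.
import cv2
import numpy as np
import mediapipe as mp
import torch
import torch.nn.functional as F
from torchvision.models import densenet121

POSE, HAND = 33, 21  # landmark counts in MediaPipe Holistic


def _to_array(landmark_list, count):
    """Convert one MediaPipe landmark list to (count, 3); zeros if undetected."""
    if landmark_list is None:
        return np.zeros((count, 3), dtype=np.float32)
    return np.array([(p.x, p.y, p.z) for p in landmark_list.landmark], dtype=np.float32)


def video_to_skeleton_image(video_path):
    """Return a (num_frames, 75, 3) array of pose + both-hand landmarks."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    with mp.solutions.holistic.Holistic(static_image_mode=False) as holistic:
        while True:
            ok, bgr = cap.read()
            if not ok:
                break
            res = holistic.process(cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB))
            frames.append(np.concatenate([
                _to_array(res.pose_landmarks, POSE),
                _to_array(res.left_hand_landmarks, HAND),
                _to_array(res.right_hand_landmarks, HAND),
            ]))
    cap.release()
    return np.stack(frames)


# "sign_clip.mp4" is a placeholder path; num_classes=100 matches a WLASL-100-style setup.
skeleton = video_to_skeleton_image("sign_clip.mp4")
x = torch.from_numpy(skeleton).permute(2, 0, 1).unsqueeze(0)   # (1, 3, frames, joints)
x = F.interpolate(x, size=(224, 224), mode="bilinear", align_corners=False)  # fixed CNN input size
model = densenet121(num_classes=100)
logits = model(x)                                              # (1, 100) class scores
```

The fixed resize is only there so the DenseNet's strided downsampling does not run out of resolution on short clips; the thesis's own preprocessing and joint ordering are what make the image representation meaningful to a 2D CNN.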
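The second entry applies NLP data augmentation to captions generated from ActivityNet. As an illustration of the kind of technique involved, the sketch below implements random synonym replacement with WordNet; it is a minimal, hypothetical example, and the augmentation methods actually evaluated in the thesis are not reproduced here. It requires NLTK's "wordnet" corpus (nltk.download("wordnet")).

```python
# Hypothetical sketch of one caption-level augmentation: random synonym
# replacement using WordNet, applied to a caption string before training.
import random
from nltk.corpus import wordnet


def synonym_replace(caption, n_replacements=1, seed=None):
    """Replace up to n_replacements words in the caption with a WordNet synonym."""
    rng = random.Random(seed)
    words = caption.split()
    candidates = [i for i, w in enumerate(words) if wordnet.synsets(w)]
    rng.shuffle(candidates)
    for i in candidates[:n_replacements]:
        synonyms = {lemma.name().replace("_", " ")
                    for synset in wordnet.synsets(words[i])
                    for lemma in synset.lemmas()}
        synonyms.discard(words[i])
        if synonyms:
            words[i] = rng.choice(sorted(synonyms))
    return " ".join(words)


# Example: perturb a caption to enlarge the training set.
print(synonym_replace("a man is riding a horse on the beach", n_replacements=2, seed=0))
```

Text-level augmentation of this kind is cheap relative to re-extracting visual features, which suits the computationally limited setting the abstract describes.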

