Master's thesis

Sign language recognition with tree structure skeleton images and densely connected convolutional neural networks


Abstract

This thesis presents a novel approach to Isolated Sign Language Recognition (ISLR) using skeleton modality data and deep learning. The study proposes a method that employs an image-based spatio-temporal skeleton representation for sign gestures and a convolutional neural network (CNN) for classification. It takes into account the advantages of the skeleton modality over RGB, such as reduced noise and fewer parameters needed for processing. The aim is to achieve competitive performance with a low number of parameters compared to the existing state of the art in ISLR. Informed by the literature on skeleton-based human action recognition (HAR), this research adapts the Tree Structure Skeleton Image (TSSI) method to represent a sign gesture as an image. The process involves first extracting the skeleton sequences from sign videos using the MediaPipe framework, which offers fast inference performance across multiple devices. The TSSI representation is then processed using a DenseNet, chosen for its efficiency and low parameter count. The proposed method, called SL-TSSI-DenseNet, is trained and evaluated on two challenging datasets: the Word-Level American Sign Language (WLASL) dataset and the Ankara University Turkish Sign Language (AUTSL) dataset. Specifically, the WLASL-100 subset of the WLASL dataset and the RGB Track of the AUTSL dataset are selected for the experiments. The results demonstrate that SL-TSSI-DenseNet outperforms other skeleton-based and RGB-based models benchmarked on the WLASL-100 dataset, achieving an accuracy of 81.47% through the use of data augmentation and pre-training. On the AUTSL dataset, it achieves competitive performance with an accuracy of 93.13% without pre-training or data augmentation. Additionally, an augmentation ablation study on the WLASL-100 dataset identifies the most effective data augmentation technique for the model and provides insights into the effectiveness of the techniques considered.
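
The idea of representing a skeleton sequence as an image can be illustrated with a minimal sketch. It is not the thesis implementation: the five-joint tree, the function names, and the (frames, joints, channels) axis layout are all hypothetical choices made for illustration. The sketch assumes that the joint order comes from a depth-first traversal of the skeleton tree in which a parent joint is revisited after each child subtree, so that neighboring columns of the image hold connected joints.

```python
import numpy as np

# Hypothetical toy skeleton (not the MediaPipe layout):
# joint index -> list of child joint indices.
TOY_TREE = {0: [1, 2], 1: [3], 2: [4], 3: [], 4: []}


def dfs_joint_order(tree, root=0):
    """Depth-first traversal that revisits the parent after each child subtree."""
    order = [root]
    for child in tree[root]:
        order.extend(dfs_joint_order(tree, child))
        order.append(root)  # step back to the parent so neighbors stay connected
    return order


def skeleton_to_image(sequence, order):
    """Reorder joints: (frames, joints, channels) -> (frames, len(order), channels)."""
    return sequence[:, order, :]


order = dfs_joint_order(TOY_TREE)

# Random stand-in for 16 frames of 5 joints with 3 coordinates each.
frames = np.random.default_rng(0).random((16, 5, 3))
image = skeleton_to_image(frames, order)
```

In this sketch each row of `image` is one frame and each column is one visit to a joint, so the array can be passed to an image classifier in the same way as an RGB image. Joints visited more than once appear in several columns.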
