Master's thesis

Transformer-based hand landmark prediction from superficial electromyography


Abstract

The development of human-robot systems has become increasingly prominent in recent years, particularly in domains that require seamless and intuitive interactions between humans and machines, such as healthcare, manufacturing, rehabilitation, and entertainment. Within these fields, upper-limb robotics and prosthetics have experienced significant growth, and control strategies play a central role in user experience, functionality, and long-term usability. Among the available strategies, myoelectric control, which uses the electrical activity generated by muscle contractions to drive robotic actuators, stands out for its potential to provide direct and responsive decoding of user intent.

Despite its promise, current commercial myoelectric control systems suffer from notable limitations. Most commercially available upper-limb robotic devices rely on binary or proportional control paradigms. Binary control allows the user to issue simple on/off commands (e.g., open or close a prosthetic hand), while proportional control scales the degree of motion or force with the magnitude of the input signal. Although these methods are relatively straightforward to implement and train, they inherently limit the functionality of robotic systems by constraining them to a narrow range of discrete, non-adaptive actions. As a result, users often experience frustration due to unnatural movements, a lack of fluidity, and the inability to perform complex, multi-joint, or continuous tasks.

Pattern recognition (PR)-based control has emerged as a more advanced alternative to binary and proportional schemes. PR systems employ machine learning algorithms to classify muscle activity into predefined gesture categories. This approach improves intuitiveness by enabling the recognition of multiple movements and gestures, offering a more versatile control interface. However, PR control is also restricted in significant ways: its effectiveness is bound by the limited number of gestures used during training, making the system inflexible to untrained motions or novel hand configurations, and its reliance on discrete classification does not accommodate the continuous, dynamic control that is crucial for truly natural and precise robotic movement.

To address these limitations, we propose a novel method for predicting continuous hand movement from surface electromyography (sEMG) signals using a multimodal transformer architecture. Unlike traditional PR systems that output gesture classes, our approach estimates continuous hand landmark positions, enabling fluid and unrestricted movement trajectories. This represents a paradigm shift in myoelectric control, moving from classification-based strategies to continuous, regression-based motion estimation.

The proposed system employs a transformer model, a deep learning architecture originally designed for natural language processing that excels at capturing complex temporal and contextual relationships in sequential data. In the context of sEMG, transformers offer several key advantages over traditional convolutional or recurrent neural networks. First, they eliminate the need for handcrafted feature engineering, which has historically been a challenging and subjective component of EMG signal processing; instead, the architecture learns relevant features directly from raw sEMG input through its self-attention mechanisms.
Second, transformers can simultaneously model spatial and temporal dependencies within the input sequence. This is crucial for decoding sEMG signals, which exhibit both spatial complexity across different muscle groups and temporal dynamics associated with movement initiation and execution. Finally, transformer models are parameter-efficient and scalable, making them adaptable to different limb configurations, electrode placements, and control environments.

Our multimodal architecture takes sEMG signals as input and outputs the continuous positions of hand landmarks, the key spatial reference points on the hand that define its posture and motion (a minimal model sketch follows this abstract). By focusing on hand landmark prediction rather than gesture classification, the system bypasses the inherent limitations of a finite gesture vocabulary. Users can therefore perform an unlimited range of movements, including intermediate postures and transitions between gestures, without retraining or expanding a gesture set. The landmark-based output also facilitates integration with existing computer vision and robotic control systems, many of which use landmark-based representations for motion planning and kinematic modeling.

The development and validation of our approach involved collecting synchronized sEMG and hand motion data from a cohort of participants performing a variety of hand movements. Hand landmarks were extracted using vision-based tracking systems and served as ground-truth labels for model training and evaluation (an illustrative extraction sketch is also given below). The transformer model was trained to map multi-channel sEMG signals to the corresponding hand landmark coordinates over time. Extensive experimentation demonstrated that our model not only outperformed baseline architectures in accuracy and generalization but also required less training data, owing to the efficiency of the self-attention mechanism. Qualitative evaluations further confirmed that the predicted hand trajectories were smooth, natural, and closely aligned with actual user intent, indicating the system's potential for real-time use in robotic prosthetics and exoskeletons.

In summary, this study introduces a transformative approach to myoelectric control by leveraging a multimodal transformer architecture for continuous hand movement prediction. By shifting the focus from discrete gesture classification to continuous motion estimation via landmark regression, our method addresses several long-standing challenges in the field, including limited gesture scalability, unnatural control behavior, and reliance on handcrafted features. The use of transformers to model spatiotemporal dependencies in sEMG data represents a significant advancement, opening the door to more intuitive, responsive, and user-centric human-robot interaction systems.
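As a concrete illustration of the regression model described above, the following sketch shows how a transformer encoder can map windows of raw multi-channel sEMG to hand landmark coordinates. It is a minimal PyTorch example, not the thesis implementation: the channel count, window length, landmark count, and model sizes are illustrative assumptions, since the abstract does not specify them.

# Minimal sketch of an sEMG-to-hand-landmark regression transformer.
# All sizes (8 channels, 200-sample windows, 21 landmarks x 3 coordinates)
# are illustrative assumptions; the abstract does not specify them.
import torch
import torch.nn as nn

class SEMGLandmarkTransformer(nn.Module):
    def __init__(self, n_channels=8, window_len=200, n_landmarks=21,
                 d_model=128, n_heads=4, n_layers=4):
        super().__init__()
        # Project each multi-channel sEMG sample into the model dimension,
        # so no handcrafted features (RMS, MAV, etc.) are required.
        self.input_proj = nn.Linear(n_channels, d_model)
        # Learned positional encoding over the time axis of the window.
        self.pos_embed = nn.Parameter(torch.zeros(1, window_len, d_model))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Regression head: pooled sequence representation -> 21 x 3 coordinates.
        self.head = nn.Linear(d_model, n_landmarks * 3)

    def forward(self, semg):                 # semg: (batch, window_len, n_channels)
        x = self.input_proj(semg) + self.pos_embed
        x = self.encoder(x)                  # self-attention over time
        x = x.mean(dim=1)                    # temporal average pooling
        return self.head(x)                  # (batch, 63) landmark coordinates

# Usage: one training step with mean-squared error against vision-derived labels.
model = SEMGLandmarkTransformer()
semg_window = torch.randn(4, 200, 8)         # dummy batch of raw sEMG windows
target = torch.randn(4, 63)                   # dummy ground-truth landmarks
loss = nn.functional.mse_loss(model(semg_window), target)
loss.backward()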
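The abstract states that ground-truth landmarks were obtained with vision-based tracking but does not name the tracker. Assuming, purely for illustration, a MediaPipe-Hands-style pipeline, per-frame labels could be extracted as sketched here; the function name and its parameters are hypothetical.

# Hypothetical ground-truth extraction step, assuming a MediaPipe-Hands-style
# tracker; the thesis abstract only says "vision-based tracking systems".
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def extract_landmarks(video_path):
    """Return one 21-point (x, y, z) list per frame, or None when no hand
    is detected, for later time-alignment with the sEMG stream."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    with mp_hands.Hands(static_image_mode=False, max_num_hands=1) as hands:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
            result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if result.multi_hand_landmarks:
                lm = result.multi_hand_landmarks[0].landmark
                frames.append([(p.x, p.y, p.z) for p in lm])
            else:
                frames.append(None)   # frame with no detected hand
    cap.release()
    return frames

In such a pipeline, the per-frame landmark sequences would then be resampled and time-aligned with the sEMG recordings before serving as regression targets.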

Description

https://orcid.org/0009-0000-3999-4549



License