Learning temporal features of facial action units using deep learning

dc.audience.educationlevel: Investigadores/Researchers [es_MX]
dc.contributor.advisor: Conant Pablos, Santiago Enrique
dc.contributor.institution: Campus Monterrey [es_MX]
dc.creator: Sánchez Pámanes, Roberto
dc.date.accessioned: 2019-08-29T23:20:32Z
dc.date.available: 2019-08-29T23:20:32Z
dc.date.created: 2019-05
dc.description.abstract: Facial expressions are an important aspect of human life, and research on this topic has led to real-world technological applications. Recognizing facial states is part of a collection of challenging tasks that include assisting elders and babies, as well as enhancing pedagogical exercises. Unlike approaches that categorize faces into emotions, the Facial Action Coding System encodes ambiguous expressions by analyzing small differences in the face based on muscle movements called action units (AUs). By analyzing action unit co-occurrences, human coders can describe virtually any anatomically possible facial scenario in a way that is independent of interpretation and can be used as a tool for higher-level decision processes. The automatic detection of action units in videos has recently become an interesting topic for the deep learning community, since models in this area have dramatically improved performance in image-related tasks. The state-of-the-art proposals on the benchmark database FERA17 are currently vanilla implementations of convolutional neural networks that model the occurrence of action units while ignoring their temporal features. However, rather than being a single snapshot, the occurrence of independent facial movements changes over time in response to information dynamically gathered from the environment, so these deep models cannot completely capture the complex dynamic context involved in their occurrence. Researchers have engineered other deep learning methods that can learn features across sequences of images. These methods can be grouped into three categories: 1) methods that extend image-based architectures with aggregation methods, 2) methods that extend them with recurrent units, and 3) methods that process spatiotemporal features natively. They all offer the possibility of capturing AU dynamics and enhancing AU detection.
However, their study has frequently been overlooked by the facial expression recognition community, particularly for AU occurrence detection, and to date it is unclear whether deep learning models that incorporate temporal features can indeed outperform those that do not. This work analyzes the effects of incrementally adding temporal capabilities to the spatial model ResNet50 on predicting the occurrence of a single action unit of the FERA17 database. The configurations evaluated include inflating the kernels in the model to create a 3-dimensional version of ResNet50, adding a recurrent layer to encode long-term dependencies, and including the dense optical flow representation of two consecutive periods of time. Results show that adding recurrent units to a spatial model outperforms the other temporal paradigms and the baseline ResNet50 by 7.4% in terms of the F1 score. The findings of this thesis can be used to better define initial deep learning implementations for projects related to facial expression recognition. Knowing the extent to which each temporal paradigm can effectively capture the dynamics inherent to AU occurrence can also help improve future research projects. [es_MX]
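The kernel inflation mentioned in the abstract can be illustrated with a small sketch. It assumes one common inflation scheme, in which each 2D kernel is repeated along a new time axis and divided by the temporal depth; the record does not say which scheme the thesis uses, and the function name, shapes, and random data below are hypothetical and chosen only for illustration.

```python
import numpy as np


def inflate_kernel(kernel_2d: np.ndarray, time_depth: int) -> np.ndarray:
    """Inflate a 2D conv kernel (out, in, kh, kw) to 3D (out, in, t, kh, kw).

    Assumed scheme: copy the 2D weights along a new time axis and divide by
    the temporal depth, so a clip of identical frames yields the same
    response as the original 2D kernel on a single frame.
    """
    if time_depth < 1:
        raise ValueError("time_depth must be a positive integer")
    repeated = np.repeat(kernel_2d[:, :, np.newaxis, :, :], time_depth, axis=2)
    return repeated / time_depth


# Hypothetical shapes: 4 output channels, 3 input channels, 7x7 kernel, 5 frames.
rng = np.random.default_rng(0)
kernel_2d = rng.normal(size=(4, 3, 7, 7))
kernel_3d = inflate_kernel(kernel_2d, time_depth=5)

# A "static video": the same 3x7x7 patch repeated over 5 frames.
patch = rng.normal(size=(3, 7, 7))
clip = np.repeat(patch[:, np.newaxis, :, :], 5, axis=1)  # shape (3, 5, 7, 7)

# Response of one kernel position, with and without inflation.
response_2d = np.einsum("oikl,ikl->o", kernel_2d, patch)
response_3d = np.einsum("oitkl,itkl->o", kernel_3d, clip)
```

Under this assumed scheme the inflated network starts from the behavior of the pretrained image model on static input, and any temporal sensitivity has to be learned during fine-tuning.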
dc.description.degree: Master of Science in Computer Science [es_MX]
dc.format.medium: Text [es_MX]
dc.identifier.citation: Sánchez-Pámanes, R. (2019). Learning temporal features of facial action units using deep learning (Master's Thesis). Tecnologico de Monterrey. [es_MX]
dc.identifier.uri: http://hdl.handle.net/11285/633039
dc.publisher: Instituto Tecnológico y de Estudios Superiores de Monterrey [esp]
dc.publisher.institution: Instituto Tecnológico y de Estudios Superiores de Monterrey [es_MX]
dc.relation.impreso: 2019-05-21
dc.relation.isFormatOf: published version [es_MX]
dc.rights: Open Access [es_MX]
dc.rights.uri: http://creativecommons.org/publicdomain/zero/1.0/ [*]
dc.subject: ENGINEERING AND TECHNOLOGY [es_MX]
dc.subject: ENGINEERING AND TECHNOLOGY::TECHNOLOGICAL SCIENCES::COMPUTER TECHNOLOGY [es_MX]
dc.subject.keyword: deep learning [es_MX]
dc.subject.keyword: facs [es_MX]
dc.subject.lcsh: Ingeniería y Ciencias Aplicadas / Engineering & Applied Sciences [es_MX]
dc.title: Learning temporal features of facial action units using deep learning [es_MX]
dc.type: Master's thesis

Files

Original bundle

Name: Thesis-3-Firmas.pdf
Size: 209.53 KB
Format: Adobe Portable Document Format
Description: Signatures

Name: autorizacion_tesis.pdf
Size: 127.55 KB
Format: Adobe Portable Document Format
Description: Authorization letter

Name: Tesis_sin_firmas.pdf
Size: 4.64 MB
Format: Adobe Portable Document Format
Description: Thesis

License bundle

Name: license.txt
Size: 1.3 KB
Format: Item-specific license agreed upon to submission
Description: logo
