ANOSCAR: An image captioning model and dataset designed from OSCAR and the video dataset of activitynet

Byrd Suárez, Emmanuel; GONZALEZ MENDOZA, MIGUEL; 123361

dc.audience.educationlevel	Investigadores/Researchers	es_MX
dc.contributor.advisor	González Mendoza, Miguel
dc.contributor.author	Byrd Suárez, Emmanuel
dc.contributor.cataloger	puemcuervo	es_MX
dc.contributor.committeemember	Ochoa Ruiz, Gilberto
dc.contributor.committeemember	Marín Hernandez, Antonio
dc.contributor.department	School of Engineering and Sciences	es_MX
dc.contributor.institution	Campus Estado de México	es_MX
dc.contributor.mentor	Chang Fernández, Leonardo
dc.creator	GONZALEZ MENDOZA, MIGUEL; 123361
dc.date.accepted	2021-07-01
dc.date.accessioned	2023-04-26T17:36:14Z
dc.date.available	2023-04-26T17:36:14Z
dc.date.created	2021-05-16
dc.date.issued	2021-07-01
dc.description	https://orcid.org/0000-0001-6451-9109	es_MX
dc.description.abstract	Activity recognition and classification in video sequences is a research area that has received attention recently. However, video processing is computationally expensive, and its progress has not been as pronounced as that of Image Captioning. Working in a computationally limited environment, this work learns an Image Captioning transformation of the ActivityNet-Captions video dataset that can be used for either Video Captioning or Video Storytelling. Different Data Augmentation techniques for Natural Language Processing are explored and applied to the generated dataset in an effort to increase its validation scores. Our proposal includes an Image Captioning dataset obtained from ActivityNet, with features generated by Bottom-Up attention, and a model, based on OSCAR, to predict its captions. Our captioning scores are slightly better than those of S2VT while using a much simpler pipeline, which offers a starting point for future research with our approach. Finally, we propose several lines of research on how this work can be further expanded and improved.	es_MX
dc.description.degree	Master of Science in Computer Science	es_MX
dc.format.medium	Text	es_MX
dc.identificator	7\|\|33\|\|3304\|\|120323	es_MX
dc.identifier.citation	Byrd Suárez, E. (2021). ANOSCAR: An image captioning model and dataset designed from OSCAR and the video dataset of activitynet [Unpublished master's thesis]. Instituto Tecnológico y de Estudios Superiores de Monterrey.	es_MX
dc.identifier.orcid	https://orcid.org/0000-0002-9614-8944	es_MX
dc.identifier.uri	https://hdl.handle.net/11285/650436
dc.language.iso	eng	es_MX
dc.publisher	Instituto Tecnológico y de Estudios Superiores de Monterrey	es_MX
dc.relation	CONACYT	es_MX
dc.relation.isFormatOf	draft	es_MX
dc.relation.isreferencedby	REPOSITORIO NACIONAL CONACYT
dc.rights	openAccess	es_MX
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0	es_MX
dc.subject.classification	ENGINEERING AND TECHNOLOGY::TECHNOLOGICAL SCIENCES::COMPUTER TECHNOLOGY::PROGRAMMING LANGUAGES	es_MX
dc.subject.keyword	Video captioning	es_MX
dc.subject.keyword	Image captioning	es_MX
dc.subject.keyword	Activity recognition	es_MX
dc.subject.keyword	Computer science	es_MX
dc.subject.keyword	Deep learning	es_MX
dc.subject.keyword	Computer vision	es_MX
dc.subject.keyword	Artificial neural networks	es_MX
dc.subject.keyword	Soft computing	es_MX
dc.subject.lcsh	Science	es_MX
dc.title	ANOSCAR: An image captioning model and dataset designed from OSCAR and the video dataset of activitynet	es_MX
dc.type	Master's thesis