Zareei, Mahdi
Burgueño Paz, Luis Humberto
2025-01-04
2024-11-24
Burgueno Paz, L. H. (2024). The role of capitalization and character repetition in identifying depression on social Media: a bilingual approach [Master's thesis]. Instituto Tecnológico y de Estudios Superiores de Monterrey. Retrieved from: https://hdl.handle.net/11285/702964
https://hdl.handle.net/11285/702964
https://doi.org/10.60473/ritec.40
https://orcid.org/0000-0001-6623-1758

Depression is a mental disorder that affects millions of people worldwide, yet a significant portion of those affected do not receive adequate treatment. Researchers have shown growing interest in detecting this condition through social media posts in order to prompt early treatment. However, most of this research has focused on the Caucasian, Western, English-speaking population, which limits the applicability of its findings across diverse cultural contexts. While research has shown that nonverbal cues are used to convey sentiment, their role in depression detection remains under-explored. This thesis aims to assess the effect of two nonverbal cues, capitalization and character repetition, on depression detection using datasets in both English and Spanish. This effect was explored through three existing datasets. The first dataset, a collection of English-language Reddit posts and comments, was selected because it comes from one of the most reputable mental health competitions in Natural Language Processing. The second dataset, a collection of Spanish-language messages from Telegram, was used to verify whether the findings for English would hold for Spanish. The third dataset, also built from Reddit posts, was used to analyze the impact of these features when classifying by depression severity level rather than by binary labels.
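The abstract names capitalization and character repetition as cues but does not state how they are turned into classifier inputs. As a minimal sketch, assuming simple per-post ratios and counts, the two feature groups could be computed as below. The function name `nonverbal_cue_features`, the feature names, and the three-letter repetition threshold are hypothetical illustrations, not the thesis's actual definitions.

```python
import re
from typing import Dict

# Hypothetical definitions: the abstract does not specify how the cues are
# computed, so the names and thresholds below are illustrative assumptions.
# Letters only (Unicode-aware, so Spanish "á" or "ñ" count as letters).
WORD = re.compile(r"[^\W\d_]+")
# A run of 3+ identical letters ("sooo", "NOOO"); requiring 3 avoids flagging
# legitimate doubles common in both languages ("feel", "llamar", "perro").
REPEAT_RUN = re.compile(r"([^\W\d_])\1{2,}")


def nonverbal_cue_features(text: str) -> Dict[str, float]:
    """Return capitalization and character-repetition features for one post."""
    words = WORD.findall(text)
    letters = [c for c in text if c.isalpha()]
    n_words = max(len(words), 1)      # avoid division by zero on empty posts
    n_letters = max(len(letters), 1)

    # Capitalization group: share of uppercase letters and of ALL-CAPS words
    # (single letters such as "I" are ignored).
    upper_letters = sum(1 for c in letters if c.isupper())
    all_caps_words = sum(1 for w in words if len(w) >= 2 and w.isupper())

    # Character-repetition group: number of elongated runs in the post and
    # share of words containing at least one such run.
    run_count = sum(1 for _ in REPEAT_RUN.finditer(text))
    repeated_words = sum(1 for w in words if REPEAT_RUN.search(w))

    return {
        "upper_char_ratio": upper_letters / n_letters,
        "all_caps_word_ratio": all_caps_words / n_words,
        "repeat_run_count": float(run_count),
        "repeated_word_ratio": repeated_words / n_words,
    }


if __name__ == "__main__":
    for post in ["I feel SO tired... sooo tired", "Estoy muy cansadaaa, NO PUEDO más"]:
        print(post, nonverbal_cue_features(post))
```

Vectors like these could be appended to ordinary text features and passed to any classifier. Keeping the capitalization and repetition features under separate names makes it straightforward to train capitalization-only and repetition-only variants and compare them.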
Four classifiers were used throughout this research: Logistic Regression, Random Forest, Support Vector Machine, and Neural Network. Overall, capitalization and character repetition were found to have minimal impact on depression detection. These features had the greatest effect on the English Reddit data with binary labels, while showing limited impact on the Spanish data and when classifying by severity level. Additionally, models using only character-repetition features outperformed those relying on capitalization features.

Text
eng
openAccess
http://creativecommons.org/licenses/by-nc-nd/4.0
ENGINEERING AND TECHNOLOGY::TECHNOLOGICAL SCIENCES::OTHER TECHNOLOGICAL SPECIALTIES::OTHER
Technology
The role of capitalization and character repetition in identifying depression on social Media: a bilingual approach
Master's thesis
https://orcid.org/0009-0005-1531-3872
Depression
Detection
Social media
Mental health
Machine learning
1276308