Machine translation for suicide detection: validating Spanish datasets using machine and deep learning models

dc.audience.educationlevel: Empresas/Companies
dc.contributor.advisor: Zareel, Mahdi
dc.contributor.author: Arenas Enciso, Francisco Ariel
dc.contributor.cataloger: emipsanchez
dc.contributor.committeemember: García Ceja, Enrique Alejandro
dc.contributor.committeemember: Roshan Biswal, Rajesh
dc.contributor.department: School of Engineering and Sciences
dc.contributor.institution: Sede EGADE Monterrey
dc.date.accepted: 2024-12-02
dc.date.accessioned: 2024-12-31T03:48:58Z
dc.date.issued: 2024-11
dc.description.abstract: Suicide is a complex health concern that affects not only individuals but society as a whole. Traditional strategies to prevent, assess, and treat this condition have proven inefficient in a modern world in which interactions mainly take place online. Thus, in recent years, multidisciplinary efforts have explored how computational techniques can be applied to automatically detect, from textual input, individuals who wish to end their lives. Such methodologies rely on two main technical approaches: text-based classification and deep learning. These methods also depend on datasets labeled with relevant information, often sourced from clinically curated social media posts or healthcare records; more recently, public social media data has proven especially valuable for this purpose. Nonetheless, research on computational algorithms for detecting suicide or suicidal ideation is still an emerging field of study. In particular, recent investigations have considered specific factors, such as language or socio-cultural context, that affect the causality, rationality, and intentionality of an individual’s manifestation, in order to improve the assessment made on textual data. Consequently, the lack of data in non-Anglo-Saxon contexts that computational techniques could exploit to detect suicidal ideation remains an open problem. This thesis addresses the limited availability of suicide ideation datasets in non-Anglo-Saxon contexts, particularly for Spanish, despite its global significance as a widely spoken language. The research hypothesizes that machine-translated Spanish datasets can yield comparable results (within a ±5% performance range) to English datasets when training machine learning and deep learning models for suicide ideation detection.
To test this, multiple machine translation models were evaluated, and the two best-performing models were selected to translate an English dataset of social media posts into Spanish. The English and translated Spanish datasets were then used in a binary classification task with SVM, Logistic Regression, CNN, and LSTM models. Results showed that the translated Spanish datasets achieved performance close to that of the original English set across all classifiers, with variations in accuracy, precision, recall, F1-score, ROC AUC, and MCC remaining within the hypothesized ±5% range. For example, the SVM classifier on the translated Spanish sets achieved an accuracy of 90%, closely matching the 91% achieved on the original English set. These findings confirm that machine-translated datasets can serve as effective resources for training ML and DL models for suicide ideation detection in Spanish, thereby supporting the viability of extending suicide detection models to non-English-speaking populations. This contribution provides a methodological foundation for expanding suicide prevention tools to diverse linguistic and cultural contexts, potentially benefiting health organizations and academic institutions interested in psychological computation.
dc.description.degree: Master of Science in Computer Science
dc.format.medium: Text
dc.identificator: 339999
dc.identifier.citation: Arenas Enciso, F. A. (2024). Machine translation for suicide detection: validating Spanish datasets using machine and deep learning models [Master's thesis]. Instituto Tecnológico y de Estudios Superiores de Monterrey. Retrieved from: https://hdl.handle.net/11285/702957
dc.identifier.cvu: 1276209
dc.identifier.uri: https://hdl.handle.net/11285/702957
dc.identifier.uri: https://doi.org/10.60473/ritec.33
dc.language.iso: eng
dc.publisher: Instituto Tecnológico y de Estudios Superiores de Monterrey
dc.relation: Instituto Tecnológico y de Estudios Superiores de Monterrey
dc.relation: CONAHCYT
dc.relation.isFormatOf: acceptedVersion
dc.rights: openAccess
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0
dc.subject.classification: ENGINEERING AND TECHNOLOGY::TECHNOLOGICAL SCIENCES::OTHER TECHNOLOGICAL SPECIALTIES::OTHER
dc.subject.keyword: Suicidal Ideation
dc.subject.keyword: Machine Learning
dc.subject.keyword: Deep Learning
dc.subject.keyword: Mental Health
dc.subject.keyword: Suicide
dc.subject.keyword: Machine Translation
dc.subject.keyword: Binary Classification
dc.subject.lcsh: Technology
dc.subject.lcsh: Science
dc.title: Machine translation for suicide detection: validating Spanish datasets using machine and deep learning models
dc.type: Master's Thesis
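
The comparison criterion stated in the abstract (Spanish metrics within ±5% of the English ones) can be illustrated with a minimal sketch. This is not the thesis code: the function names `binary_metrics` and `within_tolerance` are hypothetical, and the sketch assumes that "±5%" means an absolute difference of at most 0.05 on metrics expressed as fractions, which the abstract does not specify. ROC AUC is omitted because it requires classifier scores rather than hard labels.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, F1 and MCC for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Undefined ratios (empty denominators) are reported as 0.0 by convention.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


def within_tolerance(english, spanish, tolerance=0.05):
    """Flag, per metric, whether the Spanish score is within the tolerance.

    Assumption: the tolerance is an absolute difference in metric value.
    """
    return {name: abs(english[name] - spanish[name]) <= tolerance
            for name in english}


# SVM accuracies reported in the abstract: 91% (English), 90% (Spanish).
print(within_tolerance({"accuracy": 0.91}, {"accuracy": 0.90}))
```

In practice each classifier would be trained and evaluated once per dataset, and `within_tolerance` would be applied to the two resulting metric dictionaries.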

Files

Original bundle

Name: ArenasEnciso_TesisMaestriapdfa.pdf
Size: 7.58 MB
Format: Adobe Portable Document Format
Description: Master's Thesis

Name: ArenasEnciso_ActaGradoDeclaracionAutoriapdfa.pdf
Size: 387.3 KB
Format: Adobe Portable Document Format
Description: Degree Certificate and Declaration of Authorship

Name: ArenasEnciso_CartaAutorizacionpdfa.pdf
Size: 142.76 KB
Format: Adobe Portable Document Format
Description: Authorization Letter

License bundle

Name: license.txt
Size: 1.28 KB
Format: Item-specific license agreed upon to submission
