Automatic detection of mental health disorders in social media

dc.audience.educationlevel: Investigadores/Researchers
dc.audience.educationlevel: Estudiantes/Students
dc.audience.educationlevel: Maestros/Teachers
dc.contributor.advisor: Trejo Rodríguez, Luis Ángel
dc.contributor.author: Villa Pérez, Miryam Elizabeth
dc.contributor.cataloger: emipsanchez
dc.contributor.committeemember: González Mendoza, Miguel
dc.contributor.committeemember: Brena Pinero, Ramón Felipe
dc.contributor.committeemember: Moctezuma Ochoa, Daniela
dc.contributor.committeemember: Villaseñor Pineda, Luis
dc.contributor.department: School of Engineering and Sciences [es_MX]
dc.contributor.institution: Campus Estado de México [es_MX]
dc.date.accessioned: 2025-09-26T16:32:29Z
dc.date.issued: 2024-06-12
dc.description.abstract: With the rise of social media, these platforms have emerged as a crucial source of information for studying people's thoughts and behaviors. By using natural language processing and machine learning techniques, prior studies have explored the language of users living with different mental health conditions. However, these efforts have focused on analyzing conditions in isolation, particularly depression, and have relied on English-language data. The goal of this study is to examine the communications of English- and Spanish-speaking Twitter users through traditional and deep learning algorithms to automatically recognize whether they live with one of nine mental health conditions. To achieve that, we created two datasets in English and Spanish. The “diagnosed” set comprises the timelines of 1,500 users who explicitly reported in one or more of their posts having been diagnosed with one of the following: ADHD, Anxiety, Autism, Bipolar, Depression, Eating disorders, OCD, PTSD, and Schizophrenia. The “control” set comprises the timelines of 1,700 randomly selected users who had not disclosed a diagnosis. We extracted a variety of text features from the collected data, such as n-grams, q-grams, part-of-speech (POS) tags, topic modeling, Linguistic Inquiry and Word Count (LIWC), and word embeddings, and trained traditional machine learning and deep learning classifiers for two tasks: binary classification, to distinguish between diagnosed and non-diagnosed users, and multiclass classification, to identify the specific diagnosis. The performance of the models was analyzed using 5-fold cross-validation, four different classification metrics (AUC, F1-score, Precision, and Recall), and the Friedman non-parametric test with the Finner post-hoc procedure. Overall, XGBoost and CNN performed best in both classification tasks.
In binary classification on our collected datasets, we achieved an AUC of 0.835 on the Spanish Twitter dataset using word n-grams of lengths one to three (UBT) and 0.846 on the English Twitter dataset with a character 5-gram (C5) model. In multiclass classification, we obtained AUCs of 0.747 and 0.697 on the Spanish and English Twitter datasets, respectively. In the second phase of our research, we introduced a model named BiLEMD for the multiclass classification of mental disorders. Our approach adopts a hierarchical detection strategy, where each base model within our framework leverages diverse textual features. We aim to emulate, to some extent, the step-by-step approach employed in human clinical diagnostics. In clinical practice, professionals first determine the presence or absence of a condition before proceeding to specify its type. Although BiLEMD achieved the highest ranking on both the Spanish and English Twitter datasets, no statistically significant differences were observed. Nevertheless, additional analysis revealed that ensembles, including BiLEMD and Stacking, reduce misclassification within the control class. Moreover, BiLEMD exhibits slightly superior performance in terms of AUC and Recall compared to other classifiers. The development of computer-based methods for recognizing and classifying social media user profiles related to different mental health conditions could enhance the performance of applications aimed at early diagnosis and timely treatment. [es_MX]
dc.description.degree: Doctor of Philosophy in Computer Science
dc.format.medium: Texto
dc.identificator: 120304||320105
dc.identifier.citation: Villa-Pérez, M. E. (2024) Automatic Detection of Mental Health Disorders in Social Media [Tesis de doctorado]. Tecnologico de Monterrey [es_MX]
dc.identifier.cvu: 637273 [es_MX]
dc.identifier.orcid: https://orcid.org/0000-0003-0236-4919
dc.identifier.scopusid: 57215653581 [es_MX]
dc.identifier.uri: https://hdl.handle.net/11285/704176
dc.language.iso: eng
dc.publisher: Instituto Tecnológico y de Estudios Superiores de Monterrey [es_MX]
dc.relation.isFormatOf: acceptedVersion
dc.rights: openAccess [es_MX]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0 [es_MX]
dc.subject.classification: INGENIERÍA Y TECNOLOGÍA::CIENCIAS TECNOLÓGICAS::TECNOLOGÍA DE LOS ORDENADORES::INTELIGENCIA ARTIFICIAL
dc.subject.classification: MEDICINA Y CIENCIAS DE LA SALUD::CIENCIAS MÉDICAS::PSIQUIATRÍA::PSICOLOGÍA CLÍNICA
dc.subject.keyword: Mental health
dc.subject.keyword: Multiclass classification
dc.subject.keyword: Social media
dc.subject.keyword: Twitter
dc.subject.keyword: Machine learning
dc.subject.lcsh: Science
dc.subject.lcsh: Technology
dc.title: Automatic detection of mental health disorders in social media [es_MX]
dc.type: Tesis Doctorado / doctoral Thesis [es_MX]
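
The abstract mentions word n-grams of lengths one to three (UBT), character 5-grams (C5), and a hierarchical strategy that first decides whether a condition is present and only then names it. The sketch below illustrates those three ideas in plain Python. It is not the thesis implementation and does not reproduce BiLEMD: the function names, the whitespace tokenizer, the lowercasing step, and the two classifier callables (`is_diagnosed`, `which_condition`) are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable


def word_ngrams(text: str, n_min: int = 1, n_max: int = 3) -> Counter:
    """Count word n-grams of length n_min..n_max (unigrams to trigrams by default).

    Assumption: lowercasing and whitespace tokenization stand in for
    whatever preprocessing the thesis actually used.
    """
    tokens = text.lower().split()
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def char_ngrams(text: str, n: int = 5) -> Counter:
    """Count overlapping character n-grams (5-grams by default)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def hierarchical_predict(
    text: str,
    is_diagnosed: Callable[[str], bool],
    which_condition: Callable[[str], str],
) -> str:
    """Two-stage decision: detect presence first, then name the condition.

    Both callables are hypothetical placeholders for trained models.
    """
    if not is_diagnosed(text):
        return "control"
    return which_condition(text)


# Toy usage with stand-in rules instead of trained classifiers.
features = word_ngrams("I feel fine today")
label = hierarchical_predict(
    "example timeline text",
    is_diagnosed=lambda t: False,
    which_condition=lambda t: "ADHD",
)
```

In practice the count dictionaries would be vectorized and passed to trained classifiers; the stand-in lambdas only show where the binary and multiclass models would plug in.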

Files

Original bundle

Name: VillaPerez_TesisDoctorado.pdf
Size: 9.02 MB
Format: Adobe Portable Document Format
Description: Doctoral Thesis

Name: VillaPerezMiryam Elizabeth_TesisOriginal_pdf
Size: 9.37 MB
Format: Adobe Portable Document Format
Description: Original Thesis

Name: VillaPerez_ActaGradoDeclaracionAutoria.pdf
Size: 442.91 KB
Format: Adobe Portable Document Format
Description: Degree Certificate and Declaration of Authorship

Name: VillaPerez_CartaAutorizacion_pdf
Size: 46.77 KB
Format: Microsoft Word XML
Description: Authorization Letter

License bundle

Name: license.txt
Size: 1.3 KB
Format: Item-specific license agreed upon to submission
Description:
logo

The user is obliged to use the services and content provided by the University, in particular printed materials and electronic resources, in accordance with current legislation, the principles of good faith, and generally accepted practice, without contravening public order in doing so, especially where, for the proper performance of their activity, the user needs to reproduce, distribute, communicate and/or make available fragments of printed works or of works that may exist in analog or digital format, whether on paper or in electronic form. Law 23/2006, of 7 July, amending the revised text of the Intellectual Property Law, approved
