Ciencias Exactas y Ciencias de la Salud
Permanent URI for this collection: https://hdl.handle.net/11285/551014
This collection contains the theses and degree projects of the doctoral programs of the Schools of Engineering and Sciences and of Medicine and Health Sciences.
Search Results
- Botnet detection on twitter: a novel similarity-based clustering mechanism (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-12) Samper Escalante, Luis Daniel; Monroy Borja, Raúl; emipsanchez; Castro Espinoza, Félix Agustín; González Mendoza, Miguel; School of Engineering and Sciences; Rectoría Tec de Monterrey; Loyola González, Octavio
Botnet detection on Twitter represents a critical yet under-explored research problem, as botnets programmed with malicious intent threaten the platform’s security and credibility. Although Twitter has implemented mitigation strategies, such as imposing restrictions and bans, these measures remain insufficient due to botnets’ rapid creation and expansion. Existing solutions proposed by researchers for manual and automated botnet detection typically rely on individual metrics commonly used for detecting bots. However, these approaches lack the necessary group-oriented analysis and metrics critical for effectively identifying botnets of varying sizes and objectives. To address this issue, we have developed an innovative botnet detection mechanism based on similarity, which significantly enhances the detection rate of botnets on Twitter. Each bot, regardless of its complexity, leaves detectable traces of automation in its creation, behavior, or interactions with other accounts. By characterizing these traces, we can establish relationships between bots, enabling effective botnet detection. Our mechanism constructs a regression model to quantify the similarity between bots, leveraging features from user data, tweet patterns, and social interactions on the platform. Then, it uses this similarity measure to build a distance matrix, enabling the formation of groups with shared attributes, connections, and objectives through clustering methods.
Our botnet detection mechanism achieved extraordinary success, evidenced by high scores on external Clustering Validation Indices (CVIs) and the Area under the ROC Curve (AUC) compared to existing solutions from the literature. Furthermore, the mechanism proved effective when confronted with unknown botnets with varied objectives. Our experimental findings suggest that this work is well-positioned to strengthen future botnet detection mechanisms, having shown the value of incorporating social interaction features. This integration offers a strategic advantage in the ongoing arms race against botmasters and their malicious objectives. Additionally, our mechanism consistently outperforms other approaches across various metrics, configurations, and algorithms, underscoring its effectiveness and adaptability in different detection scenarios.
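The similarity-to-distance-to-clustering pipeline described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the pairwise similarity values in `SIM` are invented, the conversion `distance = 1 - similarity` is one common choice rather than the thesis's stated formula, and the threshold-based single-linkage grouping is a simple stand-in for whichever clustering methods the author actually used.

```python
def similarity_to_distance(sim):
    """Turn a symmetric similarity matrix (values in [0, 1]) into a distance matrix."""
    n = len(sim)
    return [[0.0 if i == j else 1.0 - sim[i][j] for j in range(n)] for i in range(n)]


def threshold_clusters(dist, max_dist):
    """Group items whose pairwise distance is at most max_dist (single linkage)."""
    n = len(dist)
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= max_dist:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical similarity scores for five accounts (not data from the thesis).
SIM = [
    [1.00, 0.90, 0.80, 0.10, 0.20],
    [0.90, 1.00, 0.85, 0.15, 0.10],
    [0.80, 0.85, 1.00, 0.20, 0.10],
    [0.10, 0.15, 0.20, 1.00, 0.95],
    [0.20, 0.10, 0.10, 0.95, 1.00],
]

clusters = threshold_clusters(similarity_to_distance(SIM), max_dist=0.3)
print(clusters)
```

With these invented scores, accounts 0 to 2 form one candidate group and accounts 3 and 4 another. In practice the similarity values would come from the regression model over user, tweet, and interaction features that the abstract describes.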
- Automatic detection of mental health disorders in social media (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-06-12) Villa Pérez, Miryam Elizabeth; Trejo Rodríguez, Luis Ángel; emipsanchez; González Mendoza, Miguel; Brena Pinero, Ramón Felipe; Moctezuma Ochoa, Daniela; Villaseñor Pineda, Luis; School of Engineering and Sciences; Campus Estado de México
With the rise of social media, these platforms have emerged as a crucial source of information for studying people's thoughts and behaviors. By using natural language processing and machine learning techniques, prior studies have explored the language of users living with different mental health conditions. However, these efforts have focused on analyzing conditions in isolation, particularly depression, and have relied on English-language data. The goal of this study is to examine the communications of English- and Spanish-speaking Twitter users through traditional and deep learning algorithms to automatically recognize whether they live with one of nine mental health conditions. To achieve that, we created two datasets in English and Spanish. The “diagnosed” set comprises the timelines of 1,500 users who explicitly reported in one or more of their posts having been diagnosed with one of the following: ADHD, Anxiety, Autism, Bipolar, Depression, Eating disorders, OCD, PTSD, and Schizophrenia. The “control” set comprises the timelines of 1,700 randomly selected users who had not disclosed a diagnosis. We extracted a variety of text features from the collected data, such as n-grams, q-grams, Part-of-speech (POS) tags, topic modeling, Linguistic Inquiry and Word Count (LIWC), and word embeddings, and trained traditional machine learning and deep learning classifiers for two tasks: binary classification, to distinguish between diagnosed and non-diagnosed users, and multiclass classification, to identify the specific diagnosis.
The performance of the models was analyzed using 5-fold cross-validation, four different classification metrics (AUC, F1-score, Precision, and Recall), and the Friedman non-parametric test with the Finner post-hoc procedure. Overall, XGBoost and CNN performed the best in the two classification tasks. Employing our collected datasets, in binary classification, we achieved an AUC of 0.835 on the Spanish Twitter dataset using word n-grams of lengths one to three (UBT) and 0.846 on the English Twitter dataset with a character 5-gram (C5) model. In multiclass classification, we obtained an AUC of 0.747 and 0.697 on the Spanish and English Twitter datasets, respectively. In the second phase of our research, we introduced a model named BiLEMD for the multiclass classification of mental disorders. Our approach adopts a hierarchical detection strategy, where each base model within our framework leverages diverse textual features. We aim to emulate, to some extent, the step-by-step approach employed in human clinical diagnostics. In clinical practice, professionals first determine the presence or absence of a condition before proceeding to specify its type. Although BiLEMD achieved the highest ranking on both the Spanish and English Twitter datasets, statistically significant differences were not observed. Nevertheless, additional analysis revealed that ensembles, including BiLEMD and Stacking, reduce misclassification within the control class. Moreover, BiLEMD exhibits slightly superior performance in terms of AUC and Recall compared to other classifiers. The development of computer-based methods for recognizing and classifying social media user profiles related to different mental health conditions could enhance the performance of applications aimed at early diagnosis and timely treatment.
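The hierarchical strategy mentioned in this abstract (decide first whether a condition is present, then specify which one) can be sketched roughly as follows. This is not BiLEMD itself: the function `hierarchical_predict`, its parameters, and the keyword-based toy scorers are hypothetical placeholders standing in for trained base models, and the 0.5 threshold is an assumption.

```python
from typing import Callable, Dict


def hierarchical_predict(
    text: str,
    diagnosed_probability: Callable[[str], float],
    condition_scores: Callable[[str], Dict[str, float]],
    threshold: float = 0.5,
) -> str:
    """Two-stage decision: binary screening first, condition type second."""
    # Stage 1: presence or absence of any condition.
    if diagnosed_probability(text) < threshold:
        return "control"
    # Stage 2: only reached for texts flagged in stage 1.
    scores = condition_scores(text)
    return max(scores, key=scores.get)


# Toy stand-ins for trained classifiers (hypothetical, for illustration only).
def toy_binary(text: str) -> float:
    return 0.9 if "diagnosed" in text.lower() else 0.1


def toy_multiclass(text: str) -> Dict[str, float]:
    lowered = text.lower()
    return {"ADHD": float("adhd" in lowered), "OCD": float("ocd" in lowered)}


print(hierarchical_predict("I was diagnosed with OCD last year", toy_binary, toy_multiclass))
print(hierarchical_predict("Nice weather today", toy_binary, toy_multiclass))
```

One consequence of this design, consistent with the abstract's remark about the control class, is that the second stage never sees texts the first stage has already screened out.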
- Providing a robust sentiment analysis model evaluation based on extrinsic and intrinsic metrics (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2023-06-01) Leon Sandoval, Edgar; Zareei, Mahdi; puemcuervo; Ochoa Ruiz, Gilberto; Barbosa Santillán, Liliana Ibeth; Pareja Lora, Antonio; School of Engineering and Sciences; Campus Monterrey; Falcón Morales, Luis Eduardo
In recent years, the world has been facing the COVID-19 pandemic. The pandemic has had repercussions on several fronts, including mortality rates and declining physical health, as well as on social, financial, and practically every other area of life. It has led different countries to take different mitigation measures, including both clinical and non-clinical interventions, either of which presents significant challenges to mental, emotional, and physical health and their respective programs. It is difficult for governmental or public organizations to incorporate mental and emotional health feedback into their decision-making processes, because merely taking measurements is complex. To this end, a collection of 760,064,879 public domain tweets was analyzed using several open sentiment analysis tools to investigate the collective emotional state over the course of the epidemic, across news cycles, and in response to government statements and actions, and to offer extrinsic measurements of the success of these sentiment analysis techniques. This research aims to evaluate several language models robustly by utilizing both intrinsic and extrinsic evaluation metrics. The extrinsic evaluation to be performed is a large-scale sentiment analysis study of COVID-19-related tweets generated in Mexico during 2020, the first year of the pandemic. Time series analysis and other descriptive statistics are then utilized to better understand the emotional response to the pandemic, using state-of-the-art language models and comparing their performance with one another.
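As a rough illustration of the time series step mentioned in this abstract, per-tweet sentiment scores can be aggregated into a daily mean. The sketch below is an assumption-laden example: the `(date, score)` record layout, the score range, and the sample values in `SAMPLE` are invented, and the thesis may aggregate differently.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def daily_mean_sentiment(records):
    """Average sentiment score per calendar day, returned in date order."""
    by_day = defaultdict(list)
    for day, score in records:
        by_day[day].append(score)
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


# Hypothetical (date, score) pairs; scores assumed to lie in [-1, 1].
SAMPLE = [
    (date(2020, 3, 2), 0.25),
    (date(2020, 3, 1), 0.5),
    (date(2020, 3, 1), -0.5),
    (date(2020, 3, 2), 0.75),
]

series = daily_mean_sentiment(SAMPLE)
for day, value in series.items():
    print(day.isoformat(), value)
```

A series like this can then be compared across sentiment analysis tools, or aligned with the dates of news events and government announcements.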

