Ciencias Exactas y Ciencias de la Salud

Improving deep neural networks to identify depression using neural architecture search
(Instituto Tecnológico y de Estudios Superiores de Monterrey, 2025-06) Hernández Silva, Erick; Trejo Rodríguez, Luis Ángel; emipsanchez; Cantoral Ceballos, José Antonio; González Mendoza, Miguel; School of Engineering and Sciences; Campus Estado de México; Sosa Hernández, Víctor Adrián
A Neural Architecture Search (NAS) framework utilizing Evolutionary Algorithms (EAs) and a regressor model is proposed to improve the classification performance of Deep Neural Networks (DNNs) for the early detection of Major Depressive Disorder (MDD) from speech data represented by Mel-Spectrograms. The framework automates the design of neural network architectures by systematically exploring a well-defined search space that integrates convolutional layers, batch normalization, dropout, max pooling, and self-attention mechanisms, aiming to capture both spatial and temporal features inherent in vocal signals. By optimizing for the F1-score, the framework addresses challenges related to data imbalance, ensuring robust generalization across both depressed and non-depressed samples.
The proposed approach employs an integer-based encoding scheme to represent candidate architectures, coupled with repair and validation processes that ensure all architectures meet specific design constraints. A self-adaptive mechanism dynamically adjusts the mutation factor based on evolutionary feedback, improving the balance between exploration and exploitation during the search process. Furthermore, a surrogate model, built using Principal Component Analysis (PCA) and an XGBoost regressor, predicts architecture performance, significantly reducing computational costs by avoiding full model training for all candidates.
Experimental validation, conducted on publicly available speech datasets, demonstrates that NAS-generated architectures may outperform manually designed state-of-the-art models in terms of F1-score, accuracy, precision, recall, and specificity. The results confirm the effectiveness of integrating self-attention mechanisms with convolutional operations for extracting relevant depression-related vocal biomarkers.
This research underlines the potential of NAS in advancing non-invasive, scalable, and interpretable AI-driven tools for mental health assessment, contributing to early intervention strategies and improving clinical outcomes in depression diagnosis.
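The integer encoding, repair step, self-adaptive mutation factor, and F1 objective mentioned in the abstract above can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the layer vocabulary and its integer codes, the two design constraints (first layer convolutional, at most `MAX_POOLS` pooling layers), the 0.9/1.1 adaptation rule, and all function names.

```python
import random

# Hypothetical layer vocabulary; the integer codes are illustrative only.
LAYER_TYPES = {0: "conv", 1: "batch_norm", 2: "dropout", 3: "max_pool", 4: "self_attention"}
MAX_POOLS = 3  # assumed constraint: limit the number of downsampling steps


def repair(genome):
    """Return a copy of the genome that satisfies the assumed constraints."""
    # Map out-of-range genes back into the vocabulary.
    fixed = [g % len(LAYER_TYPES) for g in genome]
    if fixed and fixed[0] != 0:
        fixed[0] = 0  # assumed constraint: the first layer is convolutional
    pools = 0
    for i, g in enumerate(fixed):
        if g == 3:
            pools += 1
            if pools > MAX_POOLS:
                fixed[i] = 0  # replace surplus pooling with a convolution
    return fixed


def mutate(genome, factor, rng):
    """Resample each gene with probability `factor`, then repair the child."""
    child = [rng.randrange(len(LAYER_TYPES)) if rng.random() < factor else g
             for g in genome]
    return repair(child)


def adapt_factor(factor, improved, lo=0.05, hi=0.5):
    """Shrink the mutation factor after an improvement, grow it otherwise."""
    factor = factor * 0.9 if improved else factor * 1.1
    return min(hi, max(lo, factor))


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

In a full search loop, `f1_score` (or a surrogate's prediction of it) would rank candidates, and `adapt_factor` would be called once per generation depending on whether the best score improved.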
D3TEC Dataset: a data collection for deep learning research in depression classification featuring voice recordings of Spanish speakers using professional and cellphone microphones
(Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-05) Brenes García, Luis Felipe; Trejo Rodríguez, Luis Ángel; emimmayorquin; Villaseñor Pineda, Luis; Sosa Hernández, Víctor Adrián; School of Engineering and Sciences; Campus Monterrey; Cantoral Ceballos, José Antonio
Depression is a mental health condition that affects millions of people worldwide. Although common, it remains difficult to diagnose due to its heterogeneous symptomatology. Mental health questionnaires are currently the most widely used method for screening depression; however, they are subjective because they depend on patients' self-assessments. Researchers have been interested in finding an accurate way of identifying depression through an objective biomarker. Recent developments in neural networks and deep learning have made it possible to classify depression through the computational analysis of voice recordings. However, this approach is heavily dependent on the availability of datasets to train and test deep learning models, and these are scarce. The datasets that do exist also cover very few languages. This study proposes a protocol for the collection of a new dataset for deep learning research on voice depression classification, featuring Spanish speakers, professional and smartphone microphones, and a high-quality recording standard. This work aims to create a high-quality voice depression dataset by recording Spanish speakers with a professional microphone and strict audio quality standards. The data is also captured with a smartphone microphone to support further research on the use of smartphone applications for depression identification. Our methodology involves the strategic collection of depressed and non-depressed voice recordings. Three types of data are collected: voice recordings, depression labels (using the PHQ-9 questionnaire), and additional data that could potentially influence speech. Recordings are captured with professional-grade and smartphone microphones simultaneously to ensure versatility and practical applicability. Several considerations and guidelines are described to ensure high audio quality and avoid potential bias in deep learning research.
This data collection effort immediately enables new research topics on depression classification. Some potential uses include deep learning research on Spanish speakers, an evaluation of the impact of audio quality on developing audio classification models, and an evaluation of the applicability of voice depression classification technology to smartphone applications. A preliminary experimentation section is included to showcase the potential research areas that the creation of this dataset enables. This research marks a significant step towards the objective and automated classification of depression in voice recordings. By focusing on the underrepresented demographic of Spanish speakers, including smartphone recordings, and addressing the current data limitations in audio quality, this study lays the groundwork for future advancements in deep learning-driven mental health diagnosis.
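As a rough illustration of how PHQ-9 responses could be turned into the depression labels mentioned in the abstract above, the following Python sketch sums the nine items (each scored 0 to 3, for a total between 0 and 27) and applies a cutoff. The cutoff of 10 is a commonly used screening threshold and is only an assumption here; the abstract does not state which rule the dataset uses, and the function names are hypothetical.

```python
def phq9_total(item_scores):
    """Sum the nine PHQ-9 item scores after validating their count and range."""
    if len(item_scores) != 9:
        raise ValueError("PHQ-9 has exactly nine items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each PHQ-9 item is scored 0, 1, 2, or 3")
    return sum(item_scores)


def depression_label(total, cutoff=10):
    """Return 1 (depressed) if the total reaches the assumed cutoff, else 0."""
    return int(total >= cutoff)


# Example: a participant answering 1 on every item scores 9, below the assumed cutoff.
example_total = phq9_total([1] * 9)
example_label = depression_label(example_total)
```

A different cutoff, or a multi-class scheme based on severity bands, would change only `depression_label`.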
