Ciencias Exactas y Ciencias de la Salud (Exact Sciences and Health Sciences)
Permanent URI for this collection: https://hdl.handle.net/11285/551039
This collection contains the theses and degree projects of the Master's programs of the Schools of Engineering and Sciences and of Medicine and Health Sciences.
- A robust and interpretable machine learning framework for vanadium oxide supercapacitors (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2025-06-13) Ortiz Aldana, Emmanuel Alexei; Kumar, Rudra; emimmayorquin; Mallar, Ray; Sánchez Ante, Gildardo; Kumar, Kishant; School of Engineering and Sciences; Campus Monterrey; Ebrahimibagha, Dariush
As global energy demands intensify, the development of efficient, scalable, and reliable energy storage systems becomes increasingly urgent. While lithium-ion batteries dominate the current market, their low power density makes them poorly suited to current fluctuations, which degrade their life expectancy. Supercapacitors (SCs) based on pseudocapacitive materials such as vanadium oxide offer an attractive alternative, with high power density, long cycle life, and fast charge-discharge rates. However, their low energy density remains a major bottleneck to broader adoption. Current supercapacitor research focuses on improving specific capacitance and thereby energy density, but this is mostly done through traditional trial-and-error experimentation, which is slow and expensive. Materials informatics offers a paradigm shift by applying machine learning (ML) techniques to uncover patterns in existing data and accelerate the design of novel materials. Despite promising results, many current materials ML studies suffer from limitations such as narrow data ranges, improper data preprocessing, target leakage, and a lack of reproducibility due to unshared code and datasets. In this work, a robust ML framework was developed for vanadium oxide SCs, designed to extract interpretable insights from manually gathered literature data. A rigorous cross-validation (CV) pipeline was implemented to ensure reliable model evaluation, avoiding common pitfalls such as overfitting and data leakage.
Among the evaluated models, a Voting Regressor combining Ridge Regression, Extreme Gradient Boosting (XGBoost), and Categorical Boosting (CatBoost) achieved the best performance, with a mean absolute error (MAE) of 81 F/g, a root mean squared error (RMSE) of 104 F/g, and an R² of 0.61. To extract insights from the models, interpretability methods including permutation importance (PI) and SHapley Additive exPlanations (SHAP) values were employed. Binder-free electrodes, wider potential windows, and lower current densities were consistently associated with higher predicted specific capacitance. These findings highlight the potential of interpretable methods to reveal the behavior of ML models and to guide the design of SCs.
- Enhancing single-cell and spatial transcriptomics analysis: the role of imputation and feature selection (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2025-06) Chacón Ramírez, Denisse; Rangel Escareño, Claudia; emipsanchez; Gómez Romero, Laura Lucila; Hernández Lemus, Enrique; Reséndis Antonio, Osbaldo; School of Engineering and Sciences; Campus Monterrey
Single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics have revolutionized our understanding of cellular heterogeneity and tissue organization. However, extracting biological insights from these technologies remains challenging because the data are high-dimensional, sparse, and noisy. Two critical but understudied problems hinder robust analysis: (1) the impact of feature selection strategies on cell-type identification, and (2) the role of data imputation in integrating scRNA-seq with spatial transcriptomics. While clustering and integration methods are widely benchmarked, the influence of pre-processing decisions, such as using biologically informed marker genes or imputing missing values, remains poorly understood. This thesis addresses these knowledge gaps through systematic evaluations across diverse datasets and algorithms. First, we assess how different imputation algorithms (MAGIC, DCA, scPHENIX) affect the integration of scRNA-seq with spatial transcriptomics in both directions: cell-type deconvolution and spatial transcript prediction. Using 13 paired datasets and 10 integration tools, we found that the benefits of imputation are highly context-dependent, varying with the task and the algorithm, rather than universal. SpaGE consistently outperformed other methods for transcript prediction regardless of imputation status, while RCTD performed best for cell-type deconvolution.
Notably, we observed that imputation primarily improves magnitude estimation rather than spatial pattern preservation. Second, we evaluate whether marker gene-based feature selection improves scRNA-seq clustering accuracy compared with standard approaches. By benchmarking seven algorithms (Seurat, SC3, CIDR, etc.) across five pancreatic datasets, we demonstrate that performance gains are algorithm- and dataset-dependent. SC3 and TSCAN benefited from marker gene selection across multiple datasets, while SIMLR showed strongly dataset-dependent responses, yielding superior adjusted Rand index (ARI) scores (greater than 0.7) in some contexts but diminished performance in others. The Segerstolpe dataset showed consistent improvements across most algorithms when using marker genes, suggesting that dataset-specific characteristics strongly influence the optimal feature selection strategy. Our analysis further revealed that algorithms often identify fewer clusters than the reference annotations, indicating difficulty in resolving fine-grained pancreatic cell-type heterogeneity. The results of this thesis emphasize that pre-processing choices must align with both analytical goals and dataset characteristics to unlock the full potential of single-cell technologies. This work provides an evidence-based framework for optimizing scRNA-seq and spatial transcriptomics analysis workflows, with implications for understanding tissue architecture and cellular dynamics across diverse biological systems.
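The vanadium oxide supercapacitor thesis reports a cross-validated Voting Regressor scored with MAE, RMSE, and R². The sketch below is a minimal illustration under stated assumptions, not the author's pipeline. It shows two of the ideas involved: preprocessing statistics (here, feature standardization) are fitted inside each training fold only, so nothing from the held-out fold leaks into the model, and a voting ensemble averages the predictions of its base models. It uses only the Python standard library. The 1-D ridge and k-nearest-neighbors base models (`fit_ridge_1d`, `fit_knn_1d`, hypothetical names) stand in for the thesis's Ridge/XGBoost/CatBoost ensemble. The data are synthetic (an assumed linear relation between current density and specific capacitance), not values from the thesis.

```python
import math
import random

# --- Evaluation metrics named in the abstract (MAE, RMSE, R^2) ---
def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# --- Hypothetical stand-in base models (NOT the thesis's XGBoost/CatBoost) ---
def fit_ridge_1d(x, y, alpha=1.0):
    # Standardization statistics come only from the data passed in, so calling
    # this on a training fold keeps the held-out fold unseen.
    mx = sum(x) / len(x)
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / len(x)) or 1.0
    my = sum(y) / len(y)
    z = [(v - mx) / sx for v in x]
    # Closed-form ridge slope on centered data; the intercept is not penalized.
    w = sum(zi * (yi - my) for zi, yi in zip(z, y)) / (sum(zi * zi for zi in z) + alpha)
    return lambda v: my + w * (v - mx) / sx

def fit_knn_1d(x, y, k=3):
    pairs = list(zip(x, y))
    def predict(v):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - v))[:k]
        return sum(p[1] for p in nearest) / len(nearest)
    return predict

def fit_voting(x, y):
    # A voting regressor averages the predictions of its fitted base models.
    models = [fit_ridge_1d(x, y), fit_knn_1d(x, y)]
    return lambda v: sum(m(v) for m in models) / len(models)

# --- K-fold cross-validation built from out-of-fold predictions ---
def cross_validate(x, y, fit, k=5, seed=0):
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    oof = [0.0] * len(y)
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        # Everything, scaling included, is fitted on the training fold only.
        model = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            oof[i] = model(x[i])
    return {"MAE": mae(y, oof), "RMSE": rmse(y, oof), "R2": r2(y, oof)}

# Synthetic, illustrative data only: capacitance decreasing with current density.
rng = random.Random(42)
current_density = [0.5 + 0.25 * i for i in range(30)]  # A/g (hypothetical)
capacitance = [500.0 - 60.0 * c + rng.gauss(0, 20) for c in current_density]  # F/g
scores = cross_validate(current_density, capacitance, fit_voting)
print(scores)
```

Fitting the scaler on the full dataset before splitting would let test-fold statistics influence training. That is one form of the leakage the abstract warns against, which is why `fit` receives only training-fold rows here.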

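The single-cell thesis compares clustering results with reference cell-type annotations using ARI. As a minimal illustration of that metric (not the thesis's code), the sketch below computes the adjusted Rand index from the contingency table of two labelings using only the Python standard library. ARI is 1 for identical partitions regardless of label names and close to 0 for chance-level agreement. The toy labels are made up. They mimic a method that merges two reference cell types into one cluster, the "fewer clusters than reference annotations" situation the abstract describes.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings of the same items."""
    n = len(labels_true)
    # Pair counts within each contingency cell, each reference class (row),
    # and each predicted cluster (column).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    # Expected index under random labeling with the same cluster sizes.
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put every item in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

reference = ["alpha", "alpha", "beta", "beta", "delta", "delta"]  # toy annotations
predicted = [0, 0, 1, 1, 1, 1]  # beta and delta merged into a single cluster
print(adjusted_rand_index(reference, predicted))  # about 0.444 (= 4/9)
```

Because the index is adjusted for chance, merging two reference types pulls the score well below 1 even though no cell is separated from the other cells of its own type.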
