Ciencias Exactas y Ciencias de la Salud
Permanent URI for this collection: https://hdl.handle.net/11285/551039
This collection contains theses and degree projects from the master's programs of the Schools of Engineering and Sciences and of Medicine and Health Sciences.
Search Results
- Development of polygenic scores for the Mexican population for obesity, diabetes, and dyslipidemias (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-10) Torres Treviño, David; Treviño Alvarado, Víctor Manuel; emipsanchez; Garza Hernández, Debora; Martínez Ledesma, Juan Emmanuel; School of Engineering and Sciences; Campus Monterrey
Genetic prediction estimates the risk of a disease from genetic data, aiding earlier diagnosis, prevention, and targeted treatment. Polygenic scores (PGS) estimate risk from multiple single nucleotide polymorphisms (SNPs), the most common form of genetic variation. Though each SNP has a small effect, their joint effect provides key insights into the risk for common diseases influenced by multiple genetic factors. A key limitation of PGS is that the majority have been trained on European populations, leading to a significant drop in predictive accuracy when applied to non-European groups. This study aimed to address this issue by improving the accuracy of PGS for type 2 diabetes (T2D), BMI, Triglycerides, Total Cholesterol, HDL, and LDL levels in the Mexican population through a series of strategies. We implemented various established methods for constructing PGS, including techniques that have shown success in non-European populations and ensemble models combining ancestry-based PGS to optimize accuracy across diverse populations. Our key innovation lies in applying shrinkage to the ancestry-based PGS according to each individual's ancestry proportions, prioritizing ancestry-based scores that are genetically closer to the individual and enhancing the relevance of matched ancestry data. Our results showed no improvement, and in some cases a decrease in accuracy, when using multi-ethnic or Mexican training data, likely due to the underrepresentation of non-European individuals and the small sample size of the Mexican GWAS.
However, notable exceptions included LDL and Triglycerides predictions, where the Mexican GWAS outperformed the European GWAS. This outcome may be attributed to genetic loci associated with lipid levels that are unique to Mexicans, some linked to Amerindian ancestry, which explain a greater share of variance than the loci captured in the European GWAS. Moreover, ensembles incorporating both ancestry adjustment and the Mexican-based PGS underperformed compared to the European baseline model, whereas those excluding the Mexican-based PGS exceeded the European baseline's performance. Ensembles constructed with Lassosum and LDpred2 fell short of the PRScsx ensemble's results, suggesting an advantage to jointly modeling multiple populations rather than treating them separately. Introducing ancestry adjustment in PRScsx (RAW4) maintained accuracy and, for some traits, even improved it for subgroups with predominantly African ancestry, showing promise for the proposed ancestry-based shrinkage approach. Despite these improvements, disparities in accuracy persisted across population subgroups, especially for individuals with a high proportion of African ancestry. These results highlight the current challenge of generalizability gaps in PGS models, even for methods designed for diverse populations such as PRScsx. Future studies could focus on developing a sophisticated Bayesian framework for ancestry adjustment, refining ancestry estimation methods, or incorporating a Native American component to better capture the genetic diversity in the Mexican population.
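The idea of shrinking ancestry-based scores by an individual's ancestry proportions can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the ancestry labels, and the simple proportional weighting are assumptions chosen for illustration, and the abstract does not specify the actual shrinkage formula.

```python
from typing import Dict, List


def polygenic_score(dosages: List[float], weights: List[float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1, or 2 per SNP)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def ancestry_weighted_score(scores: Dict[str, float],
                            proportions: Dict[str, float]) -> float:
    """Combine ancestry-specific scores, weighting each one by the
    individual's (renormalized) proportion of that ancestry.

    Assumption: plain proportional weighting stands in for the
    shrinkage described in the abstract.
    """
    total = sum(proportions.get(a, 0.0) for a in scores)
    if total <= 0:
        raise ValueError("ancestry proportions must sum to a positive value")
    return sum(scores[a] * proportions.get(a, 0.0) / total for a in scores)


# Hypothetical individual with three SNPs and two ancestry-specific weight sets.
dosages = [0, 1, 2]
scores = {
    "EUR": polygenic_score(dosages, [0.1, 0.2, -0.05]),
    "AMR": polygenic_score(dosages, [0.2, 0.3, 0.1]),
}
combined = ancestry_weighted_score(scores, {"EUR": 0.25, "AMR": 0.75})
print(combined)
```

With this weighting, the score trained on the ancestry that makes up most of the individual's genome dominates the combined value, which is the intuition behind prioritizing genetically closer scores.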
- Estimation of ancestry in the Mexican population using informative genetic markers (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024) Valdez Alvarez, Héctor; Treviño Alvarado, Víctor Manuel; emipsanchez; Orozco Orozco, Lorena Sofía; García Ortiz, Humberto; Martínez Ledesma, Juan Emmanuel; Escuela de Ingeniería y Ciencias; Campus Monterrey; Garza Hernández, Debora
The study of genetic ancestry has become an essential component of modern genetics, offering insights into the origins and migrations of human populations. This thesis presents the development of a genetic ancestry panel specifically tailored for the Mexican population, a group characterized by its high genetic diversity and complex admixture. The primary objective of this research is to accurately estimate the proportions of ancestry in Mexicans using informative genetic markers, thereby addressing the underrepresentation of this population in Genome-Wide Association Studies (GWAS). In the initial phase, various genetic databases were considered, and three were selected for the development of the ancestry panel: the 1000 Genomes Project (1000G), the Human Genome Diversity Project (HGDP), and the Metabolic Analysis in an Indigenous Sample (MAIS). The integration of these datasets provided a comprehensive view of genetic diversity crucial for the panel's accuracy. Principal Component Analysis (PCA) was employed to visualize the genetic structure and verify the separation of ancestral groups. The results confirmed the integrity of the selected datasets. Three methods for selecting Ancestry Informative Markers (AIMs), namely Top K, Balanced K, and SumInfo K, were developed and evaluated. Although Balanced K and SumInfo K performed better than Top K, integrating Mexican data (MAIS) posed significant challenges, particularly due to the influence of East Asian populations.
To address these issues, a revised strategy was implemented, focusing on optimizing AIM selection and improving the robustness of the panel. This involved a detailed workflow and validation process, ensuring the final panel's reliability. Despite the challenges, the new strategy demonstrated promising results, and the final panel is expected to be completed soon. The developed ancestry panel has significant implications for forensic science, personalized medicine, and anthropological research. By accurately estimating ancestry proportions in the Mexican population, this research contributes to a broader understanding of genetic diversity and supports more effective medical and forensic applications. Future work will focus on finalizing the panel and applying it to the oriGen project, which aims to analyze genetic data from a large cohort of Mexicans, further enhancing the understanding of this population's genetic landscape.
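As a rough illustration of what a "Top K" style of marker selection might look like, the sketch below ranks markers by the absolute difference in allele frequency between two reference populations and keeps the K largest. The abstract does not define Top K, Balanced K, or SumInfo K, so the informativeness measure, the function name, and the marker identifiers are all assumptions.

```python
from typing import Dict, List


def top_k_markers(freq_a: Dict[str, float],
                  freq_b: Dict[str, float],
                  k: int) -> List[str]:
    """Keep the k markers with the largest absolute allele-frequency
    difference between two populations.

    Assumption: the frequency difference is used as a stand-in
    informativeness score; the thesis may use a different measure.
    """
    # Only markers genotyped in both populations can be compared.
    shared = sorted(set(freq_a) & set(freq_b))  # sorted for stable tie-breaking
    ranked = sorted(shared,
                    key=lambda m: abs(freq_a[m] - freq_b[m]),
                    reverse=True)
    return ranked[:k]


# Hypothetical allele frequencies for three placeholder markers.
pop_a = {"m1": 0.9, "m2": 0.5, "m3": 0.2}
pop_b = {"m1": 0.1, "m2": 0.45, "m3": 0.7}
selected = top_k_markers(pop_a, pop_b, k=2)
print(selected)
```

A pure ranking like this can concentrate markers that separate only one pair of populations, which may be one reason balanced variants of the selection performed better in the thesis.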
- Predicting drug responses in cancer cells using genomic features and machine learning (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2020-05) Evans Trejo, Cody Eduardo; Treviño Alvarado, Víctor Manuel; ilquio; Tamez Peña, José; Martínez Torteya, Antonio; Escuela de ingeniería y ciencias; Campus Estado de México; Martínez Ledesma, Juan Emmanuel
This document presents an analysis for the prediction of drug responses in cancer cells using cancer genomic features and machine learning, for the Master's Degree in Computational Sciences at Instituto Tecnológico y de Estudios Superiores de Monterrey. Cancer is a genetic disease characterized by the progressive accumulation of mutations. Several genomic features are involved in oncogenesis, such as gene mutation, copy number, expression, and epigenetic alterations. These features vary depending on the person and the type of cancer, making it difficult to determine whether a drug will be effective in each specific case. Recently, two large-scale pharmacogenomic studies screened multiple anticancer drugs on over 1000 cell lines in an effort to elucidate the response mechanism of anticancer drugs. Based on these data, we proposed a drug-response prediction framework that uses gene expression, methylation, copy number, mutation, and protein expression features together with drug sensitivity data from the Cancer Cell Line Encyclopedia (CCLE) database. For this, we compared the performance of several algorithms: Random Forest, Support Vector Machine, Elastic-Net, and Extreme Gradient Boosting Tree (XGBoost). The robustness of our model was assessed by cross-validation. Among the individual datasets, RNAseq with XGBoost obtained the highest average accuracy.
Our unified model achieved good cross-validation performance for most drugs in the Cancer Cell Line Encyclopedia (≥85% accuracy). These results suggest that drug response could be effectively predicted from genomic features using a battery of machine learning algorithms. Our model could be applied to predict response to certain drugs and could potentially play a complementary role in personalized medicine.
- Identifying models of DNA polymorphisms associated with Alzheimer's disease using step-wise and genetic algorithms from GWAS data (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2019-05) Romero Rosales, Brissa Lizbeth; ROMERO ROSALES, BRISSA LIZBETH; 861461; TREVIÑO ALVARADO, VICTOR MANUEL; 205076; Treviño Alvarado, Víctor Manuel; Vallejo Clemente, Edgar Emmanuel; Moreno Treviño, María Guadalupe; Escuela de Ingeniería y Ciencias; Campus Monterrey
Alzheimer's disease is a neurodegenerative disorder that involves cognitive deterioration accompanied by memory loss and inability to reason, affecting the patient's ability to carry out daily activities. This disorder is caused by genetic, environmental, and lifestyle factors. Determining the genetic factors is very important because the disease could then be predicted and therefore treated before it appears. However, despite research efforts and many putative detections using univariate analyses, only the APOE gene has been extensively validated as a risk factor associated with late-onset Alzheimer's disease. Thus, the problem of missing heritability arises, implying that a single gene does not determine the heritability of a disorder and that the combined effect of several genes could better explain it. Genome-Wide Association Studies (GWAS) traditionally use univariate techniques to determine the association between markers and diseases. This research proposes the use of machine learning techniques based on GWAS data to identify sets of polymorphisms that maximize discrimination between cases and controls. This document explains the traditional strategies and theoretical bases that support this research. It presents previous works that apply multivariate methods for the prediction of different diseases and treatments, and their most representative characteristics serve as the basis for a new solution. The proposed methodology includes obtaining genetic data and a pre-processing stage.
Afterward, the process involves several quality control procedures that filter samples and SNPs to reduce the number of false positives and false negatives. Next, a chi-squared association test with kinship correction is performed to pre-select markers. Predictive models are built using wrapper and embedded computational methods. The first wrapper method used is BSWiMS, which is based on statistics and procedures of forward and backward selection to generate a logistic model. Its best AUC was 0.689. The second wrapper method is based on stochastic search: an ensemble of Genetic Algorithms coupled to a Support Vector Machine classifier, followed by a Forward Selection, which achieved a maximum AUC of 0.716. The third algorithm used is LASSO, one of the most well-known embedded methods, which uses L1 regularization and performs feature selection during the training stage of the model. This classifier achieved an AUC of 0.8005. This study incorporates the analysis of poorly classified samples in predictive models as a strategy to build more predictive models. The best result, obtained with a mixed model combining variants from the previous models, outperformed the others with an AUC of 0.842. This result is promising since the model generated with LASSO showed the highest discrimination between classes, based solely on genetic data. The biological relevance of the markers in the models is presented through their association with their respective genes. The models replicated variants previously associated with Alzheimer's disease, especially on chromosome 19 close to the APOE gene.
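Two of the quantities mentioned above, the chi-squared association statistic used to pre-select markers and the AUC used to compare models, can be sketched in a few lines. This is a simplified illustration under stated assumptions: it uses a plain 2x2 allelic test without the kinship correction applied in the thesis, and the function names and example counts are hypothetical.

```python
from typing import Sequence


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared statistic for the table [[a, b], [c, d]],
    e.g. rows = cases/controls, columns = risk/other allele counts.
    No continuity or kinship correction is applied (simplification).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a zero margin carries no association signal
    return n * (a * d - b * c) ** 2 / denom


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a randomly chosen case (label 1)
    scores higher than a randomly chosen control (label 0);
    ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical allele counts: cases 30 risk / 70 other, controls 50 / 50.
stat = chi_squared_2x2(30, 70, 50, 50)
# Hypothetical model scores for two cases and two controls.
area = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(stat, area)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from controls, which gives a scale for reading the values reported above.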