Ciencias Exactas y Ciencias de la Salud

Permanent URI for this collection: https://hdl.handle.net/11285/551039

This collection contains theses and degree projects from the master's programs of the Schools of Engineering and Sciences and of Medicine and Health Sciences.

Search Results

Now showing 1 - 9 of 9

Exome variant analysis in 40 mexican pulmonary arterial hypertension patients
(Instituto Tecnológico y de Estudios Superiores de Monterrey, 2025-12) Sánchez Pichardo, Brenda Eloisa; Treviño Alvarado, Víctor Manuel; emimmayorquin, emipsanchez; Tamez Peña, José Gerardo; Martínez Ledesma, Juan Emmanuel; Sánchez Díaz, Carlos Jerjes; Balderas Martínez, Yalbi Itzel; García Rivas, Gerardo de Jesús; School of Engineering and Sciences; Campus Estado de México
Pulmonary arterial hypertension (PAH) is a rare and detrimental disease with a strong genetic component, yet most studies have focused on European or Asian populations. Consequently, little is known about the genetic landscape of PAH in Mexico or whether certain variants have been underrepresented due to ancestry bias in other datasets. This work integrates a set of bioinformatic tools to identify and interpret genetic variants from Whole Exome Sequencing (WES) data of 40 Mexican patients diagnosed with the disease. All patients were recruited by Dr. Carlos Jerjes Sánchez Díaz, the primary clinical collaborator and data provider for this study. The workflow covered all major steps of data processing, including quality control, read mapping, variant calling, and annotation. These procedures were automated through a custom pipeline implemented in Nextflow, ensuring reproducibility. Subsequently, the analytical phase integrated domain-specific knowledge to interpret variant relevance. First, we systematically examined variants present in 21 PAH-related genes. Second, we explored additional variants based on three computational methods: ClinVar annotations, Gene Ontology (GO) terms, and computational predictions. This approach enabled a comprehensive assessment of potentially pathogenic variants. Among the 21 PAH-related genes, BMPR2 showed the strongest evidence of pathogenicity, with two variants classified as pathogenic and one of uncertain significance, representing 8% of unrelated individuals. Variants of uncertain significance were also found in eight other PAH-related genes (NOTCH3, EDN1, KCNA5, NOS2, SMAD9, TBX4, and TOPBP1), distributed across 10 of the 39 patients. Additional variants with strong but partially conflicting evidence were identified in HPGDS, TLR4, HSPB9, and other genes.
These findings reinforce the central role of BMPR2 in PAH while highlighting potential modulatory roles of additional genes involved in inflammation and stress response pathways. Notably, not a single variant was assigned to more than four patients, suggesting that most variants were recently acquired in the family or that those individuals are the first in their families.
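The gene-panel step described above can be illustrated with a minimal sketch. It is not the thesis pipeline: the record fields (`id`, `gene`, `clinvar`), the toy variants, and the rule of discarding benign classifications are assumptions made for illustration, and the panel lists only the PAH-related genes named in the abstract rather than all 21.

```python
# Subset of the PAH-related panel: only the genes named in the abstract.
PANEL_GENES = {"BMPR2", "NOTCH3", "EDN1", "KCNA5", "NOS2", "SMAD9", "TBX4", "TOPBP1"}


def prioritize(variants, panel=PANEL_GENES):
    """Keep variants in panel genes that are not classified as benign.

    Each variant is a dict with hypothetical keys 'id', 'gene', and 'clinvar'.
    """
    excluded = {"benign", "likely_benign"}
    return [v for v in variants
            if v["gene"] in panel and v["clinvar"] not in excluded]


# Toy annotated variants (invented for illustration).
variants = [
    {"id": "var1", "gene": "BMPR2", "clinvar": "pathogenic"},
    {"id": "var2", "gene": "BMPR2", "clinvar": "benign"},
    {"id": "var3", "gene": "TTN", "clinvar": "uncertain_significance"},
    {"id": "var4", "gene": "TBX4", "clinvar": "uncertain_significance"},
]
kept = [v["id"] for v in prioritize(variants)]
```

In this toy run, `kept` holds the pathogenic BMPR2 variant and the TBX4 variant of uncertain significance; the benign and off-panel variants are dropped.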
Development of polygenic scores for the Mexican population for obesity, diabetes, and dyslipidemias
(Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-10) Torres Treviño, David; Treviño Alvarado, Víctor Manuel; emipsanchez; Garza Hernández, Debora; Martínez Ledesma, Juan Emmanuel; School of Engineering and Sciences; Campus Monterrey
Genetic prediction estimates the risk of a disease from genetic data, aiding earlier diagnosis, prevention, and targeted treatment. Polygenic scores (PGS) estimate risk by combining multiple single nucleotide polymorphisms (SNPs), the most common form of genetic variation. Though each SNP has a small effect, their joint effect provides key insights into the risk for common diseases influenced by multiple genetic factors. A key limitation of PGS is that the majority have been trained on European populations, leading to a significant drop in predictive accuracy when applied to non-European groups. This study aimed to address this issue by improving the accuracy of PGS for type 2 diabetes (T2D), BMI, triglycerides, total cholesterol, HDL, and LDL levels in the Mexican population through a series of strategies. We implemented various established methods for constructing PGS, including techniques that have shown success in non-European populations and ensemble models combining ancestry-based PGS to optimize accuracy across diverse populations. Our key innovation lies in applying shrinkage to the ancestry-based PGS according to each individual’s ancestry proportions, prioritizing ancestry-based scores that are genetically closer to the individual and enhancing the relevance of matched ancestry data. Our results showed no improvement, and in some cases a decrease in accuracy, when using multi-ethnic or Mexican training data, likely due to the underrepresentation of non-European individuals and the small sample size of the Mexican GWAS. However, notable exceptions included LDL and triglyceride predictions, where the Mexican GWAS outperformed the European GWAS. This outcome may be attributed to genetic loci associated with lipid levels that are unique to Mexicans, some linked to Amerindian ancestry, and that explain more variance than the loci captured in the European GWAS.
Moreover, ensembles incorporating both ancestry adjustment and the Mexican-based PGS underperformed compared to the European baseline model, whereas those excluding the Mexican-based PGS exceeded the European baseline’s performance. Ensembles constructed with Lassosum and LDpred2 fell short of the PRScsx ensemble’s results, suggesting an advantage to jointly modeling multiple populations rather than treating them separately. Introducing ancestry adjustment in PRScsx (RAW4) maintained accuracy and, in some traits, even improved it for subgroups with predominantly African ancestry, showing promise in the proposed ancestry-based shrinkage approach. However, despite these improvements, disparities in accuracy persisted across population subgroups, especially for individuals with a high proportion of African ancestry. These results highlight the current challenge of generalizability gaps in PGS models, even for methods designed for diverse populations like PRScsx. Future studies could focus on developing a sophisticated Bayesian framework for ancestry adjustment, refining ancestry estimation methods, or incorporating a Native American component to better capture the genetic diversity in the Mexican population.
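As a minimal sketch of the general idea, and not the thesis's actual shrinkage method: a polygenic score is a weighted sum of allele dosages, and ancestry-specific scores can be combined with weights taken from an individual's ancestry proportions. The function names, the effect sizes, the ancestry labels, and the simple proportional weighting are all illustrative assumptions.

```python
def polygenic_score(dosages, effects):
    """Weighted sum of allele dosages (0, 1, or 2) by per-SNP effect sizes."""
    if len(dosages) != len(effects):
        raise ValueError("dosages and effects must have the same length")
    return sum(d * b for d, b in zip(dosages, effects))


def ancestry_weighted_score(dosages, effects_by_ancestry, ancestry_props):
    """Combine ancestry-specific scores, weighting each one by the
    individual's ancestry proportion (hypothetical scheme for illustration)."""
    total = sum(ancestry_props.values())
    if total <= 0:
        raise ValueError("ancestry proportions must sum to a positive value")
    score = 0.0
    for ancestry, effects in effects_by_ancestry.items():
        weight = ancestry_props.get(ancestry, 0.0) / total
        score += weight * polygenic_score(dosages, effects)
    return score


# Toy data: three SNPs and two ancestry-specific sets of effect sizes.
dosages = [0, 1, 2]
effects_by_ancestry = {
    "EUR": [0.10, 0.20, 0.05],
    "AMR": [0.30, 0.10, 0.15],
}
ancestry_props = {"EUR": 0.25, "AMR": 0.75}
score = ancestry_weighted_score(dosages, effects_by_ancestry, ancestry_props)
```

With these toy values the ancestry-specific scores are 0.3 and 0.4, so the combined score leans toward the score trained on the ancestry that makes up most of the individual's genome.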
Computational identification of genetic polymorphisms influencing human gene expression in obesity gene
(Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-05) Jácome Velasco, Farid; Treviño Alvarado, Víctor Manuel; emimmayorquin; Campus Estado de México
The heritability of obesity has been estimated to be between 40% and 70%. Sixty GWAS and more than 1,100 loci have been reported. Most of these loci are in non-coding regions, making it more difficult to understand the role of these variants in the disease. One way to study these non-coding variants is to estimate their effects on the expression levels of neighbouring genes (cis-eQTL) or distant genes (trans-eQTL). This is achieved with a regression model that explains the gene expression level by the genetic variant and other covariates. The GTEx project characterized genetic effects on the transcriptome across different tissues with eQTLs but did not report any eQTL in the tissues where genes involved in obesity are principally expressed. Our project employed an eQTL mapping approach using gene expression and whole genome data from the GTEx database. The genotype data, obtained from the GTEx consortium, were divided by each of the 22 chromosomes. The expression data, downloaded from the GTEx portal, were processed into an expression matrix. Covariates were included to adjust for principal components, sex, PEER factors, and protocol. The MatrixeQTL model, a well-established method, was used for the eQTL mapping of 21 genes related to the leptin-melanocortin pathway in tissues where these genes are highly expressed (pituitary gland, hypothalamus, and adipose visceral tissue). This approach led to the identification of 8221 eQTLs, with the gene POMC having the most eQTLs. This project generated a set of cis- and trans-eQTLs. These eQTLs may explain the variability of gene expression in genes related to obesity. They can be used for follow-up analyses, including colocalization or Mendelian randomization, to highlight the effect of these variants directly on the obesity phenotype.
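The regression model mentioned above can be sketched for a single variant-gene pair. This is an illustrative ordinary least squares fit in plain Python, not the MatrixeQTL implementation, which tests very large numbers of pairs efficiently; the function names and the noise-free toy data are assumptions.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved with Gauss-Jordan elimination. X is a list of rows."""
    p = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def eqtl_effect(genotypes, expression, covariates):
    """Effect of genotype dosage on expression, adjusted for covariates."""
    X = [[1.0, float(g)] + [float(c) for c in cov]
         for g, cov in zip(genotypes, covariates)]
    return ols(X, expression)[1]


# Toy data: expression = 2 + 0.5 * genotype + 1.0 * sex, with no noise.
genotypes = [0, 1, 2, 0, 1, 2]
sex = [[0], [0], [0], [1], [1], [1]]
expression = [2.0 + 0.5 * g + 1.0 * s[0] for g, s in zip(genotypes, sex)]
beta = eqtl_effect(genotypes, expression, sex)
```

Because the toy expression values are generated without noise, the fitted genotype coefficient recovers the simulated effect of 0.5. A real analysis would also compute a test statistic and correct for multiple testing.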
Estimation of ancestry in the mexican population using informative genetic markers
(Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024) Valdez Alvarez, Héctor; Treviño Alvarado, Víctor Manuel; emipsanchez; Orozco Orozco, Lorena Sofía; García Ortiz, Humberto; Martínez Ledesma, Juan Emmanuel; Escuela de Ingeniería y Ciencias; Campus Monterrey; Garza Hernández, Debora
The study of genetic ancestry has become an essential component of modern genetics, offering insights into the origins and migrations of human populations. This thesis presents the development of a genetic ancestry panel specifically tailored for the Mexican population, a group characterized by its high genetic diversity and complex admixture. The primary objective of this research is to accurately estimate the proportions of ancestry in Mexicans using informative genetic markers, thereby addressing the underrepresentation of this population in Genome-Wide Association Studies (GWAS). In the initial phase, various genetic databases were considered, and three were selected for the development of the ancestry panel: the 1000 Genomes Project (1000G), the Human Genome Diversity Project (HGDP), and the Metabolic Analysis in an Indigenous Sample (MAIS). The integration of these datasets provided a comprehensive view of genetic diversity crucial for the panel's accuracy. Principal Component Analysis (PCA) was employed to visualize the genetic structure and verify the separation of ancestral groups. The results confirmed the integrity of the selected datasets. Three methods for selecting Ancestry Informative Markers (AIMs)—Top K, Balanced K, and SumInfo K—were developed and evaluated. Although Balanced K and SumInfo K showed better performance than Top K, integrating Mexican data (MAIS) posed significant challenges, particularly due to the influence of East Asian populations. To address these issues, a revised strategy was implemented, focusing on optimizing AIM selection and improving the robustness of the panel. This involved a detailed workflow and validation process, ensuring the final panel's reliability. Despite the challenges, the new strategy demonstrated promising results, and the final panel is expected to be completed soon. The developed ancestry panel has significant implications for forensic science, personalized medicine, and anthropological research. 
By accurately estimating ancestry proportions in the Mexican population, this research contributes to a broader understanding of genetic diversity and supports more effective medical and forensic applications. Future work will focus on finalizing the panel and applying it to the oriGen project, which aims to analyze genetic data from a large cohort of Mexicans, further enhancing the understanding of this population's genetic landscape.
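The abstract does not define how Top K, Balanced K, or SumInfo K score markers. Purely as an assumed illustration of a top-K selection, the sketch below ranks markers by their largest allele-frequency difference between any two reference populations and keeps the K highest. The marker IDs, population labels, and frequencies are hypothetical.

```python
from itertools import combinations


def marker_informativeness(freqs):
    """Largest absolute allele-frequency difference between any two populations."""
    return max(abs(a - b) for a, b in combinations(freqs.values(), 2))


def top_k_markers(allele_freqs, k):
    """Return the k marker IDs with the highest informativeness."""
    ranked = sorted(allele_freqs,
                    key=lambda m: marker_informativeness(allele_freqs[m]),
                    reverse=True)
    return ranked[:k]


# Hypothetical allele frequencies per reference population.
allele_freqs = {
    "rs_a": {"EUR": 0.10, "AFR": 0.90, "NAT": 0.20},
    "rs_b": {"EUR": 0.50, "AFR": 0.55, "NAT": 0.45},
    "rs_c": {"EUR": 0.05, "AFR": 0.10, "NAT": 0.70},
}
selected = top_k_markers(allele_freqs, 2)
```

A ranking like this can concentrate on markers that separate only one pair of populations, which is one plausible reason a balanced selection across population pairs might perform better.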
In silico identification of cis-regulatory elements in folate biosynthesis and 1C metabolism genes in plants
(Instituto Tecnológico y de Estudios Superiores de Monterrey, 2021-11-26) Salinas Espinosa, Jessica Pamela; TREVIÑO ALVARADO, VICTOR MANUEL; 205076; Treviño Alvarado, Víctor Manuel; puemcuervo; Cuevas Díaz Durán, Raquel; Rodríguez López, Carlos; Martínez Ledesma, Juan Emmanuel; School of Engineering and Sciences; Campus Monterrey; Díaz de la Garza, Rocío Isabel
Folates (vitamin B9) are enzyme cofactors required by all organisms for one-carbon (1C) transfer reactions. A deficiency of these nutrients can lead to several health problems. Since humans are not natural producers of folates, the intake of these nutrients from plants is vital for human nutrition. Several techniques that involve the genetic modification of organisms have proved to be effective for fortifying plants with essential macronutrients. However, to achieve this, it is necessary to elucidate the metabolic control in plant systems. Although the genes involved in folate biosynthesis and 1C metabolism in plants are known, the mechanisms of transcriptional regulation have not yet been explored. This project focuses on discovering cis-regulatory DNA elements (motifs) using computational data analysis to provide insights regarding the regulation of folate biosynthesis in plants. For this, we first collected a compendium of known genes related to folate biosynthesis. Then, a database comprising the DNA promoter regions of folate biosynthesis and 1C metabolism genes in 19 different plant species was built and analyzed using different motif discovery algorithms. Afterward, the discovered motifs were tested for statistical significance and further associated with their putative biological role using other bioinformatics tools. A total of 149 statistically significant motifs (p < .05) were discovered in 18 of 19 species using the GimmeMotifs ensemble algorithm. These motifs were represented in 104 different regulatory networks built automatically from co-expression clusters obtained from each plant species. The results from this work could provide insight into the transcriptional regulation of the folate biosynthesis pathway in plants. Furthermore, the elements found could be used for research in gene editing techniques to produce biofortified crops.
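To give a rough sense of what motif discovery looks for, the sketch below counts how many promoter sequences contain each k-mer and reports those shared by most promoters. It is a much simpler stand-in for the GimmeMotifs ensemble used in the thesis, with no background model or significance test; the sequences and the threshold are invented for illustration.

```python
from collections import Counter


def kmer_promoter_counts(promoters, k):
    """Count, for each k-mer, the number of promoter sequences that contain it."""
    counts = Counter()
    for seq in promoters:
        seq = seq.upper()
        # A set ensures each k-mer is counted at most once per promoter.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


# Toy promoter sequences (invented); three of them share the 5-mer ACGTG.
promoters = ["ttgacgtgaa", "ccacgtgtta", "gacgtgcc", "aaaattttcc"]
counts = kmer_promoter_counts(promoters, 5)
shared = [kmer for kmer, n in counts.items() if n >= 3]
```

Real motif finders represent motifs as position weight matrices and compare them against background sequence, which lets them tolerate mismatches that exact k-mer counting misses.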
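Identificación de genes con alteraciones en número de copias asociados a sobrevida en diversos tipos de cáncer [Identification of genes with copy number alterations associated with survival in various cancer types]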
(Instituto Tecnológico y de Estudios Superiores de Monterrey, 2021-07-26) Guardado Méndez, Alejandra; Treviño Alvarado, Víctor Manuel; emipsanchez; Martínez Ledesma, Juan Emmanuel; Sepúlveda Villegas, Maricruz; Escuela de Medicina y Ciencias de la Salud; Campus Monterrey
Copy number alterations (CNAs) are among the most common alterations in cancer and contribute to its initiation and progression. Some CNAs can help predict patient survival time and inform treatment decisions, so their identification is a central research topic. The statistical test most commonly used to compare survival between patient groups is the log-rank test, which generally relies on an asymptotic approximation (χ² distribution) that requires the populations being compared to be of similar size and the number of events to be large (Vandin F., et al., 2015). In cancer, many of the alterations to be analyzed occur in few patients, producing groups of very different sizes. Consequently, the log-rank statistic under the null hypothesis does not follow a normal or chi-squared distribution, which can lead to inaccurate p-value estimates and therefore to false discoveries (Vandin F., et al., 2015; Wang R., 2010). Some alterations, however, can be clinically useful even when they occur in a low to intermediate proportion of tumors. For example, duplications of the region that includes ERBB2, an oncogene that encodes human epidermal growth factor receptor 2 and is overexpressed as a result, occur in 10-20% of breast cancer tumors (Marotta, M., 2012) but in only 2-14% of non-small cell lung cancer (NSCLC) tumors (Siena, S., et al., 2018). In both cancer types, ERBB2 duplication is a prognostic predictor of high risk (Siena, S., et al., 2018).
For these cases, where the approximations may be less accurate, some implementations of the log-rank test have been proposed that estimate the null distribution more precisely in order to find truly significant associations (Treviño & Tamez-Pena, 2017; Vandin et al., 2015). Because CNAs had not previously been analyzed with more exact methods, this research project aimed to use VALORATE to identify genes with copy number alterations associated with survival in various cancer types. In addition, a general analysis workflow was created to identify high- and low-risk CNAs, validate the expression of the genes involved, perform a basic functional analysis of those genes, and find clinical evidence of drug response.
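The problem of unbalanced groups can be illustrated with a minimal sketch that computes the two-group log-rank statistic and estimates its null distribution empirically. Note the swap: this uses plain label permutation, not the VALORATE algorithm, and the toy cohort, function names, and number of permutations are assumptions.

```python
import random


def logrank_z(times, events, groups):
    """Two-group log-rank Z statistic.

    times: follow-up times; events: 1 = death, 0 = censored; groups: 0 or 1.
    """
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n = len(at_risk)   # patients still at risk at time t
        n1 = sum(at_risk)  # of those, patients in group 1
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e == 1 and g == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return observed_minus_expected / variance ** 0.5 if variance > 0 else 0.0


def permutation_p_value(times, events, groups, n_perm=2000, seed=0):
    """Two-sided p-value from shuffling group labels (group sizes stay fixed)."""
    rng = random.Random(seed)
    observed = abs(logrank_z(times, events, groups))
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if abs(logrank_z(times, events, labels)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy cohort: 3 altered patients (group 1) with short survival among 12.
times = [2, 3, 4, 10, 11, 12, 13, 14, 15, 16, 17, 18]
events = [1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
groups = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
z = logrank_z(times, events, groups)
p = permutation_p_value(times, events, groups)
```

With only three altered patients, the permutation p-value is bounded by the number of distinct label arrangements, a limit the asymptotic approximation does not reflect.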
Predicting drug Responses in cancer cells using genomic features and machine learning
(Instituto Tecnológico y de Estudios Superiores de Monterrey, 2020-05) Evans Trejo, Cody Eduardo; Treviño Alvarado, Víctor Manuel; ilquio; Tamez Peña, José; Martínez Torteya, Antonio; Escuela de ingeniería y ciencias; Campus Estado de México; Martínez Ledesma, Juan Emmanuel
This document presents an analysis for the prediction of drug responses in cancer cells using cancer genomic features and machine learning for the Master’s Degree in Computational Sciences at Instituto Tecnológico y de Estudios Superiores de Monterrey. Cancer is a genetic disease characterized by the progressive accumulation of mutations. Several genomic features are involved in oncogenesis, such as gene mutation, copy number, expression, and epigenetic alterations. These features vary depending on the person and type of cancer, making it difficult to determine whether a drug will be effective in each specific case. Recently, two large-scale pharmacogenomic studies screened multiple anticancer drugs on over 1000 cell lines in an effort to elucidate the response mechanism of anticancer drugs. Based on these data, we proposed a drug-response prediction framework that uses gene expression, methylation, copy number, mutation, and protein expression features together with drug sensitivity data from the Cancer Cell Line Encyclopedia (CCLE) database. For this, we compared the performance of several algorithms: Random Forest, Support Vector Machine, Elastic-Net, and Extreme Gradient Boosting Tree (XGBoost). The robustness of our model was validated by cross-validation. The RNAseq dataset with XGBoost obtained the highest average accuracy among individual datasets. Our unified model achieved good cross-validation performance for most drugs in the Cancer Cell Line Encyclopedia (≥85% accuracy). These results suggest that drug response could be effectively predicted from genomic features using a battery of machine learning algorithms. Our model could be applied to predict drug response for certain drugs and could potentially play a complementary role in personalized medicine.
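The cross-validation step can be sketched as follows. The nearest-centroid classifier is only a stand-in so that the example runs without external libraries; it is not one of the algorithms compared in the thesis, and the two-feature toy data and class labels are invented.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_centroids(X, y):
    """Stand-in classifier: store the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, yi in zip(X, y) if yi == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroids(centroids, x):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def cross_validated_accuracy(X, y, k):
    """Mean accuracy over k folds, training on the other k-1 folds each time."""
    accuracies = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [yi for i, yi in enumerate(y) if i not in held_out]
        model = fit_centroids(train_X, train_y)
        correct = sum(predict_centroids(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


# Toy cell lines: two expression-like features, well-separated classes.
X = [[0.1, 0.2], [0.9, 1.0], [0.2, 0.1], [1.0, 0.8], [0.0, 0.3], [0.8, 0.9]]
y = ["resistant", "sensitive"] * 3
accuracy = cross_validated_accuracy(X, y, 3)
```

Each sample is scored by a model that never saw it during training, which is what makes the averaged accuracy a fairer estimate than accuracy on the training data.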
Identifying models of DNA polymorphisms associated with alzheimer’s disease using step-wise and genetic algorithms from GWAS data
(Instituto Tecnológico y de Estudios Superiores de Monterrey, 2019-05) Romero Rosales, Brissa Lizbeth; ROMERO ROSALES, BRISSA LIZBETH; 861461; TREVIÑO ALVARADO, VICTOR MANUEL; 205076; Treviño Alvarado, Víctor Manuel; Vallejo Clemente, Edgar Emmanuel; Moreno Treviño, María Guadalupe; Escuela de Ingeniería y Ciencias; Campus Monterrey
Alzheimer's disease is a neurodegenerative disorder that involves cognitive deterioration accompanied by memory loss and inability to reason, affecting the patient's ability to carry out daily activities. This disorder is caused by genetic, environmental, and lifestyle factors. Determining the genetic factors is very important because the disease could then be predicted and treated before it appears. However, despite research efforts and many putative detections using univariate analyses, only the APOE gene has been extensively validated as a risk factor associated with late-onset Alzheimer's disease. Thus, the problem of missing heritability arises, implying that a single gene does not determine the heritability of a disorder and that the combined effect of genes could better explain it. Genome-Wide Association Studies (GWAS) traditionally use univariate techniques to determine the association between markers and diseases. This research proposes the use of machine learning techniques based on GWAS data to identify sets of polymorphisms that maximize discrimination between cases and controls. This document explains the traditional strategies and theoretical bases that support this research. It presents previous works that apply multivariate methods for the prediction of different diseases and treatments, and their most representative characteristics are taken as the basis for a new solution. The proposed methodology includes obtaining genetic data and a pre-processing stage. Afterward, the process involves several quality control procedures that filter samples and SNPs to reduce the number of false positives and false negatives. Next, a chi-squared association test with kinship correction is performed to pre-select markers. Predictive models are built using wrapper and embedded computational methods. The first wrapper method used is BSWiMS, which is based on statistics and procedures of forward and backward selection to generate a logistic model.
Its best AUC was 0.689. The second wrapper method used is based on stochastic search: an ensemble of Genetic Algorithms coupled to a Support Vector Machine classifier, followed by a Forward Selection, which achieved a maximum AUC of 0.716. The third algorithm used is LASSO, one of the most well-known embedded methods, which uses L1 regularization and performs feature selection during the training stage of the model. This classifier achieved an AUC of 0.8005. This study incorporates the analysis of poorly classified samples in predictive models as a strategy to build models with higher predictive power. The best result, obtained with a mixed model of the variants from the previous models, outperformed the others with an AUC of 0.842. This result is promising since the model generated with LASSO showed the highest discrimination between classes, based solely on genetic data. The biological relevance of the markers of the models is presented through their association with their respective gene. The models replicated variants previously associated with Alzheimer's disease, especially on chromosome 19 close to the APOE gene.
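The AUC values reported above measure how well a model's scores rank cases above controls. A minimal sketch of that computation, using the Mann-Whitney formulation with invented labels and scores, might look like this:

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random case scores higher than a random
    control, counting ties as one half (Mann-Whitney formulation)."""
    cases = [s for l, s in zip(labels, scores) if l == 1]
    controls = [s for l, s in zip(labels, scores) if l == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0
               for c in cases for n in controls)
    return wins / (len(cases) * len(controls))


# Toy predictions: 1 = case, 0 = control (values invented for illustration).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

Here eight of the nine case-control pairs are ranked correctly, so the AUC is 8/9. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.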
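Detección de metabolismos diferenciales en plasma de pacientes con mastectomía reciente por cáncer de mama [Detection of differential metabolism in plasma of patients with recent mastectomy for breast cancer]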
(Instituto Tecnológico y de Estudios Superiores de Monterrey, 2012-12-01) Yañez Garza, Irma Luz; Yañez Garza, Irma Luz; 386051; Treviño Alvarado, Víctor Manuel; Díaz de la Garza, Rocío; Garza Rodríguez, Ma. de Lourdes; Villela Martínez, Luis M.; Programa de Graduados en Biotecnología; Campus Monterrey
