Identifying models of DNA polymorphisms associated with Alzheimer’s disease using step-wise and genetic algorithms from GWAS data
dc.audience.educationlevel | Investigadores/Researchers | es_MX |
dc.contributor.advisor | Treviño Alvarado, Víctor Manuel | |
dc.contributor.author | Romero Rosales, Brissa Lizbeth | |
dc.contributor.committeemember | Vallejo Clemente, Edgar Emmanuel | |
dc.contributor.committeemember | Moreno Treviño, María Guadalupe | |
dc.contributor.institution | Escuela de Ingeniería y Ciencias | es_MX |
dc.contributor.institution | Campus Monterrey | es_MX |
dc.creator | ROMERO ROSALES, BRISSA LIZBETH; 861461 | es_MX |
dc.creator | TREVIÑO ALVARADO, VICTOR MANUEL; 205076 | es_MX |
dc.date.accessioned | 2019-08-30T14:49:38Z | |
dc.date.available | 2019-08-30T14:49:38Z | |
dc.date.created | 2019-05 | |
dc.date.issued | 2019-05 | |
dc.description.abstract | Alzheimer's disease is a neurodegenerative disorder involving cognitive deterioration, memory loss, and impaired reasoning, which affect the patient's ability to carry out daily activities. The disorder is caused by genetic, environmental, and lifestyle factors. Determining the genetic factors is important because it could allow the disease to be predicted, and therefore treated, before symptoms appear. However, despite research efforts and many putative detections using univariate analyses, only the APOE gene has been extensively validated as a risk factor for late-onset Alzheimer's disease. This gives rise to the problem of missing heritability: a single gene does not account for the heritability of the disorder, whereas the combined effect of several genes could explain it better. Genome-Wide Association Studies (GWAS) traditionally use univariate techniques to determine the association between markers and diseases. This research proposes applying machine learning techniques to GWAS data to identify sets of polymorphisms that maximize discrimination between cases and controls. This document explains the traditional strategies and theoretical bases that support the research. It reviews previous works that apply multivariate methods to the prediction of different diseases and treatments, and their most representative characteristics serve as the basis for a new solution. The proposed methodology begins with obtaining genetic data and a pre-processing stage. Several quality control procedures then filter samples and SNPs to reduce the number of false positives and false negatives. Next, a chi-squared association test with kinship correction is performed to pre-select markers. Predictive models are then built using wrapper and embedded computational methods.
The first wrapper method is BSWiMS, which uses statistical tests with forward and backward selection procedures to generate a logistic model. Its best AUC was 0.689. The second wrapper method is based on stochastic search: an ensemble of Genetic Algorithms coupled to a Support Vector Machine classifier, followed by Forward Selection, which achieved a maximum AUC of 0.716. The third algorithm is LASSO, one of the best-known embedded methods, which uses L1 regularization and performs feature selection during model training. This classifier achieved an AUC of 0.8005. This study also analyzes the samples that the predictive models classify poorly, as a strategy for building models with higher predictive performance. The best result, obtained with a mixed model combining the variants of the previous models, outperformed the others with an AUC of 0.842. This result is promising since the model generated with LASSO showed the highest discrimination between classes, based solely on genetic data. The biological relevance of the markers in the models is presented through their association with their respective genes. The models replicated variants previously associated with Alzheimer's disease, especially on chromosome 19 close to the APOE gene. | es_MX |
dc.description.degree | Master of Science in Computer Science | es_MX |
dc.format.medium | Text | es_MX |
dc.identificator | 7||33||3304||120304 | |
dc.identifier.citation | Romero Rosales, B. L. (2019). Identifying models of DNA polymorphisms associated with Alzheimer’s Disease using Step-Wise and Genetic Algorithms from GWAS data. Instituto Tecnológico y de Estudios Superiores de Monterrey. Campus Monterrey. | es_MX |
dc.identifier.uri | http://hdl.handle.net/11285/633075 | |
dc.language.iso | spa | |
dc.publisher | Instituto Tecnológico y de Estudios Superiores de Monterrey | es_MX |
dc.relation.impreso | 2019-04-26 | |
dc.relation.isFormatOf | published version | es_MX |
dc.relation.isreferencedby | REPOSITORIO NACIONAL CONACYT | |
dc.rights | openAccess | |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0 | * |
dc.subject.classification | ENGINEERING AND TECHNOLOGY::TECHNOLOGICAL SCIENCES::COMPUTER TECHNOLOGY::ARTIFICIAL INTELLIGENCE | es_MX |
dc.subject.keyword | GWAS | es_MX |
dc.subject.keyword | machine learning | es_MX |
dc.subject.keyword | step-wise methods | es_MX |
dc.subject.keyword | genetic algorithms | es_MX |
dc.subject.keyword | Alzheimer's disease | es_MX |
dc.subject.lcsh | Ingeniería y Ciencias Aplicadas / Engineering & Applied Sciences | es_MX |
dc.title | Identifying models of DNA polymorphisms associated with Alzheimer’s disease using step-wise and genetic algorithms from GWAS data | es_MX |
dc.type | Master's thesis |
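
Two quantities recur in the abstract: a chi-squared association statistic used to pre-select markers, and the AUC used to compare models. The sketch below illustrates both in plain Python. It is not the thesis code: the function names `chi_squared_statistic` and `auc` are hypothetical, the contingency-table layout (rows for cases and controls, columns for allele or genotype counts) is an assumption, and the kinship correction mentioned in the abstract is not implemented.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a contingency table.

    `table` is a list of rows, e.g. [case_counts, control_counts], where each
    row holds allele (or genotype) counts for one marker. This layout is an
    assumption for illustration; no kinship correction is applied here.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of marker and disease status.
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                statistic += (observed - expected) ** 2 / expected
    return statistic


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen case (label 1) receives a
    higher score than a randomly chosen control (label 0); ties count as 0.5.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

In a pre-selection step of this kind, markers would be ranked by the statistic and only the top-ranked ones passed on to model building. An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases and controls.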
Files
Original bundle
- Name:
- Caratula firmas.pdf
- Size:
- 213.42 KB
- Format:
- Adobe Portable Document Format
- Description:
- Signature cover page
- Name:
- Carta de Autorización2.pdf
- Size:
- 1.62 MB
- Format:
- Adobe Portable Document Format
- Description:
- Authorization letter
- Name:
- Thesis_May19_BrissaLizbethRomeroRosales.pdf
- Size:
- 9.88 MB
- Format:
- Adobe Portable Document Format
- Description:
- Thesis
License bundle
- Name:
- license.txt
- Size:
- 1.3 KB
- Format:
- Item-specific license agreed upon to submission
- Description: