Ciencias Exactas y Ciencias de la Salud

Technological advances applied to molecular biology have enabled more numerous and more complex experiments, whose outcomes are collected in massive databases. This has driven the emergence of new disciplines and of innovative approaches for analyzing these data. One such discipline is bioinformatics, in which high-throughput data are used to understand diseases such as cancer, with the aim of improving its classification and diagnosis and of suggesting new possible treatments. The available data range from whole-genome sequencing to tissue images and proteomic and metabolomic profiles. For gene expression profiles, one of the most widely used approaches is single-gene analysis: the expression level is measured gene by gene, case and control samples are compared with a statistical test (t-test, Wilcoxon rank-sum) to assign a p-value to every gene, a threshold filter is applied to identify significant genes, and the results are then interpreted biologically. However, this approach has several shortcomings. First, the adjustment process, which is necessary because of the number of tests performed, can lead to information loss, so that some relevant genes end up as false negatives. Second, the use of arbitrary threshold values produces false positives when the threshold is too high and false negatives when it is too low. Third, modifications in biological processes are related to groups of genes, so measuring the variation in the expression level of these groups allows a better biological interpretation.
These groups of genes have been identified and are now available in several public databases; they are known as gene sets, and they can provide better insight when analyzing expression data. Thus, the purpose of this thesis was to determine whether the score obtained through single-sample gene set enrichment analysis from the bibliography, Hallmark, Oncogenic, CMAP Up, and CMAP Down collections is relevant for cancer subtype classification with unsupervised learning techniques (hierarchical clustering) and for identifying the pathways involved in the presence or absence of gene mutations. Finally, by relating this score to survival probability, we were able to estimate the life expectancy of patients and to identify candidate treatment drugs, based on the expression level of a given gene set related to a specific biological process, chemical alteration, or aberration.
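
As an illustration only, the single-gene workflow mentioned in the abstract (a per-gene test, a multiple-testing adjustment, then a threshold filter) might be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure used in the thesis: the Wilcoxon rank-sum test uses a normal approximation without tie correction, the adjustment is assumed to be Benjamini-Hochberg, the cutoff `alpha=0.05` is arbitrary, and the gene names and expression values are made-up toy data.

```python
from statistics import NormalDist


def rank_sum_pvalue(case, control):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    pooled = sorted([(v, 0) for v in case] + [(v, 1) for v in control])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values and give each the average 1-based rank.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    n1, n2 = len(case), len(control)
    # Rank sum of the case group and its mean/sd under the null hypothesis.
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (w - mean) / sd
    return 2 * (1 - NormalDist().cdf(abs(z)))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_genes(expression, alpha=0.05):
    """Return genes whose adjusted p-value falls below the (arbitrary) cutoff."""
    genes = list(expression)
    pvals = [rank_sum_pvalue(*expression[g]) for g in genes]
    adjusted = benjamini_hochberg(pvals)
    return [g for g, q in zip(genes, adjusted) if q < alpha]


# Hypothetical toy data: gene -> (case expression values, control expression values).
toy = {
    "GENE_A": ([9.1, 8.7, 9.4, 9.0, 8.9, 9.3], [5.2, 5.0, 4.8, 5.5, 5.1, 4.9]),
    "GENE_B": ([5.0, 5.3, 4.9, 5.1, 5.2, 5.4], [5.05, 5.25, 4.95, 5.15, 5.35, 5.45]),
}
print(significant_genes(toy))
```

Changing `alpha` changes which genes pass the filter, which is the sensitivity to arbitrary thresholds that the abstract lists among the shortcomings of this approach.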
