Ciencias Exactas y Ciencias de la Salud

Permanent URI for this collection: https://hdl.handle.net/11285/551039

This collection contains theses and degree projects from the master's programs of the Schools of Engineering and Sciences and of Medicine and Health Sciences.

  • Master's thesis
    Pipeline evaluation of clustering algorithms aimed at clinical data
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2018-05-22) Duarte Dyck, David Absalón; Temez Peña, José G.; Terashima Marín, Hugo; Treviño Alvarado, Víctor M.
    Disease understanding is key in designing effective treatments and diagnostic tools. A key aspect of this understanding is grouping patients according to their phenotypes. Phenotypes are patterns in the characteristics of certain members of a population that are correlated with a particular illness. This grouping may be useful in revealing associations between disease risk, treatment responses, and other key clinical outcomes. Once these associations are found, it is easier to design tailored diagnostic tools and effective personalized treatments. To achieve this grouping goal, data is key, and recent advancements in digital technology have made it possible to capture hundreds or even thousands of clinical data points that may be used to group patients into different disease phenotypes. To handle hundreds of patients with hundreds of features, clinical researchers use clustering algorithms that automatically find hidden associations between subjects. These algorithms are very useful once the researcher selects the correct clustering algorithm and configures it for the specific research task. Selecting the correct clustering algorithm is time-consuming, and setting up its parameters may take several trial-and-error sessions. On the other hand, computer scientists have developed several clustering metrics that can evaluate the fitness of clustering algorithms to the data, and computing power has increased, allowing the automated testing and evaluation of clustering algorithms on a specific data set. The objective of this proposal was the development of an automated computer pipeline that evaluates several clustering algorithms, providing metrics regarding important features such as clustering stability (Jaccard index) and clustering relevance (ANOVA test). Furthermore, the pipeline returns the number of natural clusters that may be useful for the given dataset (Dunn index).
The designed pipeline was set up to evaluate the classical clustering algorithms of k-means, Fuzzy C-means, and Hierarchical clustering, but it can also be used to test a user-provided clustering method. The evaluation consisted of bootstrapping the data and extracting the Dunn and Jaccard clustering indexes in a meaningful manner. Furthermore, the clinical relevance of the final clusters was evaluated using an ANOVA test, which provided indications of disease phenotypes. All the test results are plotted, and the user can visually evaluate the performance of the different clustering methods on their data. The result of this development was deployed in R (github.com/majordave/clustest). The utility of the pipeline was tested on synthetic data sets and two radiomics datasets associated with the development of Osteoarthritis (OA) and the presence of breast cancer from mammograms. Furthermore, we contrasted the clustering approach with supervised learning on a large dataset of the association of nutrition with OA symptoms. Hence, the present work established that the automated robust evaluation of the utility of clustering algorithms in clinical data is feasible, and provided a publicly available software tool that can be used by any clinical researcher to select the best clustering algorithm for their data.
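The two bootstrap-based indexes named in the abstract can be illustrated with a minimal sketch. It is not the clustest code: the thesis tool is written in R, while this sketch uses Python with NumPy, and every function name here (`kmeans`, `jaccard`, `bootstrap_jaccard`, `dunn_index`) is hypothetical. It also assumes a plain Lloyd's k-means and keeps only the distinct rows of each bootstrap draw, which may differ from the choices made in the thesis.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumption, not the thesis implementation)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def jaccard(a, b):
    """Jaccard index of two collections of row indices (a must be non-empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def bootstrap_jaccard(X, k, n_boot=20, seed=0):
    """Mean best-match Jaccard index of each reference cluster over resamples."""
    rng = np.random.default_rng(seed)
    ref = kmeans(X, k, seed=seed)
    scores = np.zeros((n_boot, k))
    for b in range(n_boot):
        # Draw rows with replacement, then keep the distinct rows (simplification).
        idx = np.unique(rng.integers(0, len(X), size=len(X)))
        boot = kmeans(X[idx], k, seed=seed + b + 1)
        for j in range(k):
            ref_members = idx[ref[idx] == j]
            if len(ref_members) == 0:
                continue  # reference cluster absent from this resample
            scores[b, j] = max(
                jaccard(ref_members, idx[boot == m]) for m in range(k)
            )
    return scores.mean(axis=0)


def dunn_index(X, labels):
    """Smallest between-cluster distance divided by largest cluster diameter."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    same = labels[:, None] == labels[None, :]
    return D[~same].min() / D[same].max()


# Synthetic example: two well-separated Gaussian blobs in two dimensions.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.5, (40, 2)), rng.normal(10, 0.5, (40, 2))])
labels = kmeans(X, 2)
stability = bootstrap_jaccard(X, 2)
dunn = dunn_index(X, labels)
print("Jaccard stability per cluster:", stability)
print("Dunn index:", dunn)
```

In this reading, a per-cluster Jaccard value close to 1 suggests that the cluster reappears in most resamples, and comparing the Dunn index across candidate values of k is one common way to pick a number of clusters. How the pipeline itself aggregates these values is not stated in the abstract.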
Unless otherwise specified, these materials are shared under the following terms: Attribution-NonCommercial-NoDerivatives CC BY-NC-ND http://www.creativecommons.mx/#licencias

