Robust unsupervised statistical learning for the identification and prediction of the risk profiles
Abstract
The discovery of disease subtypes substantially impacts the selection of patient-specific treatment, with implications for long-term survival and disease-related outcomes. Because disease phenotypes are heterogeneous and the features associated with disease onset must be clearly understood, discovering clinically relevant disease subtypes is not straightforward. Consequently, it is essential for clinical researchers that disease-subtyping techniques be robust and reproducible in clinical settings. This dissertation aims to provide a simple clinical tool that predicts the specific disease subtype of a patient. To that end, a robust unsupervised statistical learning method is presented, developed, and validated; it analyzes multidimensional datasets and returns reproducible, robust unsupervised clustering models of the identified patient subtypes. Unsupervised clustering techniques can realistically model disease heterogeneity. Each cluster represents a distinct, homogeneous disease subtype discovered through analysis of the Predicted Class-Co-Association Matrix (PCCAM), which is created by randomly resampling the research data. Specifically, the PCCAM is built from the test results of replicated random cross-validation of unsupervised clustering and depicts the joint probability that a pair of subjects belongs to the same cluster; the PCCAM can therefore reveal all the reproducible clusters present in the studied data. We applied the proposed methodology, using different data types, to discover subtypes of several diseases, including Alzheimer's disease, COVID-19, and acute myeloid leukemia. Our findings showed that the proposed unsupervised approach could discover disease subtypes that differ statistically. Characterization of the discovered subgroups also indicated substantial differences among subgroups in some of the features we studied.
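The co-association idea described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the clustering algorithm (a plain k-means), the function names (`kmeans`, `co_association`), the parameters (`k`, `n_repeats`, `train_frac`), and the toy data are all assumptions chosen for illustration. In each repetition the subjects are split at random, a clustering is fitted on the training part, the held-out subjects are assigned to clusters, and the matrix entry for a pair is the fraction of repetitions in which both subjects were held out together and received the same label.

```python
import math
import random


def _dist2(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(point, centroids):
    # Index of the centroid closest to the point.
    return min(range(len(centroids)), key=lambda c: _dist2(point, centroids[c]))


def kmeans(points, k, rng, n_iter=20):
    # Plain k-means, used here only as a stand-in clustering method.
    centroids = rng.sample(points, k)
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[_nearest(p, centroids)].append(p)
        # Recompute each centroid; keep the old one if its group is empty.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[c]
            for c, g in enumerate(groups)
        ]
    return centroids


def co_association(data, k, n_repeats=50, train_frac=0.5, seed=0):
    # Estimate, for every pair of subjects, how often they share a cluster
    # when both fall in the held-out (test) part of a random split.
    rng = random.Random(seed)
    n = len(data)
    same = [[0] * n for _ in range(n)]
    tested = [[0] * n for _ in range(n)]
    n_train = int(train_frac * n)
    for _ in range(n_repeats):
        order = list(range(n))
        rng.shuffle(order)
        train, test = order[:n_train], order[n_train:]
        centroids = kmeans([data[i] for i in train], k, rng)
        labels = {i: _nearest(data[i], centroids) for i in test}
        for i in test:
            for j in test:
                tested[i][j] += 1
                same[i][j] += labels[i] == labels[j]
    # Pairs that were never held out together have no estimate (NaN).
    return [
        [same[i][j] / tested[i][j] if tested[i][j] else math.nan for j in range(n)]
        for i in range(n)
    ]


# Synthetic toy data (not from the dissertation): two separated groups.
toy = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.1, 0.0],
       [5.0, 5.0], [5.1, 5.2], [5.2, 5.1], [5.0, 5.1]]
pccam = co_association(toy, k=2)
```

In this sketch, blocks of entries close to 1 would suggest groups of subjects that are clustered together reproducibly across resamples. How the dissertation derives the final subtypes from the matrix is not specified in the abstract and is not modeled here.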
Description
https://orcid.org/0000-0003-1361-5162