Predicting drug responses in cancer cells using genomic features and machine learning

Abstract
This document presents an analysis of the prediction of drug responses in cancer cells using cancer genomic features and machine learning, submitted for the Master’s Degree in Computational Sciences at Instituto Tecnologico y de Estudios Superiores de Monterrey. Cancer is a genetic disease characterized by the progressive accumulation of mutations. Several genomic features are involved in oncogenesis, such as gene mutation, copy number, expression, and epigenetic alterations. These features vary depending on the person and the type of cancer, making it difficult to determine whether a drug will be effective in a specific case. Recently, two large-scale pharmacogenomic studies screened multiple anticancer drugs on over 1000 cell lines in an effort to elucidate the response mechanisms of anticancer drugs. Based on these data, we propose a drug-response prediction framework that uses gene expression, methylation, copy number, mutation, and protein expression features together with drug sensitivity data from the Cancer Cell Line Encyclopedia (CCLE) database. To this end, we compare the performance of several algorithms: Random Forest, Support Vector Machine, Elastic-Net, and Extreme Gradient Boosting Tree (XGBoost). The robustness of our model was assessed by cross-validation. Among the individual datasets, the RNA-seq dataset with XGBoost obtained the highest average accuracy. Our unified model achieved good cross-validation performance for most drugs in the CCLE (≥85% accuracy). These results suggest that drug response can be effectively predicted from genomic features using a battery of machine learning algorithms. Our model could be applied to predict the response to certain drugs and could potentially play a complementary role in personalized medicine.
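The kind of cross-validated model comparison described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the thesis code: it assumes drug response is framed as a binary label (sensitive vs. resistant), uses synthetic data in place of CCLE features, uses elastic-net-penalized logistic regression for the Elastic-Net entry, and substitutes scikit-learn's `GradientBoostingClassifier` for XGBoost so that the example needs only scikit-learn. The function name `compare_models` and all hyperparameters are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(X, y, n_splits=5, seed=0):
    """Return the mean cross-validated accuracy of each candidate classifier."""
    models = {
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        # SVM and elastic net are scale-sensitive, so features are standardized first.
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "elastic_net": make_pipeline(
            StandardScaler(),
            LogisticRegression(
                penalty="elasticnet", solver="saga", l1_ratio=0.5, max_iter=5000
            ),
        ),
        # Stand-in for XGBoost (assumption): scikit-learn's gradient-boosted trees.
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
    }
    # Stratified folds keep the sensitive/resistant ratio similar in every split.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return {
        name: float(cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean())
        for name, model in models.items()
    }


# Synthetic stand-in for one feature table (rows: cell lines, columns: genomic
# features) and a binary response label for a single drug.
X, y = make_classification(
    n_samples=120, n_features=30, n_informative=8, random_state=0
)
scores = compare_models(X, y)
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")
```

In a setting like the one described, such a comparison would presumably be repeated per drug and per feature type (expression, methylation, copy number, mutation, protein expression) before the results are combined.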
Description
https://orcid.org/0000-0002-7472-9844