High-dimensional machine learning for drug target discovery and precision medicine

Author

Celik, Safiye

Abstract

Identifying the mechanisms of complex diseases such as Alzheimer’s disease or cancer is essential for discovering molecular targets for therapeutic intervention, ideally in a personalized manner. Gene expression measured in a tissue sample is an informative proxy for the protein levels in that tissue and is known to drive disease outcomes. Machine learning offers promising computational and analytical solutions for extracting robust and relevant information from gene expression data. However, applying a standard machine learning approach to a gene expression dataset poses three main challenges. First, in most expression datasets the number of variables is much greater than the number of samples, which makes models highly likely to overfit the training data (high-dimensionality). Second, any one study contains technical and/or experimental confounders, so the features learned from an individual expression dataset do not necessarily generalize to other datasets (study-specific confounders). Finally, any complex disease mechanism originates from an interplay among different molecular elements, and failing to incorporate prior information about these elements is likely to result in an incomplete understanding of the underlying biology (different molecular elements). In this dissertation, we address these challenges by introducing four novel machine learning techniques: MGL, INSPIRE, MERGE, and EMBARKER. Each of these techniques leverages the prior information readily available on different molecular elements to overcome the overfitting problem caused by high-dimensionality and study-specific confounders. Our extensive statistical and biological evaluations demonstrate that each proposed method outperforms the alternative methods, and they reveal important disease mechanisms as well as potential therapeutic targets for disease intervention.

Description

Thesis (Ph.D.)--University of Washington, 2018
