Learning and inference with single cell data
Authors
Mukherjee, Sumit
Abstract
A recent surge in the development of high-throughput single cell transcriptome sequencing methods has given rise to single cell data from a variety of cellular contexts. Single cell data can provide a wealth of statistical information about biological processes that is not available through bulk measurement methods. However, single cell data poses a new set of computational challenges because of its inherent noisiness from biological and technical sources. At present, much of the analysis of single cell datasets is done with common biological or general-purpose tools that are not designed for data containing these noise sources, which makes such learning and inference algorithms poorly suited to these datasets. Here, we have developed tools specifically designed for single cell sequencing data and used them to study specific biological problems. First, we have developed a pre-processing tool called UNCURL, which uses a sampling-distribution-aware approach to estimate the true transcriptomic state of a cell from the heavily sampled observed data (single cell RNA-Seq, or scRNA-Seq). We demonstrate that using the estimated states instead of the observed data improves the performance of downstream algorithms for clustering and lineage inference. UNCURL also allows users to incorporate available qualitative prior information into the state estimation process, which further enhances the performance of downstream algorithms. Next, we developed PIPER, a method that applies differential network analysis to scRNA-Seq data from biological progressions (such as differentiation) to identify the key regulator genes of these processes. PIPER uses a network inference algorithm, specifically designed for highly sampled count-valued data, that outperforms commonly used methods. We show that PIPER correctly identifies known key regulators of several biological processes, including the temporal or pseudo-temporal order of their action.
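The abstract describes UNCURL's state estimation only as "sampling distribution aware." One way such an approach can work is sketched below, assuming that observed counts are Poisson-distributed around a low-rank mean, X ≈ Poisson(M W), where M (genes × k) holds the expression profile of each latent state and W (k × cells) holds each cell's mixing weights. It is fit with the standard Lee-Seung multiplicative updates for the Poisson likelihood. This is a generic Poisson NMF written for illustration; the function names `poisson_state_estimation` and `generalized_kl`, the rank `k`, and the other parameters are hypothetical and are not UNCURL's actual API or necessarily its exact algorithm.

```python
import numpy as np

def poisson_state_estimation(X, k, n_iter=200, eps=1e-10, seed=0):
    """Hypothetical sketch: fit X ~ Poisson(M @ W) for a genes x cells count matrix.

    M (genes x k): nonnegative mean expression profile of each latent state.
    W (k x cells): nonnegative weights mixing those states in each cell.
    Lee-Seung multiplicative updates do not decrease the Poisson
    log-likelihood (equivalently, do not increase the generalized KL divergence).
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    M = rng.random((n_genes, k)) + eps
    W = rng.random((k, n_cells)) + eps
    ones = np.ones((n_genes, n_cells))
    for _ in range(n_iter):
        # Ratio of observed counts to the current expected counts.
        R = X / (M @ W + eps)
        W *= (M.T @ R) / (M.T @ ones + eps)
        R = X / (M @ W + eps)
        M *= (R @ W.T) / (ones @ W.T + eps)
    return M, W

def generalized_kl(X, mu, eps=1e-10):
    """Generalized KL divergence D(X || mu); equal to the Poisson negative
    log-likelihood up to terms that do not depend on mu."""
    return float(np.sum(X * np.log((X + eps) / (mu + eps)) - X + mu))
```

In this sketch, the denoised estimate for each cell is the corresponding column of `M @ W` (or the weights in `W`), which would then be passed to clustering or lineage inference in place of the raw counts. Modeling the sampling distribution explicitly, rather than applying a Euclidean method to log-transformed counts, is the design choice that the abstract credits with improving those downstream steps.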
PIPER also makes several interesting predictions about genes that can provide starting points for future experimental studies. Finally, we demonstrate an application of single cell data sources to the study of a particular gene network motif. MicroRNA-based incoherent feed-forward loops (IFFLs) have previously been shown to reduce biological noise. In this work, we study the specific mechanism behind this noise reduction by analyzing the effect of IFFLs on two specific components of noise: extrinsic and intrinsic noise. Our study demonstrates that IFFLs increase high-frequency noise at the mRNA level, which in turn leads to lower overall noise at the protein level because of the difference in time scales between transcription and translation.
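The final argument depends on protein levels changing more slowly than mRNA levels, so protein effectively averages out fast mRNA fluctuations. The sketch below illustrates only that time-scale argument under simplifying assumptions of our own. mRNA deviations are modeled as an Ornstein-Uhlenbeck process with relaxation rate λ (`mrna_rate`) and a fixed stationary variance σ², and protein deviations follow the linear filter dp/dt = k·m − γ·p. No microRNA or IFFL is modeled explicitly. The function name and all parameter values are hypothetical.

```python
import numpy as np

def simulate_variances(mrna_rate, gamma=1.0, k=1.0, sigma=1.0,
                       dt=0.01, n_steps=200_000, seed=0):
    """Hypothetical sketch: return (var_mrna, var_protein) for a linear
    mRNA -> protein model integrated with Euler-Maruyama.

    mRNA deviation m: Ornstein-Uhlenbeck process with relaxation rate
    `mrna_rate` and stationary variance sigma**2, so the mRNA noise level
    is held fixed while its speed (frequency content) varies.
    Protein deviation p: dp/dt = k*m - gamma*p, a first-order low-pass
    filter whose cutoff is set by the protein decay rate gamma.
    """
    rng = np.random.default_rng(seed)
    kicks = (sigma * np.sqrt(2.0 * mrna_rate * dt)
             * rng.standard_normal(n_steps)).tolist()
    m = p = 0.0
    ms, ps = [], []
    for kick in kicks:
        # Explicit Euler: update p using the current m, then update m.
        p += (k * m - gamma * p) * dt
        m += -mrna_rate * m * dt + kick
        ms.append(m)
        ps.append(p)
    burn = n_steps // 10  # discard the transient from the m = p = 0 start
    return float(np.var(ms[burn:])), float(np.var(ps[burn:]))

for rate in (0.1, 10.0):
    var_m, var_p = simulate_variances(rate)
    print(f"mRNA rate {rate:>4}: var(m) = {var_m:.2f}, var(p) = {var_p:.3f}")
```

In continuous time, this model gives Var(p) = k²σ² / (γ(γ + λ)). With k = γ = σ = 1, that is roughly 0.91 for slow mRNA fluctuations (λ = 0.1) and roughly 0.09 for fast ones (λ = 10), so the same mRNA variance yields about ten times less protein variance when it is concentrated at high frequencies. The simulation should approximate these values.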
Description
Thesis (Ph.D.)--University of Washington, 2018
