Developing a Massively Parallel Reporter Assay for Studying Gene Regulatory Elements

Wang, Ban

dc.contributor.advisor	Seelig, Georg
dc.contributor.author	Wang, Ban
dc.date.accessioned	2020-08-14T03:23:43Z
dc.date.available	2020-08-14T03:23:43Z
dc.date.submitted	2020
dc.description	Thesis (Ph.D.)--University of Washington, 2020
dc.description.abstract	Although the Human Genome Project took roughly 13 years to finish and cost roughly $2.7 billion, it was a major step toward obtaining the genetic information needed to understand how genomes affect individuals and populations. The Human Genome Project provided a view of the human genome sequence that is not representative of any one individual. Finding the genetic variants an individual may carry and then associating these variants with phenotypes or even diseases was unrealistic at the beginning of the 21st century, given the time and cost of genome sequencing. However, with the rapid development of DNA sequencing technology, the cost of sequencing one genome has dropped to around $1000 today. Now, the main challenge is not sequencing the genome but linking genotypes to phenotypes. In particular, the regulatory rules that govern how non-coding regions such as untranslated regions (UTRs) and introns control gene expression are not fully understood, making it challenging to interpret genetic variants that occur in such regions. In this dissertation, we reported how data from a massively parallel reporter assay (MPRA) can be combined with deep learning to build a predictive model that can be used to score any variant in an important class of non-coding regions. The MPRA we presented in this dissertation was applied specifically to the 5’ untranslated region (5’ UTR), which is the region of an mRNA directly upstream of the coding sequence. We were specifically interested in 5’ UTRs because of their significant role in translation regulation. Although this predictive model was trained on data from a fully synthetic reporter library containing random sequences, it performed well on the task of predicting the impact of variants in the human genome.
We also showed that the model could be used to design new sequences with targeted levels of protein production, which provides a valuable tool for applications in mRNA therapeutics. This assay could be applied to any region of interest by building predictive models through machine learning on large datasets collected from synthetic random libraries, showing how machine learning coupled with synthetic biology can help us understand the fundamentals of nature and life.
dc.embargo.terms	Open Access
dc.format.mimetype	application/pdf
dc.identifier.other	Wang_washington_0250E_21229.pdf
dc.identifier.uri	http://hdl.handle.net/1773/45795
dc.language.iso	en_US
dc.rights	none
dc.subject	5' UTR
dc.subject	Genetic Variants
dc.subject	Machine Learning
dc.subject	Massively Parallel Reporter Assay
dc.subject	mRNA Translation
dc.subject	Polysome Profiling
dc.subject	Systematic biology
dc.subject	Molecular biology
dc.subject	Engineering
dc.subject.other	Electrical engineering
dc.title	Developing a Massively Parallel Reporter Assay for Studying Gene Regulatory Elements
dc.type	Thesis

Files

Original bundle

Name: Wang_washington_0250E_21229.pdf
Size: 13.76 MB
Format: Adobe Portable Document Format

Collections

Electrical engineering