Developing a Massively Parallel Reporter Assay for Studying Gene Regulatory Elements

dc.contributor.advisor: Seelig, Georg
dc.contributor.author: Wang, Ban
dc.date.accessioned: 2020-08-14T03:23:43Z
dc.date.available: 2020-08-14T03:23:43Z
dc.date.submitted: 2020
dc.description: Thesis (Ph.D.)--University of Washington, 2020
dc.description.abstract: Although the Human Genome Project took roughly 13 years to finish and cost roughly $2.7 billion, it was a major step toward obtaining the genetic information needed to understand how genomes affect individuals and populations. The Human Genome Project provided a view of the human genome sequence that is not representative of any one individual. Finding the genetic variants that an individual carries and associating those variants with phenotypes or even diseases was unrealistic at the beginning of the 21st century, given the time and cost of genome sequencing. However, with the rapid development of DNA sequencing technology, the cost of sequencing one genome has dropped to around $1,000 today. The main challenge now is not sequencing the genome but linking genotypes to phenotypes. In particular, the regulatory rules that govern how non-coding regions such as untranslated regions (UTRs) and introns control gene expression are not fully understood, which makes it challenging to interpret genetic variants that occur in such regions. In this dissertation, we report how data from a massively parallel reporter assay (MPRA) can be combined with deep learning to build a predictive model that can score any variant in an important class of non-coding regions. The MPRA presented in this dissertation was applied specifically to the 5’ untranslated region (5’ UTR), the region of an mRNA directly upstream of the coding sequence. We were specifically interested in 5’ UTRs because of their significant role in translation regulation. Although the predictive model was trained on data from a fully synthetic reporter library containing random sequences, it performed well at predicting the impact of variants in the human genome.
We also show that the model can be used to design new sequences with a targeted level of protein production, which provides a valuable tool for applications in mRNA therapeutics. The assay could be applied to any region of interest by building predictive models through machine learning on large datasets collected from synthetic random libraries, demonstrating how machine learning coupled with synthetic biology can help us understand the fundamentals of nature and life.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Wang_washington_0250E_21229.pdf
dc.identifier.uri: http://hdl.handle.net/1773/45795
dc.language.iso: en_US
dc.rights: none
dc.subject: 5' UTR
dc.subject: Genetic Variants
dc.subject: Machine Learning
dc.subject: Massively Parallel Reporter Assay
dc.subject: mRNA Translation
dc.subject: Polysome Profiling
dc.subject: Systematic biology
dc.subject: Molecular biology
dc.subject: Engineering
dc.subject.other: Electrical engineering
dc.title: Developing a Massively Parallel Reporter Assay for Studying Gene Regulatory Elements
dc.type: Thesis

Files

Original bundle

Name: Wang_washington_0250E_21229.pdf
Size: 13.76 MB
Format: Adobe Portable Document Format