Developing a Massively Parallel Reporter Assay for Studying Gene Regulatory Elements
| dc.contributor.advisor | Seelig, Georg | |
| dc.contributor.author | Wang, Ban | |
| dc.date.accessioned | 2020-08-14T03:23:43Z | |
| dc.date.available | 2020-08-14T03:23:43Z | |
| dc.date.submitted | 2020 | |
| dc.description | Thesis (Ph.D.)--University of Washington, 2020 | |
| dc.description.abstract | Although the Human Genome Project took roughly 13 years to finish and cost roughly $2.7 billion, it was a major step toward obtaining the genetic information needed to understand how genomes affect individuals and populations. The Human Genome Project provided a view of the human genome sequence that is not representative of any one individual. Finding the genetic variants an individual may carry and then associating these variants with phenotypes or even diseases was unrealistic at the beginning of the 21st century, given the time and cost of genome sequencing. However, with the rapid development of DNA sequencing technology, the cost of sequencing one genome has dropped to around $1000 today. Now, the main challenge is not sequencing the genome but linking genotypes to phenotypes. In particular, the regulatory rules that govern how non-coding regions such as untranslated regions (UTRs) and introns control gene expression are not fully understood, making it challenging to interpret genetic variants that occur in such regions. In this dissertation, we reported how data from a massively parallel reporter assay (MPRA) can be combined with deep learning to build a predictive model that can be used to score any variant in an important class of non-coding regions. The MPRA presented in this dissertation was applied specifically to the 5’ untranslated region (5’ UTR), the region of an mRNA directly upstream of the coding sequence. We were specifically interested in 5’ UTRs because of their significant role in translation regulation. Although this predictive model was trained on data from a fully synthetic reporter library containing random sequences, it performed well on the task of predicting the impact of variants in the human genome.
We also showed that the model could be used to design new sequences with targeted levels of protein production, which provides a valuable tool for applications in mRNA therapeutics. This assay could be applied to any region of interest by building predictive models through machine learning on large datasets collected from synthetic random libraries, showing the power of machine learning coupled with synthetic biology to help us understand the fundamentals of nature and life. | |
| dc.embargo.terms | Open Access | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.other | Wang_washington_0250E_21229.pdf | |
| dc.identifier.uri | http://hdl.handle.net/1773/45795 | |
| dc.language.iso | en_US | |
| dc.rights | none | |
| dc.subject | 5' UTR | |
| dc.subject | Genetic Variants | |
| dc.subject | Machine Learning | |
| dc.subject | Massively Parallel Reporter Assay | |
| dc.subject | mRNA Translation | |
| dc.subject | Polysome Profiling | |
| dc.subject | Systematic biology | |
| dc.subject | Molecular biology | |
| dc.subject | Engineering | |
| dc.subject.other | Electrical engineering | |
| dc.title | Developing a Massively Parallel Reporter Assay for Studying Gene Regulatory Elements | |
| dc.type | Thesis | |
Files
Original bundle
- Name: Wang_washington_0250E_21229.pdf
- Size: 13.76 MB
- Format: Adobe Portable Document Format
