High throughput determination of sequence-function relationships in protein and RNA
As more individuals have their genomes sequenced, more genetic variation is discovered. Interpreting this variation has become intractable with established methods for linking genotype to phenotype, because these methods are low-throughput and the number of newly discovered variants is growing exponentially. My graduate studies have revolved around the development and application of high-throughput methods for functionally characterizing genetic variation within synthesized libraries of mutants. I have applied these methods in both yeast and mammalian cells to study the functions of a diverse set of proteins and an RNA: PAB1, a poly(A)-binding protein necessary for translation of mRNA into protein; BRCA1, a DNA repair protein associated with familial breast cancer; SUP4oc, a yeast tyrosyl tRNA that has been altered to suppress the ochre stop codon; and MOR1, the human mu opioid receptor, a G protein-coupled receptor important for pain relief. Each assay yielded a genotype-phenotype map that showed the functional effect of all mutations present in the library. Through analysis of these maps, we found that as many as 20-30% of single mutations were tolerated without loss of function and that peripheral, core, and RNA-binding positions could be distinguished from each other by the pattern of fitness effects across amino acid substitutions. In addition, we found that evolutionary conservation often fails to predict whether a position will tolerate mutation, demonstrating the importance of measuring function directly rather than relying on conservation alone. Further demonstrating this point, we found that models using functional data from these assays outperform current computational methods for predicting the pathogenicity of mutations in BRCA1.
However, we also showed that combining evolutionary data with high-throughput mutational data can be useful for identifying sites where a protein interacts with other molecules, since almost all deleterious substitutions that are nevertheless present in one of a protein’s homologs are likely to be involved in intermolecular interactions. Finally, by analyzing variants containing two or more mutations, we found a significant and varied role for epistasis in gene function, with intramolecular interactions being much more prevalent and detrimental in tRNA than in PAB1. A closer look at individual instances of epistasis in PAB1 and SUP4oc showed that certain mutations are epistasis hotspots, rescuing or reducing the fitness of many other mutations, and that positive epistasis can occur via changes in conformation that accommodate otherwise detrimental changes. High-throughput mutagenesis screens are potentially useful for both basic and clinical research, and will likely be an integral part of deciphering the ever-growing collection of genetic data.
- Genetics