Genetics

Permanent URI for this collection: https://digital.lib.washington.edu/handle/1773/4923

Recent Submissions

Now showing 1 - 20 of 165
  • Item type: Item ,
    Innovation in Duplication: Structural Diversity and Regulatory Control of Human Genes: TBC1D3
    (2026-04-20) Guitart, Xavi; Eichler, Evan E
    Segmental duplications (SDs) are a major source of genomic innovation, responsible for the genomic instability that accelerates novel gene function but also causes disease. Despite their importance, SDs have historically remained difficult to study due to their high sequence identity and copy number polymorphism, making them all but impossible to resolve with short-read sequencing and assembly. Recent advances in long-read sequencing and de novo genome assembly have made it possible to resolve these regions at haplotype resolution, enabling systematic investigation of complex duplicated gene families. This thesis leverages these technological advances to study the evolution, structural diversity, and regulation of TBC1D3, a primate-specific SD gene family implicated in neuronal progenitor proliferation, cortical expansion, and cancer. TBC1D3 is a young and highly duplicated gene family dispersed across chromosome 17, with the majority of paralogs embedded in two large SD clusters at 17q12. Prior functional studies demonstrated that TBC1D3 promotes cellular proliferation in both cancer and neurodevelopmental contexts, yet it remained unclear how a gene family with extreme copy number variation could contribute to tightly regulated developmental processes. In this work, I address this paradox by integrating long-read genome assemblies, comparative primate genomics, population-scale human variation, and paralog-resolved transcriptomics. In Chapter 2, I reconstruct the evolutionary history and human diversity of TBC1D3 using haplotype-resolved assemblies from 69 human haplotypes and 11 nonhuman primate species. I show that TBC1D3 independently expanded in at least five primate lineages and that humans experienced a recent expansion approximately 2–3 million years ago. 
Human haplotypes exhibit extraordinary structural diversity, differing by up to ~1 Mbp and more than 20 gene copies, making TBC1D3 one of the most structurally variable gene families in the human genome. Despite this variability, signatures of positive selection are detected along the African ape lineage, and I show that all human-expressed copies share a derived, human-specific modification of the protein C terminus, suggesting functional divergence during recent human evolution. Using a pangenomic and phylogenetic framework, I define distinct paralog groups and demonstrate that TBC1D3 expression is overwhelmingly restricted to a single paralog group located at the telomeric end of cluster 2. Chapter 3 investigates the regulatory basis of this striking paralog-specific expression. I demonstrate that TBC1D3 expression in human neural contexts is driven by a position-effect mechanism in which a fixed, copy number-constrained promoter derived from the neighboring gene NPEPPSP1 has been duplicated and fused upstream of a specific TBC1D3 paralog. Using comparative epigenomics, long-read transcriptomics, and neuronal differentiation models, I show that this NPEPPSP1–TBC1D3 fusion creates a dominant regulatory architecture that restricts transcription to a single copy despite extensive underlying copy number variation. This mechanism provides a parsimonious explanation for how TBC1D3 expression and function may remain stable while the surrounding gene family continues to diversify structurally. In Chapter 4, I describe experimental efforts to interrogate the functional consequences of human-specific modifications to TBC1D3, including the derived C-terminal extension. Although these experiments were not ultimately successful, their outcomes inform hypotheses regarding protein localization, posttranslational regulation, and context-dependent function that motivate future work. 
This thesis establishes a generalizable framework for studying complex SD gene families by integrating haplotype-resolved assemblies, evolutionary analysis, and paralog-aware regulatory interrogation. The findings reveal how SDs contribute to gene family expansion, regulatory innovation, and protein evolution, allowing rapid structural diversification. More broadly, this work demonstrates how long-read genomics enables direct investigation of genomic regions that have played a disproportionate role in human evolution and disease yet have remained largely inaccessible until now.
  • Item type: Item ,
    Chromatin Accessibility Beyond the Peaks
    (2026-04-20) Hamm, Morgan; Queitsch, Christine; Trapnell, Cole
    Chromatin state resides at the intersection of trans-acting factors, which operate globally over the genome but respond to changing conditions, and DNA sequence, which is invariant across conditions but varies locally around genes. Assays that measure features of chromatin state can help us understand this interplay between sequence and trans factors that ultimately governs transcriptional regulation. My dissertation work has focused on the analysis of a long-read chromatin accessibility assay called Fiber-seq. In this body of work, I show that this single assay, which on its face measures chromatin accessibility, is remarkably information-rich and may help decode the regulatory logic governing gene expression. I present fiber-views, a software package for analysis of Fiber-seq data at aligned genomic positions. In applying Fiber-seq to Zea mays, I show that Fiber-seq detects twice as many ACRs as ATAC-seq in paired samples. Fiber-seq is particularly good at identifying ACRs with short accessible elements and ACRs in repetitive regions, including transposable elements. Finally, I present a novel analysis approach that converts the single-molecule data from Fiber-seq into a set of feature tracks. I show that these feature tracks are able to recapitulate chromatin states typically derived from multiple ChIP-seq assays. Fiber-seq-derived features can predict gene expression, capturing nearly 60% of expression variation in maize. Patterns of Fiber-seq features can also be used to categorize ACRs, reflecting their function and underlying sequence.
  • Item type: Item ,
    A characterization of HIV intra-host evolution spanning its foundational forces and their translational implications
    (2026-02-05) Romero, Elena; Feder, Alison F
    HIV's rapid evolutionary rate has proven a longstanding challenge to developing new treatment modalities and vaccinations. In this work, I first investigate how the rate of recombination, a primary driving force of evolution, can vary alongside intra-host viral demography. By developing a new method for estimating recombination rates, I show that HIV's recombination rate appears to be density dependent and is positively associated with viral load in intra-host viral populations. Then, I investigate how this rapid evolutionary rate allows HIV to escape from broadly neutralizing antibody (bNAb) treatments. I show that HIV escapes from two different bNAbs via distinct modes of evolution, based on how each bNAb targets the virus, and that in both cases many different viruses escape concurrently. Lastly, I find that the high number of distinct escaping viruses allows intra-host HIV populations to maintain their pre-treatment diversity and linkage levels even as escape variants sweep to high frequencies, upending traditional selective sweep signatures. Altogether, these discoveries further our understanding of the basic processes underpinning HIV evolution as well as their evolutionary and translational implications.
  • Item type: Item ,
    Brahma Associated Factor Complex in development and disease
    (2026-02-05) Danyko, Cassidy; Henikoff, Steve
    Chromatin accessibility, especially the process of nucleosome eviction, is critical for gene activation in eukaryotic cells. The Brahma-associated Factor (BAF) complex plays a key role in creating accessible chromatin at gene promoters and enhancers in an ATP-dependent manner and by direct binding to DNA and nucleosomes. In Chapter 2, I study the importance of BAF in a developmental context by testing the interdependence of ecdysone-receptor transcription factor binding with BAF ATPase nucleosome remodeling activity for gene activation during ecdysone-induced metamorphosis in Drosophila melanogaster. To measure changes in the chromatin landscape, I use a chromatin profiling technology developed in our lab, Cleavage Under Targets & Tagmentation (CUT&Tag), which allows mapping of targets genome-wide. Furthermore, I use a version for targeting Accessible Chromatin (CUTAC), which maps RNA Polymerase II-associated DNA accessibility genome-wide. In Chapter 3, I test the use of BAF inhibitors in a triple-negative breast cancer cell line in combination with an inhibitor of the transcriptional kinase cyclin-dependent kinase 12 (CDK12). I measure the chromatin changes underlying observed synergy in cytotoxicity and cell death in MDA-MB-231 cells treated with both the BAF inhibitor FHT1015 and the CDK12 inhibitor THZ531. The combined treatment leads to loss of chromatin and DNA integrity, supporting apoptosis. My work elucidates the importance of ATP-dependent BAF activity with transcription factor binding in normal development and highlights the utility of BAF loss of function as a cancer-killing strategy.
  • Item type: Item ,
    Novel single-cell genomic approaches for deciphering cellular heterogeneity
    (2026-02-05) Kim, Hyeon-Jin; Fowler, Doug; Trapnell, Cole
    Single-cell genomics has reshaped our understanding of developmental biology by uncovering intricate molecular states at unprecedented scale. However, continued development of experimental and computational approaches is needed to fully realize its potential. In this dissertation, I will introduce three distinct projects, each centered around adapting computational tools or developing novel experimental approaches to decipher cellular heterogeneity. In the first project, we adapted latent Dirichlet allocation to single-cell combinatorial indexed Hi-C (sci-Hi-C) intra-chromosomal contact maps to decompose the data into chromatin topics. Our approach enabled co-embedding and clustering of sci-Hi-C data derived from five different cell lines (GM12878, H1Esc, HFF, IMR90, and HAP1) and identification of cell type-specific topics of chromatin interactions. In the second project, we developed inexpensive spike-in controls for single-cell combinatorial indexed RNA-seq (sci-RNA-seq) experiments using a set of single-stranded hash oligonucleotides (“hash ladder”). To normalize for technical variation introduced within individual cells, we calculated a cell-specific size factor that is derived from the hash ladder. We applied the ladder to study the effects of various chemical perturbations, including RNA pol II elongation, histone deacetylation, and activation of the glucocorticoid receptor. In the third project, we vastly improved Visual Cell Sorting (VCS), which is an automated imaging workflow that enables binning and sorting of cells by visual phenotypes, by making it compatible with cell fixation and three-level sci-RNA-seq (sci-RNA-seq3). We applied VCS to sort over one million E15 F1 B6xCAST mouse embryo derived nuclei based on nucleolar and nuclear speckle size, and we profiled the sorted nuclei with sci-RNA-seq3. 
We revealed differences in these nuclear compartment sizes within and across cell types and identified both expected and unexpected correlations with proliferation and differentiation status. We identified 42 genes that positively correlated with relative nucleolar size, reflecting the activation of gene expression programs relating to ribosomal biogenesis and proteostasis stress response. Finally, we demonstrated that these genes can be used to quantify relative nucleolar size across mouse, human, and zebrafish developmental atlases.
  • Item type: Item ,
    Expanding the proteomics toolbox with intelligent data acquisition and genomic locus protein mapping
    (2025-10-02) McGann, Christopher; Schweppe, Devin K
    Mass spectrometry-based proteomics has emerged as a cornerstone technology for understanding biological systems, yet significant computational and methodological challenges still exist. This dissertation aims to improve three important areas of need in proteomics: intelligent data acquisition methodologies, peptide identification efficiency, and scalability of methods for characterizing DNA-protein interactions. Chapter 2 addresses the computational demands of modern proteomics by implementing fragment ion indexing in the widely-used Comet search algorithm. Chapter 3 introduces real-time spectral library search (RTLS), an intelligent data acquisition method that leverages whole-proteome spectral libraries to guide instrument decision-making during acquisition. Finally, Chapter 4 presents DNA O-MAP, a scalable method for characterizing locus-specific chromatin interactions that overcomes limitations of existing approaches.
  • Item type: Item ,
    Modeling temporal dynamics of early embryogenesis and aging
    (2025-08-01) Yang, Wei; Shendure, Jay A
    Throughout life, every individual undergoes a wide array of experiences, yet one constant remains: the passage of time across distinct life stages. In the prenatal phase, we develop from a single-celled zygote into a complex embryo through highly orchestrated cell fate decisions. After birth, our bodies continue to grow and learn, eventually entering a phase of gradual decline marked by aging. This aging process is characterized by increasingly disordered changes in cellular states, ultimately leading to dysfunction and cell death. While development and aging are often seen as opposite ends of the life spectrum, they share core biological principles: both involve dynamic gene expression programs, shifts in cellular identity, and remodeling of tissue architecture. A central question in both fields is: what genetic programs govern these transitions in cell states? Despite significant progress, a systematic, single-cell resolution understanding of these genetic programs remains lacking. Two major challenges hinder such efforts: first, it is difficult to obtain continuous human data over time; second, sample conditions are often limited, restricting our ability to assess contributing factors to variation. However, advances in single-cell RNA sequencing (scRNA-seq) now allow us to profile millions of cells across finely resolved time courses. I hypothesize that applying scRNA-seq to in vitro models or closely related model organisms offers a powerful approach to uncovering the temporal progression of cellular states. In this thesis, I present two projects that leverage scRNA-seq and computational analysis to address this question. In the first project, we used scRNA-seq to study lineage specification in an in vitro model of early human embryogenesis known as gastruloids. We identified a retinoic acid signaling axis that is critical to early embryogenesis and improved the alignment between the in vitro model and the human embryo. 
Ultimately, we demonstrated that this new enhanced model can be used to study genetic variation during early embryogenesis through large-scale perturbation experiments. In the second project, we performed scRNA-seq on brain samples from rhesus macaques across the lifespan—from early infancy (5 months) to late adulthood (21 years). Through computational analysis, we constructed aging trajectories that capture changes in cell abundance and transcriptional profiles at single-cell resolution. These trajectories allowed us to identify cell subtypes vulnerable to aging and uncover gene regulatory networks driving their transitions. By aligning macaque aging trajectories to human neurodegenerative disease signatures, we observed strong convergence between the two processes. Together, these studies examine the temporal dynamics of gene regulation during both embryogenesis and aging. By establishing a new in vitro platform to model early human development and constructing aging trajectories in the non-human primate brain, we aim to uncover the genetic drivers of cell state transitions across the human lifespan. This work lays the groundwork for a deeper understanding of developmental and aging processes and enables comparative analyses between them.
  • Item type: Item ,
    From variants to cytokines: comprehensively characterizing how cells respond to perturbations
    (2025-05-12) Pendyala, Sriram; Fowler, Douglas
    Cells can be perturbed by a changing environment or a mutation in their genome. Therefore, the high-throughput single-cell dissection of both genotype-phenotype and signal-response relationships is pivotal to understanding cell function and disease. Here, we present two complementary platforms that leverage barcoding strategies to this end. The first platform, Variant In Situ Sequencing (VIS-seq), co-expresses protein-coding variants and linked circularized RNA barcodes that are easily sequenced in situ. This approach enables the simultaneous optical profiling of thousands of variants in a gene by mapping variant identity to cellular phenotypes. Applying VIS-seq to >3000 variants of LMNA and PTEN in >13 million sequenced cells from diverse cell types revealed detailed structure–function relationships, including variant-induced changes in protein localization, cell and nuclear morphology, and biochemical activity, illuminating mechanisms underlying laminopathies and PTEN-associated disorders. The second platform, CellCode, harnesses retroviral cell barcoding to uniquely tag individual clones, permitting pooled interrogation of cell-extrinsic perturbations. Integrated with single-cell transcriptomics and functional assays for cell growth and differentiation, CellCode generated an ex-vivo contextual atlas of the effects of 28 cytokines in mouse CD8+ T-cells. VIS-seq and CellCode are robust, highly scalable methodologies to interrogate the effect of genetic and environmental inputs on cellular biology.
  • Item type: Item ,
    Using genomic technology to transform how genetics is used to diagnose and treat disease
    (2025-05-12) Fayer, Shawn; Fowler, Doug; Starita, Lea
    Interpreting the clinical significance of rare genetic sequence variants is challenging due to limited evidence, and as a result, most newly identified missense variants are interpreted as variants of uncertain significance (VUS). Multiplexed assays of variant effect (MAVEs), where hundreds to thousands of variant effects are measured in a single experiment, have significantly accelerated the rate at which functional data are generated. Since functional data can be applied when interpreting variants, MAVEs have the potential to revolutionize clinical genetics by providing functional data at scale to resolve VUS. We systematically evaluated the clinical utility of MAVEs by integrating published MAVE data with clinical interpretations and resolved 49% of VUS for BRCA1, 69% for TP53, and 15% for PTEN. Although we demonstrated the potential for MAVEs to resolve uncertainty in genetic testing, MAVE technologies were limited to genes with phenotypes in utilitarian cancer-derived cell lines. We addressed this limitation by developing iPSC-SGE, where variants are edited into iPSCs, enabling phenotyping in differentiated cells. We introduced 498 SNVs into POLG and 496 variants into MYBPC3. POLG variant effects were measured with a growth assay in iPSCs in the context of different background alleles, and MYBPC3 variant effects were measured by variant abundance in cardiomyocytes. iPSC-SGE data were validated with known pathogenic and benign variants, and the method is poised to generate functional data for genes previously inaccessible with MAVEs. Finally, we explored the use of variant effect predictors (VEPs) for variant interpretation, a major factor contributing to the VUS problem. We found that current calibration methods lead to inappropriate evidence for up to 75% of variants and offer a new solution for calibration via clustering VEP data for protein domains on similarity of score distributions. 
This method enables more accurate evidence strength thresholding while maintaining robust sets of calibration variants. Taken together, cell-context-specific functional data and variant-specific VEP calibration will result in a significant reduction in VUS while providing rich phenotypic insight for advancing precision medicine.
  • Item type: Item ,
    An investigation of proteoforms in health and disease using peptide-level readouts
    (2025-05-12) Moggridge, Sophie; Villén, Judit
    Proteins are the biomolecules that drive the functions of life. To fully appreciate and understand the role of protein diversity in health and disease, we must seek a deeper understanding of proteoforms - the molecular variants of canonical proteins. Although humans have around 20,000 protein-coding genes, millions of proteoforms arise through mutations, splicing, and post-translational modifications. My thesis work focuses on two proteoform types: missense mutations and phosphorylation. In Chapter 2, I demonstrate the utility of pooled mass spectrometry (MS)-based assays to measure solubility and thermal stability of missense mutations. Using ten disease-causing mutants of the human phosphoglucomutase 1 (PGM1) protein, we achieved improved resolution using our pooled MS assays compared to previously published studies that relied on individually purified mutants. Scaling of this approach to larger mutant libraries and diverse biochemical assays will significantly enhance our understanding of how missense mutations affect protein function and contribute to variant classification in disease contexts. Chapter 3 discusses proteoforms generated by phosphorylation, a post-translational modification that enables proteins to rapidly and reversibly alter their properties and functions. We investigate the phosphorylation signatures in response to osmotic, heat, and oxidative stresses. We identify shared and stress-type specific phosphorylation signatures that align with previously reported data. We also identify phosphorylation sites on proteins known to localize to stress granules, providing a candidate list of stress granule phosphorylation sites for mechanistic investigation. Together, these chapters highlight the power of MS-based methods for characterizing proteoforms and their roles in health and disease.
  • Item type: Item ,
    Genome-wide variation in human germline and postzygotic mutation rates
    (2025-01-23) Noyes, Michelle; Eichler, Evan E.
    De novo mutations (DNMs) are new variants that arise in the parental germline or early embryo. In this dissertation, I apply long-read sequencing technology to quads and a multi-generational pedigree to discover DNMs across the genome and quantify the de novo mutation rate. First, I demonstrate that long reads enable DNM discovery in previously inaccessible regions of the genome. These newly accessible regions, largely marked by repetitive sequence, have a significantly higher mutation rate than their unique counterparts, including an approximately 66% enrichment in segmental duplications. I was able to trace the origins of DNMs to either the parental germline or early rounds of embryogenesis, revealing that at least 15% of single nucleotide DNMs arise postzygotically, a 50% increase from earlier studies. Further, I found that 60% of postzygotic mutations are transmitted to the next generation, meaning that they contribute to segregating variation in the population. Finally, I estimate the de novo mutation rate to be approximately 1.2-1.3 × 10⁻⁸ substitutions/base pair/generation for 30-year-old parents, and the postzygotic mutation rate to be approximately 0.23 × 10⁻⁸ substitutions/base pair/generation. My analyses reveal that repetitive regions are in fact hypermutable, and that more variation arises postzygotically than previously thought. This work also lays the foundation for the next frontier in DNM discovery: comparing assembled parent and child genomes to reveal variation in the most complex and mutable parts of the genome.
  • Item type: Item ,
    Algorithms for differential analysis of cellular composition in single-cell perturbation experiments
    (2025-01-23) Duran, Madeleine Marie; Trapnell, Cole
    Advancements in multiplexing techniques have enabled the application of single-cell genomic methods to comprehensively study the effects of high-throughput perturbation experiments at a whole-embryo scale. Such analyses aim to pinpoint key genes, cell types, and signaling pathways that control cell fate decisions during development. However, there is a lack of statistically principled tools for measuring how cell types shift after perturbations (genetic, chemical, or environmental) and identifying which genes regulate those transitions. In this thesis, I introduce two new software packages for studying single-cell perturbation experiments. Hooke is a new software package that uses Poisson-Lognormal models to perform differential analysis of cell abundances for perturbation experiments read out by single-cell RNA-seq. This versatile framework allows users to 1) perform multivariate statistical regression to describe how perturbations alter the relative abundances of each cell state and 2) describe how all pairs of states co-vary as a parsimonious network of partial correlations. To demonstrate Hooke’s utility, we analyzed a single-cell atlas of zebrafish organogenesis that includes wild-type and genetic perturbations at whole-embryo scale across multiple time points. This method identified novel genetic requirements for relatively rare cell types in the embryonic kidney. Platt is another new software package that uses Hooke's outputs to construct lineage graphs based on the covariation of cell type counts in time series and perturbation data. These graphs help identify candidate transcription factors important in lineage specification and organize differential abundance results into direct and indirect effects. With Platt, we study the impact of knocking out lmx1b, a homeobox transcription factor with specific expression in multiple lineages. 
Both packages aim to fill a critical gap by allowing users to characterize how their experimental perturbations alter cells' proportions and molecular states in complex tissues or whole embryos.
  • Item type: Item ,
    Structural Variation and Expression of Segmentally Duplicated Human Genes
    (2025-01-23) Dishuck, Philip; Eichler, Evan
    Gene duplication is a major driver of evolution, and the African ape lineage, including humans, experienced a burst of segmental duplications (SDs). Recent gene duplications help explain the rapid phenotypic changes in humans despite a slowdown in point mutations in primates. However, these genes are particularly difficult to study due to limitations in sequencing and assembly—the first complete human assembly, including all segmentally duplicated genes, was not finished until 2022. Long-read DNA sequencing (PacBio HiFi [high-fidelity] and ONT [Oxford Nanopore Technologies]) now enables the routine assembly of highly contiguous human genomes, and long-read cDNA sequencing (Iso-Seq) allows paralog-specific assessment of gene models and identification of isoforms. In this thesis, I analyze the gene duplications of some of the first human HiFi and ONT assemblies and use Iso-Seq to functionally annotate recent duplications. I characterized 170 highly contiguous human haplotypes containing 47 Mbp of additional SD content absent from the first complete reference assembly. Using Iso-Seq, I annotated the segmentally duplicated genes in these assemblies, discovering 201 new genes in copy number polymorphic gene families. These include a coding gene fusion NSFP1-LRRC37A2 in an inverted form of the MAPT (tau) locus, and a KRAB-zinc finger gene present in 36% of haplotypes that has only 69% amino acid identity to the best-matching annotated human gene. To validate long-read assemblies, I created a method, called GAVISUNK, that uses the distance between singly unique nucleotide k-mers (SUNKs) in ultra-long ONT reads to validate the structure of HiFi assemblies. This method identifies structural errors in assemblies and allows confident downstream analysis of structural variation, unbiased by assembly artefacts. I performed a detailed analysis of a high copy number gene family, NPIP, which displays signatures of positive selection on the human and African ape lineage. 
Of 28 named human paralogs, I found that just three are fixed at a single copy (NPIPB2, B11, and B14). I found evidence of ongoing gene duplications, deletions, interlocus gene conversion, and large inversions mediated by NPIP duplication blocks. Two paralogs (B9 and B15) were within the most extreme percentile of tests for positive selection and selective sweeps. Full-length cDNA from 101 tissue/cell types revealed distinct gene models for subgroups of NPIPs, including a variable number tandem repeat (VNTR) that encodes a variably sized beta helix. Paralogs in that subgroup show enriched expression in brain tissue, while others retain the ancestral testis-enriched expression. These analyses reveal mechanisms for rapid evolution of duplicated genes and demonstrate their polymorphism among humans.
  • Item type: Item ,
    Advancing Software Tools for Designing Oligonucleotide FISH Probes: Enabling Visualization of Repetitive DNA in Varied Genomes
    (2025-01-23) Aguilar, Robin; Beliveau, Brian J; Noble, William S
    Genomes are organized in a specific and hierarchical manner that influences cell function and fate. The perturbation of genomic stability has been shown to mediate the rise of human diseases. While numerous tools have been developed to better understand the relationship of the non-repetitive genome toward preserving genomic stability, our understanding of the functional consequences of highly repetitive DNA is limited. During my thesis work, I have contributed to the development of software tools that may be used to support fluorescent in situ hybridization (FISH) assays to visualize highly repetitive DNA at the scale of diverse and fully assembled genome builds. The work I describe here includes the development of Tigerfish, a software tool to design oligonucleotides that target unique repetitive DNA intervals at the scale of genomes. Additionally, I also curated numerous scientific communications and advocacy resources to facilitate building more inclusive research spaces in genomics within and outside of the University of Washington. Through my work, I also created a curriculum that may be used to teach others in PhD programs about the importance of fostering supportive academic environments for those with diverse lived experiences beyond scientific learning spaces.
  • Item type: Item ,
    Application of quantitative cross-linking mass spectrometry methods to study interactome differences.
    (2024-09-09) Bakhtina, Anna; Bruce, James E
    Proteins carry out the vast majority of biological functions inside living cells. They do so by changing their shapes and by interacting with one another and with other molecules such as DNA and RNA. Gaining an understanding of intra- and intermolecular protein interactions at a systems level would allow for a much deeper understanding of how biological functions are performed and regulated. Quantitative comparisons of these interactions between different systems or upon perturbations can increase understanding of remodeling associated with aging, disease, and treatments. Here, cross-linking mass spectrometry is demonstrated with quantitative comparisons of protein interactions within intact systems, such as cells and membrane-bound organelles. This work shows that reproducible remodeling of interactomes associated with aging in skeletal muscle mitochondria can be detected with isobaric quantitative protein interaction reporter (iqPIR) technologies. Moreover, changes in the interactome showed correlation with age-associated mitochondrial functional decline. Interactome differences associated with distinct functional differences in different cell types were also observed. Quantitative comparison of interactomes of human cell lines HEK293, HeLa and MCF7 uncovered differences in chromatin remodeling, mitochondrial transport and others that are independent of protein abundance levels. Utilizing a novel genetic mouse model that allows isolation of mitochondria from tubule or podocyte kidney cells, quantitative cross-linking and mass spectrometry enabled identification of differentially regulated proteins and pathways in mitochondria within these cell types that would otherwise remain unknown.
  • Item type: Item ,
    Characterizing Alzheimer’s disease using quantitative proteomics
    (2024-09-09) Plubell, Deanna Lisa; MacCoss, Michael J
    Alzheimer’s disease is characterized by the accumulation of neuropathologic amyloid-β and tau peptides in the brain. Bottom-up mass spectrometry proteomics methods were used to understand the protein landscape in brains with different causes of Alzheimer’s disease. Correlating peptide abundances with amyloid-β tryptic peptides reveals additional subgroups of disease in sporadic Alzheimer’s cases, with differences across the four brain regions sampled. A cerebrospinal fluid targeted mass spectrometry assay for Alzheimer’s disease-related proteins was developed as a proof of concept of an updated assay development workflow. This work demonstrates the feasibility of using peptide performance on high-resolution instruments to inform assay targets on a unit-resolution instrument. Both Alzheimer’s disease projects demonstrate the importance of proteoforms in human disease, a fact that we argue should be considered more carefully when interpreting or developing bottom-up proteomics experiments.
  • Item type: Item ,
    Mislocalization of diverse RNA species to synapses in Alzheimer’s disease and aging
    (2024-09-09) Smukowski, Samuel Nathan; Valdmanis, Paul
    The distribution of RNAs in neurons is non-random, and there are complex cellular mechanisms regulating the transport and localization of RNAs along neural projections to synaptic locations. This enables synapses to respond rapidly to synaptic signaling by synthesizing proteins from a pool of sequestered dormant mRNAs to alter synaptic strength, morphology, and connections. This phenomenon is called “localized translation” and is an essential mechanism facilitating plasticity in the brain and cognition. Alzheimer’s disease (AD) is a neurodegenerative disease characterized by buildup of extracellular amyloid plaques and intracellular hyperphosphorylated tau neurofibrillary tangles in the brain followed by synapse and neuron loss and subsequent cognitive impairment. Previous studies of AD have revealed disruptions among cytoskeletal trafficking and RNA binding proteins, which are essential for proper synaptic localization of RNAs. Differences in expression of non-coding RNAs including microRNAs and circRNAs, which serve regulatory functions in the expression of mRNAs, have also been observed. Therefore, it is likely that mislocalization of RNAs to synapses is taking place in AD, which would negatively impact the mechanism of localized translation and contribute to synaptic dysfunction in disease. Given how dynamic and finely regulated RNA localization is for neuronal function, it is also possible that changes in localization are taking place across the lifespan. In this thesis, I investigate differences in RNA localization, including mRNAs, microRNAs, and circRNAs, in AD by RNA sequencing synaptosomes from patient and control tissue. Synaptosomes are fractionated particles encapsulating synaptic material. We find substantial differences in mRNA localization, and we also find differences in circRNA isoforms present at synapses. There are also correlations between differential expression of microRNAs and target mRNAs mislocalized in AD.
Finally, we discover changes in synaptic RNAs across lifespan in mouse models that suggest AD may represent an acceleration of age-related changes. This research adds a new dimension to our understanding of AD pathology and suggests new targets for therapeutic interventions.
  • Item type: Item ,
    Dissecting the clinical significance of evolving pathogen diversity
    (2024-04-26) Wagner, Cassia; Bedford, Trevor
    A diversity of pathogens threaten human health. These pathogens include eukaryotes, prokaryotes, and viruses, together encompassing significant variation in their biology. Some have newly emerged in humans, like the RNA virus SARS-CoV-2, while others, like the protozoan parasite Plasmodium falciparum, have circulated in humans for thousands of years. Despite their differences, a common theme among successful pathogens is an ability to evolve to evade our immune responses and control efforts. With the revolution of nucleic acid sequencing over the past 20 years, pathogen genomics can now track evolution in real-time. Genomic methods allow us to determine if pathogen genetic diversity is a benign product of mutation and population dynamics, or if it represents adaptation of the pathogen to better survive. We can further quantify the impact of genetic diversity on disease severity and immune escape by combining sequence data with clinical metadata. In this thesis, I first describe my research using viral genomes and clinical records in Washington State to identify increased SARS-CoV-2 viral loads with the spike D614G mutation, but no alteration in disease severity. 614G was the first amino acid mutation to occur in spike, the receptor-binding protein which mediates entry into cells and is the primary target of protective immunity. At that time in the SARS-CoV-2 pandemic, we did not know if SARS-CoV-2 would evolve to increase its transmissibility, and this work was an early contribution to our understanding of SARS-CoV-2 evolution. This thesis also describes later work using SARS-CoV-2 genomes from Washington State and millions from around the globe to identify positive selection for a different mutation: ORF8 knockout. Much of SARS-CoV-2 genomic surveillance is focused on single nucleotide substitutions in spike, but this work showed how mutations, including deletions, in other parts of the genome can alter pathogen fitness. 
I also identify decreased hospitalizations and deaths associated with this mutation, illustrating the diverse impact of pathogen evolution on disease severity. The final section of this thesis describes my work using sequence data to understand the impact of previously evolved genetic diversity in P. falciparum on malaria outcomes. Specifically, I aim to use sequencing to understand how the genetic breadth of P. falciparum antigens impacts the development of immunity to malaria using longitudinal samples from a birth cohort in Uganda. I find limited evidence of improved disease outcomes with increasing infection number or antigen-specific exposures in the cohort data. However, I use a longitudinal model of P. falciparum that I built to demonstrate that the lack of signal results from sample collection and study design and is not necessarily biologically meaningful. I further use the model to determine how parasite sequencing can be effectively applied to answer key questions in malaria immunity. This thesis, like the pathogens it describes, covers a diversity of topics; in so doing, it demonstrates the power of pathogen genomics across a wide range of settings to understand continual pathogen evolution and its consequences on human health.
  • Item type: Item ,
    Scalable methods for genomic analysis of in vitro models of mammalian embryogenesis
    (2024-04-26) Regalado, Samuel; Shendure, Jay A
    Mammalian development, from the single-celled zygote to a multicellular individual, is an incredibly dynamic journey that is marked by many milestones measurable across many scales. In fact, by the end of the first two weeks of human embryogenesis, most precursors of major tissues and organs required for life are already present. This developmental milestone is known as gastrulation. Here the embryo or gastrula undergoes invagination, creating the blastopore and three major layers. For example, the outermost layer, known as the ectoderm, gives rise to the nervous system and skin; the middle layer, known as the mesoderm, gives rise to the musculoskeletal system and the heart; the innermost layer, known as the endoderm, gives rise to internal organs such as the lungs and liver. Collectively, these developmental cell types constitute the three major germ layers. Thus it is at the stage of gastrulation that cells of the embryo are specified toward distinct fates, leaving behind their relatively indistinct transcriptional states as pluripotent precursors. The advent of large consortia efforts, like the Human Genome Project or ENCODE, has ushered in new sequencing technologies, e.g. single-cell molecular phenotyping modalities like scRNA-seq, that are capable of uncovering the individual components or features of the genome that support the blueprint for multicellularity. For example, we now know that the genome can be partitioned into two categories: the coding genome and the non-coding genome. The coding genome is largely made up of genes, including cell-type specifying transcription factors (TFs). While approximately 22,000 protein-coding genes have been decoded and cataloged, of which approximately 1,600 are thought to be TFs, the overall coding proportion only makes up 1-2% of the mammalian genome. The other 98% of the genome is defined by the non-coding genome, where ~1 million non-coding regulatory elements, namely enhancers, are thought to reside.
Despite our ever-growing knowledge of the genome, we know very little about the transcription factors or enhancers that are required for the myriad of cell types required for mammalian development. How this remarkable process unfolds at the molecular level is a timely question that remains elusive. The focus of my PhD has been to elucidate how the process of early development works, particularly when cells undergo cell fate specification during gastrulation. More specifically, I have been intensely focused on understanding the dynamics of germ layer formation through 1) functional characterization of non-coding DNA elements or enhancers, 2) defining key developmental transcription factors, and 3) tracing histories of cell lineages as they are emerging within a multicellular system. To tackle these complex areas of investigation, I have developed scalable methods applied to multicellular in vitro embryoid model systems of early development. In the first chapter, I describe current strategies to understand early development and cell fate specification. In the second chapter, I describe efforts to perturb and record lineages using a novel platform for clonal organoid generation. In the third chapter, I describe a highly multiplexed method with single-cell resolution for measuring autonomous activity of non-coding regulatory DNA in a multicellular context. Finally, in the last chapter, I conclude with my thoughts on the future of in vitro models alongside multi-modal measurements.
  • Item type: Item ,
    Transcript cleavage and polyadenylation in plants
    (2024-02-12) Gorjifard, Sayeh; Queitsch, Christine
    Eukaryotic gene expression is finely regulated at the post-transcriptional level by the untranslated regions of mRNA. The coding sequence (CDS) of mRNA is flanked by 5’- and 3’-untranslated regions (UTRs). The end boundary of the 3’ UTR is defined by transcript cleavage and polyadenylation. The genic region that determines where the cleavage and polyadenylation complex (CPMC) binds and cleaves is called the terminator. Terminators overlap significantly with 3’ UTRs but also include the sequences after the 3’ UTR boundary. Elements in the resulting 3’ UTR modulate stability, nuclear export, localization, and translation. In this body of work, I will provide an overview of the historical exploration of terminator cleavage and polyadenylation, emphasizing the biotechnology that aided these discoveries. I will focus on how advances in DNA sequencing technologies expanded our understanding of terminator genetics and functionality across eukaryotes, with a particular emphasis on plants. Apart from transcriptome-wide maps of cleavage and polyadenylation signaling, sequencing empowered functional genomics by enabling massively parallel reporter assays (MPRAs). These tools, in conjunction with computational machine learning, will allow the engineering of specific terminators for diverse applications in plant synthetic biology. Due to the limitations of plant systems, however, little work has been done to characterize plant terminator sequences on a genome-wide basis for their strength in directing cleavage and fine-tuning expression. Building on recent developments optimizing massively parallel reporter assays in transient tobacco leaves and maize protoplasts, I characterized nearly all Arabidopsis thaliana and maize terminator sequences for their strength in conferring expression and cleavage. The resulting data helped train a deep learning model to predict terminator strength, aiding in the in silico evolution of synthetic and species-specific terminators.
In the final chapter, I will address existing limitations in the field and propose new experiments to fill in the gaps. Finally, I will turn to the elephant in the room. Do all these high-throughput sequencing technologies and protocols help us get any closer to accurately predicting gene expression? Are we even capturing the data in a meaningful way if we lose higher-order information among all the layers of gene regulation?