Statistical Methods for Adaptive Immune Receptor Repertoire Analysis and Comparison

Olson, Branden

dc.contributor.advisor	Matsen, Frederick
dc.contributor.author	Olson, Branden
dc.date.accessioned	2021-03-19T22:58:57Z
dc.date.available	2021-03-19T22:58:57Z
dc.date.issued	2021-03-19
dc.date.submitted	2020
dc.description	Thesis (Ph.D.)--University of Washington, 2020
dc.description.abstract	B and T cell receptors, also known as adaptive immune receptors, perform key roles in adaptive immunity. These proteins identify and deal with foreign invaders like viruses or bacteria, allowing for robust and long-lasting immunological protection. The DNA sequences coding for these receptors arise by a complex stochastic recombination process followed by a series of productivity-based filters, as well as affinity maturation for B cells, allowing for immense diversity in the circulating pool of these sequences. Thus, proper analysis of adaptive immune receptor repertoire sequence (AIRR-seq) datasets, as well as the immune context surrounding them, presents a formidable but necessary challenge to computational biologists. In this dissertation, I present three projects that contribute to AIRR-seq analysis with an emphasis on statistical methods for repertoire comparison.
B cell receptor (BCR) sequences diversify through mutations introduced by purpose-built cellular machinery. A recent paper has concluded that templated mutagenesis, a hypothesized process in which mutations in the BCR locus are introduced by copying short segments from other BCR genes, is a major contributor to BCR diversification in mice and humans. If true, this would overturn decades of research and methodology involving B cell diversification. In joint work with Julia Fukuyama, I re-evaluate this hypothesis by directing the authors' method at potential template donor genes not present in B cell genomes to obtain estimates of the method's false positive rates (FPRs). We find FPRs that are similar to or even higher than the original inferences, resulting in little to no evidence that templated mutagenesis occurs at a substantial rate.
As AIRR-seq datasets are typically large and complex, it is non-trivial to characterize and compare them in precise yet interpretable ways. I introduce a comprehensive summary statistic framework that efficiently performs a wide variety of biologically meaningful repertoire summaries and comparisons, and demonstrate how this framework can be used to perform general-purpose model validation. We find that summaries vary in their ability to differentiate between datasets, although many can distinguish between certain dataset covariates. Further, we show that recombination-based statistics tend to be more discriminative characterizations of a repertoire than those describing the amino acid composition of the CDR3 region. The framework also directly provides a convenient multidimensional scaling setup for visualizing dissimilarities between repertoires.
Current methods of T cell receptor (TCR) repertoire comparison often incur a high loss of distributional information by considering overly simplistic sequence- or repertoire-level characteristics. Optimal transport methods can be used to compare distributions given some distance or metric between values in the sample space, with appealing theoretical and computational properties. I formulate a nonparametric approach to TCR repertoire comparison driven by contemporary optimal transport methods and a recently created distance on the space of TCRs. I describe a clustering algorithm based on our methodology and show, using several case studies, that it can extract biologically meaningful regions of a target repertoire with respect to a source repertoire, thus competing with more complicated methods despite minimal modeling assumptions and a simpler pipeline. I also establish a randomization test to identify TCRs that are significantly enhanced between repertoires, and validate it using a proxy null distribution based on biological replicates.
dc.embargo.terms	Open Access
dc.format.mimetype	application/pdf
dc.identifier.other	Olson_washington_0250E_22327.pdf
dc.identifier.uri	http://hdl.handle.net/1773/46895
dc.language.iso	en_US
dc.rights	CC BY-NC
dc.subject	B cell receptor
dc.subject	Computational biology
dc.subject	Optimal transport
dc.subject	Statistical immunology
dc.subject	Statistical methods
dc.subject	T cell receptor
dc.subject	Statistics
dc.subject	Immunology
dc.subject	Bioinformatics
dc.subject.other	Statistics
dc.title	Statistical Methods for Adaptive Immune Receptor Repertoire Analysis and Comparison
dc.type	Thesis

Files

Original bundle

Name: Olson_washington_0250E_22327.pdf
Size: 2.37 MB
Format: Adobe Portable Document Format

Collections

Statistics