Statistical Methods for Adaptive Immune Receptor Repertoire Analysis and Comparison
| dc.contributor.advisor | Matsen, Frederick | |
| dc.contributor.author | Olson, Branden | |
| dc.date.accessioned | 2021-03-19T22:58:57Z | |
| dc.date.available | 2021-03-19T22:58:57Z | |
| dc.date.issued | 2021-03-19 | |
| dc.date.submitted | 2020 | |
| dc.description | Thesis (Ph.D.)--University of Washington, 2020 | |
| dc.description.abstract | B and T cell receptors, also known as adaptive immune receptors, perform key roles in adaptive immunity. These proteins identify and deal with foreign invaders like viruses or bacteria, allowing for robust and long-lasting immunological protection. The DNA sequences coding for these receptors arise by a complex stochastic recombination process followed by a series of productivity-based filters, as well as affinity maturation for B cells, allowing for immense diversity in the circulating pool of these sequences. Thus, proper analysis of adaptive immune receptor repertoire sequence (AIRR-seq) datasets as well as the immune context surrounding them presents a formidable but necessary challenge to computational biologists. In this dissertation, I present three projects that contribute to AIRR-seq analysis with an emphasis on statistical methods for repertoire comparison. BCR sequences diversify through mutations introduced by purpose-built cellular machinery. A recent paper has concluded that templated mutagenesis, a hypothesized process in which mutations in the BCR locus are introduced by copying short segments from other BCR genes, is a major contributor to BCR diversification in mice and humans. If true, this would overturn decades of research and methodology involving B cell diversification. In joint work with Julia Fukuyama, I re-evaluate this hypothesis by directing the author's method at potential template donor genes not present in B cell genomes to obtain estimates of the method's false positive rates (FPRs). We find FPRs that are similar to or even higher than the original inferences, resulting in little to no evidence that templated mutagenesis occurs at a substantial rate. As AIRR-seq datasets are typically large and complex, it is non-trivial to characterize and compare them in precise yet interpretable ways. I introduce a comprehensive summary statistic framework that efficiently performs a wide variety of biologically meaningful repertoire summaries and comparisons, and demonstrate how this framework can be used to perform general-purpose model validation. We find that summaries vary in their ability to differentiate between datasets, although many can distinguish between certain dataset covariates. Further, we show that recombination-based statistics tend to be more discriminative characterizations of a repertoire than those describing the amino acid composition of the CDR3 region. The framework also directly provides a convenient multidimensional scaling setup for visualizing dissimilarities between repertoires. Current methods of TCR repertoire comparison often incur a high loss of distributional information by considering overly simplistic sequence- or repertoire-level characteristics. Optimal transport methods can be used to compare distributions given some distance or metric between values in the sample space, with appealing theoretical and computational properties. I formulate a nonparametric approach to TCR repertoire comparison driven by contemporary optimal transport methods and a recently created distance on the space of TCRs. I describe a clustering algorithm based on our methodology and show that it can extract biologically meaningful regions of a target repertoire with respect to a source repertoire using several case studies, thus competing with more complicated methods despite minimal modeling assumptions and a simpler pipeline. I also establish a randomization test to identify TCRs that are significantly enhanced between repertoires, and validate it using a proxy null distribution based on biological replicates. | |
| dc.embargo.terms | Open Access | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.other | Olson_washington_0250E_22327.pdf | |
| dc.identifier.uri | http://hdl.handle.net/1773/46895 | |
| dc.language.iso | en_US | |
| dc.rights | CC BY-NC | |
| dc.subject | B cell receptor | |
| dc.subject | Computational biology | |
| dc.subject | Optimal transport | |
| dc.subject | Statistical immunology | |
| dc.subject | Statistical methods | |
| dc.subject | T cell receptor | |
| dc.subject | Statistics | |
| dc.subject | Immunology | |
| dc.subject | Bioinformatics | |
| dc.subject.other | Statistics | |
| dc.title | Statistical Methods for Adaptive Immune Receptor Repertoire Analysis and Comparison | |
| dc.type | Thesis | |
Files
Original bundle
- Name: Olson_washington_0250E_22327.pdf
- Size: 2.37 MB
- Format: Adobe Portable Document Format
