Computational design and sensing algorithms for nanopore-based molecular tagging and peptide detection

Doroschak, Kathryn Jean

Files

Doroschak_washington_0250E_22536.pdf (6.04 MB)

Date

2021-07-07

Abstract

Molecular sensing provides a window into the complex world of otherwise invisible molecules, allowing us to measure protein abundance or sequence DNA, for example. Commercially available nanopore arrays have already made DNA sequencing less expensive and more portable than existing platforms, and they have recently emerged as potential tools for general purpose molecular sensing. Nanopore arrays record a time series of ionic current observations and do not intrinsically detect any particular types of molecules; any molecule that can physically flow through the pore will partially block the ionic current flow in unique ways depending on its physical properties, producing a characteristic current trace. Since only DNA and RNA sequencing are officially supported, any applications beyond straightforward DNA sequencing require developing novel computational pipelines and algorithms to extract biologically relevant information. Here I present computational methods for three novel uses of commercial nanopore devices: (1) Porcupine, a molecular tagging system using custom designed nanopore-orthogonal DNA molecular bits (molbits); (2) Big Bits, a DNA data storage implementation using sequentially encoded molbits; and (3) Poretitioner, a pipeline for identifying NanoporeTERs (NTERs, Nanopore-addressable protein Tags Engineered as Reporters) and other engineered molecules. In each chapter, I present my contributions to novel computational analysis of nanopore data for these applications. Briefly, Porcupine labels physical objects using molecular tags. These tags encode digital information via the presence and absence of molbits, which I algorithmically designed to produce visually unique nanopore signals. The tags are later read back and decoded directly from the nanopore ionic current trace using a convolutional neural network (CNN). Big Bits builds on this, using design principles from Porcupine to encode even more information for DNA data storage. Instead of using presence or absence to encode information, molbits in Big Bits are encoded sequentially. In the Poretitioner pipeline, I extract ionic current for captured peptides, then filter, classify, and quantify them using components that were built by me and by others and that can be tuned for various molecules.
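
The difference between presence/absence encoding and sequential encoding can be illustrated with a minimal Python sketch. It is not the dissertation's implementation, and every name in it is hypothetical. It assumes that an upstream classifier (the CNN, in Porcupine's case) has already assigned each read to a molbit index, that a molbit counts as present once it reaches an arbitrary `min_reads` threshold, and that "sequential" means an ordered string of base-N digits, which the abstract does not specify.

```python
from typing import Dict, List, Set


def encode_presence(bits: str) -> Set[int]:
    """Presence/absence scheme: include molbit i in the tag iff bit i is '1'."""
    return {i for i, b in enumerate(bits) if b == "1"}


def decode_presence(read_counts: Dict[int, int], n_molbits: int,
                    min_reads: int = 5) -> str:
    """Call a molbit present when it has at least `min_reads` classified reads.

    `min_reads` is an illustrative threshold, not a value from the source.
    """
    return "".join(
        "1" if read_counts.get(i, 0) >= min_reads else "0"
        for i in range(n_molbits)
    )


def encode_sequential(value: int, alphabet_size: int, length: int) -> List[int]:
    """Sequential scheme (assumed): write `value` as `length` base-N digits,
    most significant first, where each digit names one molbit."""
    if not 0 <= value < alphabet_size ** length:
        raise ValueError("value does not fit in the given length")
    digits = []
    for _ in range(length):
        value, digit = divmod(value, alphabet_size)
        digits.append(digit)
    return digits[::-1]


def decode_sequential(digits: List[int], alphabet_size: int) -> int:
    """Invert encode_sequential by accumulating the base-N digits."""
    value = 0
    for digit in digits:
        value = value * alphabet_size + digit
    return value


# Presence/absence: 5 molbits carry 5 bits; order within the tag is irrelevant.
tag = encode_presence("10110")
counts = {0: 40, 2: 31, 3: 12, 4: 1}  # made-up per-molbit read counts
decoded_bits = decode_presence(counts, n_molbits=5)

# Sequential: 3 positions over a 4-molbit alphabet carry 4**3 = 64 values.
strand = encode_sequential(27, alphabet_size=4, length=3)
decoded_value = decode_sequential(strand, alphabet_size=4)
```

Under these assumptions, a set of N molbits carries N bits when only presence matters, whereas an ordered string of k positions drawn from the same N molbits carries k·log2(N) bits. This is one way to read the claim that sequential encoding stores more information.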