Modeling the Feasibility of Nanopore-Based Protein Identification in the Human Proteome

Queen, Melissa Sofia

dc.contributor.advisor	Nivala, Jeffrey
dc.contributor.author	Queen, Melissa Sofia
dc.date.accessioned	2025-10-02T16:07:31Z
dc.date.available	2025-10-02T16:07:31Z
dc.date.issued	2025-10-02
dc.date.submitted	2025
dc.description	Thesis (Ph.D.)--University of Washington, 2025
dc.description.abstract	The gap between what can be read from the genome and what is functionally realized at the protein level remains a central challenge in molecular biology. Because proteins are shaped by splicing, translation, and post-translational modifications, and often exist as diverse full-length proteoforms, proteomics demands new approaches beyond conventional mass spectrometry. This thesis explores the feasibility of identifying gene-encoded human proteins directly from nanopore-generated electrical signals ("squiggles"). Using AminoScribe, a simulation framework grounded in empirical nanopore data generated via an unfoldase-based translocation approach, simulated squiggles were produced from full-length human protein sequences. In this method, proteins are pulled through the nanopore by a molecular motor, enabling continuous signal acquisition along the entire length of each molecule. Each simulated signal was aligned to a reference library of canonical squiggle patterns using dynamic time warping (DTW), allowing for comparisons invariant to temporal distortions. Classification systems were evaluated using fifty simulated versions per protein to assess proteome-wide coverage. Results indicate that human protein squiggles contain distinct, classifiable signal features. A protein was considered "covered" if correctly identified in at least one of ten trials, and "robustly covered" if correctly identified in all ten. Under ideal simulation conditions, a nearest-neighbor classifier achieved nearly 100% coverage and 92% robust coverage. In more realistic, high-noise conditions, overall coverage remained high (99%), but robust coverage dropped to 10%. A multilayer perceptron (MLP) classifier improved performance in these conditions, increasing robust coverage to 33% using a reduced feature set. 
These findings establish a baseline for protein identification from nanopore squiggles and outline both the potential and current limitations of nanopore-based proteomics under simulated noise models. Future work will explore the extension of this approach to detecting post-translational modifications and other forms of protein-level variation.
dc.embargo.terms	Open Access
dc.format.mimetype	application/pdf
dc.identifier.other	Queen_washington_0250E_28895.pdf
dc.identifier.uri	https://hdl.handle.net/1773/53978
dc.language.iso	en_US
dc.rights	CC BY
dc.subject	Nanotechnology
dc.subject	Computer science
dc.subject	Molecular biology
dc.subject.other	Computer science and engineering
dc.title	Modeling the Feasibility of Nanopore-Based Protein Identification in the Human Proteome
dc.type	Thesis
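
The classification pipeline summarized in the abstract (DTW alignment against a reference library, nearest-neighbor assignment, and the "covered" / "robustly covered" criteria) can be illustrated with a minimal sketch. This is not the thesis code or AminoScribe: the function names, the absolute-difference cost, and the toy signals are all assumptions chosen for illustration only.

```python
import math


def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two 1-D signals.

    Uses an absolute-difference local cost (an assumption; the thesis may
    use a different cost or a constrained warping window).
    """
    n, m = len(a), len(b)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix costs nothing
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: advance in a only, advance in b only, or both.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def classify(query, library):
    """Nearest-neighbor label: the reference squiggle with the smallest DTW distance."""
    return min(library, key=lambda name: dtw_distance(query, library[name]))


def coverage(trial_outcomes):
    """Return (coverage, robust coverage) as fractions of proteins.

    trial_outcomes maps a protein name to a list of booleans, one per trial,
    True when that trial was classified correctly. A protein is "covered" if
    any trial is correct and "robustly covered" if all trials are correct.
    """
    total = len(trial_outcomes)
    covered = sum(any(trials) for trials in trial_outcomes.values())
    robust = sum(all(trials) for trials in trial_outcomes.values())
    return covered / total, robust / total


# Hypothetical toy reference library; real squiggles are far longer and noisier.
library = {"protein_A": [1.0, 2.0, 3.0], "protein_B": [3.0, 2.0, 1.0]}
stretched_query = [1.0, 2.0, 2.0, 3.0]  # protein_A with one dwell repeated
predicted = classify(stretched_query, library)
```

In this toy case the repeated sample in `stretched_query` is absorbed by the warping path, which is the sense in which DTW comparisons are invariant to temporal distortions.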

Files

Original bundle

Name: Queen_washington_0250E_28895.pdf
Size: 3.72 MB
Format: Adobe Portable Document Format

Collections

Computer science and engineering