Modeling the Feasibility of Nanopore-Based Protein Identification in the Human Proteome

dc.contributor.advisor: Nivala, Jeffrey
dc.contributor.author: Queen, Melissa Sofia
dc.date.accessioned: 2025-10-02T16:07:31Z
dc.date.available: 2025-10-02T16:07:31Z
dc.date.issued: 2025-10-02
dc.date.submitted: 2025
dc.description: Thesis (Ph.D.)--University of Washington, 2025
dc.description.abstract: The gap between what can be read from the genome and what is functionally realized at the protein level remains a central challenge in molecular biology. Because proteins are shaped by splicing, translation, and post-translational modifications, and often exist as diverse full-length proteoforms, proteomics demands new approaches beyond conventional mass spectrometry. This thesis explores the feasibility of identifying gene-encoded human proteins directly from nanopore-generated electrical signals ("squiggles"). Using AminoScribe, a simulation framework grounded in empirical nanopore data generated via an unfoldase-based translocation approach, simulated squiggles were produced from full-length human protein sequences. In this method, proteins are pulled through the nanopore by a molecular motor, enabling continuous signal acquisition along the entire length of each molecule. Each simulated signal was aligned to a reference library of canonical squiggle patterns using dynamic time warping (DTW), allowing for comparisons invariant to temporal distortions. Classification systems were evaluated using fifty simulated versions per protein to assess proteome-wide coverage. Results indicate that human protein squiggles contain distinct, classifiable signal features. A protein was considered "covered" if correctly identified in at least one of ten trials, and "robustly covered" if correctly identified in all ten. Under ideal simulation conditions, a nearest-neighbor classifier achieved nearly 100% coverage and 92% robust coverage. In more realistic, high-noise conditions, overall coverage remained high (99%), but robust coverage dropped to 10%. A multilayer perceptron (MLP) classifier improved performance in these conditions, increasing robust coverage to 33% using a reduced feature set.
These findings establish a baseline for protein identification from nanopore squiggles and outline both the potential and current limitations of nanopore-based proteomics under simulated noise models. Future work will explore the extension of this approach to detecting post-translational modifications and other forms of protein-level variation.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Queen_washington_0250E_28895.pdf
dc.identifier.uri: https://hdl.handle.net/1773/53978
dc.language.iso: en_US
dc.rights: CC BY
dc.subject: Nanotechnology
dc.subject: Computer science
dc.subject: Molecular biology
dc.subject.other: Computer science and engineering
dc.title: Modeling the Feasibility of Nanopore-Based Protein Identification in the Human Proteome
dc.type: Thesis
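
The abstract describes aligning each simulated squiggle to a reference library with dynamic time warping (DTW), classifying by nearest neighbor, and counting a protein as "covered" or "robustly covered" depending on how many trials are identified correctly. The following is a minimal illustrative sketch of that idea only. It is not AminoScribe and not the thesis implementation: the function names (`dtw_distance`, `classify`, `coverage`), the absolute-difference local cost, and the toy signals are all assumptions made for illustration.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook DTW distance between two 1-D sequences.

    Assumption: local cost is the absolute difference between samples.
    Runs in O(len(a) * len(b)) time, keeping only two rows of the table.
    """
    m = len(b)
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned to empty prefix costs nothing
    for i in range(1, len(a) + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, match
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def classify(signal, reference_library):
    """Nearest-neighbor label: the reference with the smallest DTW distance."""
    return min(
        reference_library,
        key=lambda name: dtw_distance(signal, reference_library[name]),
    )


def coverage(trials_by_protein, reference_library):
    """Return (coverage, robust coverage) as fractions of proteins.

    covered: at least one trial identified correctly.
    robustly covered: every trial identified correctly.
    """
    covered = robust = 0
    for name, trials in trials_by_protein.items():
        hits = [classify(t, reference_library) == name for t in trials]
        covered += any(hits)
        robust += all(hits)
    total = len(trials_by_protein)
    return covered / total, robust / total


# Hypothetical toy data, not real nanopore signals.
library = {"A": [0, 1, 2, 1, 0], "B": [2, 2, 0, 0, 2]}
trials = {
    "A": [[0, 0, 1, 2, 2, 1, 0], [0, 1, 2, 1, 0]],  # time-stretched copy of A, then A itself
    "B": [[2, 2, 0, 0, 2], [0, 1, 2, 1, 0]],        # B itself, then a trial that resembles A
}
print(coverage(trials, library))
```

In this toy setup the time-stretched copy of A still matches A exactly under DTW, which is the invariance to temporal distortion that the abstract refers to. A proteome-scale run would need a far faster DTW implementation and real simulated squiggles.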

Files

Original bundle

Name: Queen_washington_0250E_28895.pdf
Size: 3.72 MB
Format: Adobe Portable Document Format