Modeling the Feasibility of Nanopore-Based Protein Identification in the Human Proteome
| dc.contributor.advisor | Nivala, Jeffrey | |
| dc.contributor.author | Queen, Melissa Sofia | |
| dc.date.accessioned | 2025-10-02T16:07:31Z | |
| dc.date.available | 2025-10-02T16:07:31Z | |
| dc.date.issued | 2025-10-02 | |
| dc.date.submitted | 2025 | |
| dc.description | Thesis (Ph.D.)--University of Washington, 2025 | |
| dc.description.abstract | The gap between what can be read from the genome and what is functionally realized at the protein level remains a central challenge in molecular biology. Because proteins are shaped by splicing, translation, and post-translational modifications, and often exist as diverse full-length proteoforms, proteomics demands new approaches beyond conventional mass spectrometry. This thesis explores the feasibility of identifying gene-encoded human proteins directly from nanopore-generated electrical signals ("squiggles"). Using AminoScribe, a simulation framework grounded in empirical nanopore data generated via an unfoldase-based translocation approach, simulated squiggles were produced from full-length human protein sequences. In this method, proteins are pulled through the nanopore by a molecular motor, enabling continuous signal acquisition along the entire length of each molecule. Each simulated signal was aligned to a reference library of canonical squiggle patterns using dynamic time warping (DTW), allowing for comparisons invariant to temporal distortions. Classification systems were evaluated using fifty simulated versions per protein to assess proteome-wide coverage. Results indicate that human protein squiggles contain distinct, classifiable signal features. A protein was considered "covered" if correctly identified in at least one of ten trials, and "robustly covered" if correctly identified in all ten. Under ideal simulation conditions, a nearest-neighbor classifier achieved nearly 100% coverage and 92% robust coverage. In more realistic, high-noise conditions, overall coverage remained high (99%), but robust coverage dropped to 10%. A multilayer perceptron (MLP) classifier improved performance in these conditions, increasing robust coverage to 33% using a reduced feature set. These findings establish a baseline for protein identification from nanopore squiggles and outline both the potential and current limitations of nanopore-based proteomics under simulated noise models. Future work will explore the extension of this approach to detecting post-translational modifications and other forms of protein-level variation. | |
| dc.embargo.terms | Open Access | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.other | Queen_washington_0250E_28895.pdf | |
| dc.identifier.uri | https://hdl.handle.net/1773/53978 | |
| dc.language.iso | en_US | |
| dc.rights | CC BY | |
| dc.subject | Nanotechnology | |
| dc.subject | Computer science | |
| dc.subject | Molecular biology | |
| dc.subject.other | Computer science and engineering | |
| dc.title | Modeling the Feasibility of Nanopore-Based Protein Identification in the Human Proteome | |
| dc.type | Thesis |
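
The abstract describes matching each simulated squiggle to a reference library with dynamic time warping (DTW), classifying by nearest neighbor, and scoring proteins as "covered" or "robustly covered" across repeated trials. The following is a minimal illustrative sketch of that general idea, not the thesis's implementation and not AminoScribe's API. The function names, the absolute-difference cost, and the toy signals are all assumptions made for illustration.

```python
import math

def dtw_distance(a, b):
    """Textbook DTW between two 1-D sequences (assumed cost: absolute difference)."""
    n, m = len(a), len(b)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix costs nothing
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, or match
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]

def nearest_neighbor(signal, library):
    """Return the key of the reference squiggle with the smallest DTW distance."""
    return min(library, key=lambda name: dtw_distance(signal, library[name]))

def coverage(trial_results):
    """Compute (coverage, robust coverage) from per-protein lists of booleans.

    A protein counts as covered if any trial was correct,
    and robustly covered if every trial was correct.
    """
    total = len(trial_results)
    covered = sum(1 for hits in trial_results.values() if any(hits))
    robust = sum(1 for hits in trial_results.values() if all(hits))
    return covered / total, robust / total

# Hypothetical toy reference library (real squiggles are far longer).
library = {"P1": [0, 1, 0], "P2": [1, 1, 1]}
# A time-stretched copy of P1's pattern; DTW absorbs the stretch.
predicted = nearest_neighbor([0, 0, 1, 0], library)

# Hypothetical trial outcomes for three proteins, two trials each.
cov, robust_cov = coverage({
    "P1": [True, True],
    "P2": [True, False],
    "P3": [False, False],
})
```

In this toy setup the stretched signal still maps to `P1` because DTW allows one sample to align with several samples of the reference, which is the invariance to temporal distortion that the abstract refers to.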
Files
Original bundle
- Name: Queen_washington_0250E_28895.pdf
- Size: 3.72 MB
- Format: Adobe Portable Document Format
