Toward Efficient Machine Learning Systems with Sampling and Compression.

dc.contributor.advisor: Ceze, Luis
dc.contributor.author: Lin, Chien-Yu
dc.date.accessioned: 2025-08-01T22:19:27Z
dc.date.available: 2025-08-01T22:19:27Z
dc.date.issued: 2025-08-01
dc.date.submitted: 2025
dc.description: Thesis (Ph.D.)--University of Washington, 2025
dc.description.abstract: The rapid growth of machine learning, in terms of model size, dataset volume, and task complexity, has created significant computational and memory efficiency challenges. This thesis addresses these challenges by developing sampling and compression techniques across a variety of machine learning applications. Specifically, we introduce sampling strategies for efficient graph neural networks (CacheSample), accelerated 3D image rendering (FastSR-NeRF), and optimized retrieval-augmented generation systems (TeleRAG). We further propose novel compression methods targeting convolutional neural networks (SPIN), large language models (Atom), and their key-value caches (Palu). Collectively, these techniques substantially reduce computational and memory requirements while preserving model accuracy, facilitating the scalability and accessibility of machine learning systems. Finally, we present our vision for future efficiency innovations to ensure continued scalability and robustness as machine learning models grow in complexity.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Lin_washington_0250E_28591.pdf
dc.identifier.uri: https://hdl.handle.net/1773/53496
dc.language.iso: en_US
dc.rights: none
dc.subject: efficient machine learning
dc.subject: machine learning system
dc.subject: Computer science
dc.subject.other: Computer science and engineering
dc.title: Toward Efficient Machine Learning Systems with Sampling and Compression.
dc.type: Thesis

Files

Original bundle

Name: Lin_washington_0250E_28591.pdf
Size: 14.72 MB
Format: Adobe Portable Document Format