A Unified Token and Parameter Compression Pipeline for High-Resolution Vision–Language Models

dc.contributor.advisor: Ali, Mohamed
dc.contributor.author: Sultana, Tasnia
dc.date.accessioned: 2026-04-20T15:24:23Z
dc.date.issued: 2026-04-20
dc.date.submitted: 2026
dc.description: Thesis (Master's)--University of Washington, 2026
dc.description.abstract: High-resolution vision–language models achieve strong performance on fine-grained visual reasoning tasks, but they remain costly to deploy because of large visual token counts and heavy language backbones. This work investigates how to build small, efficient multimodal models while preserving high-resolution reasoning ability. We propose a training-free, unified compression pipeline that reduces inefficiency at both the token and parameter levels. At the token level, we introduce HiRED–Merge, which combines attention-guided token budgeting with neighbor-aware, norm-proportional token merging. The method merges only spatially adjacent tokens that survive attention-based selection, which helps preserve local structure and reduces the information loss caused by aggressive token dropping. At the parameter level, we apply GLU-aware structured MLP pruning to the language backbone, removing coupled neuron pairs while keeping computation dense and the model structure intact. Pruning at a 20% ratio reduces a 7B-parameter model to approximately 6B parameters. Experiments on ScienceQA, TextVQA, DocVQA, ChartQA, and MME show that our pipeline improves throughput, memory efficiency, and scalability while maintaining competitive accuracy. These results support practical deployment of high-resolution vision–language models under limited computational resources.
dc.embargo.lift: 2028-04-09T15:24:23Z
dc.embargo.terms: Restrict to UW for 2 years -- then make Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Sultana_washington_0250O_29249.pdf
dc.identifier.uri: https://hdl.handle.net/1773/55426
dc.language.iso: en_US
dc.rights: none
dc.subject: Efficient Multimodal Inference
dc.subject: High-Resolution VLMs
dc.subject: Model Compression
dc.subject: Structured Pruning
dc.subject: Token Merging
dc.subject: Token Pruning
dc.subject: Computer science
dc.subject: Artificial intelligence
dc.subject: Engineering
dc.subject.other: To Be Assigned
dc.title: A Unified Token and Parameter Compression Pipeline for High-Resolution Vision–Language Models
dc.type: Thesis
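
The abstract's parameter-level step (GLU-aware structured MLP pruning) can be illustrated with a minimal sketch. This is not the thesis implementation: the importance score (a product of weight norms), the function names, and the layer dimensions used in the usage note are all assumptions chosen for illustration. The sketch only shows the general idea that, in a gated MLP, intermediate neuron j owns row j of the gate and up projections and column j of the down projection, so all three slices are removed together and the result is a smaller but still dense layer.

```python
import math

def silu(z):
    # SiLU activation, a common choice in gated MLPs (assumed here).
    return z / (1.0 + math.exp(-z))

def glu_mlp(x, w_gate, w_up, w_down):
    """Gated MLP: y = W_down @ (silu(W_gate @ x) * (W_up @ x)).
    Matrices are lists of rows; w_gate and w_up have one row per neuron,
    w_down has one column per neuron."""
    gate = [silu(sum(w * xi for w, xi in zip(row, x))) for row in w_gate]
    up = [sum(w * xi for w, xi in zip(row, x)) for row in w_up]
    hidden = [g * u for g, u in zip(gate, up)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_down]

def l2(v):
    return math.sqrt(sum(a * a for a in v))

def prune_glu_mlp(w_gate, w_up, w_down, ratio):
    """Drop the lowest-scoring fraction `ratio` of intermediate neurons.
    The score is a hypothetical stand-in, not the thesis criterion."""
    n = len(w_gate)
    keep_n = n - int(n * ratio)
    scores = [
        l2(w_gate[j]) * l2(w_up[j]) * l2([row[j] for row in w_down])
        for j in range(n)
    ]
    ranked = sorted(range(n), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:keep_n])  # preserve the original neuron order
    # Remove the coupled gate/up rows and the matching down column together.
    new_gate = [w_gate[j] for j in keep]
    new_up = [w_up[j] for j in keep]
    new_down = [[row[j] for j in keep] for row in w_down]
    return new_gate, new_up, new_down, keep

def glu_mlp_params(hidden, intermediate, layers):
    # Three weight matrices (gate, up, down) per layer, biases ignored.
    return 3 * hidden * intermediate * layers
```

As a rough consistency check on the abstract's figure, assume a LLaMA-style 7B backbone with hidden size 4096, intermediate size 11008, and 32 layers (the abstract does not name the backbone). Then `glu_mlp_params(4096, 11008, 32) - glu_mlp_params(4096, 11008 - int(11008 * 0.2), 32)` comes to about 0.87B removed parameters, which would take a model of roughly 6.7B parameters to just under 6B.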

Files

Original bundle

Name: Sultana_washington_0250O_29249.pdf
Size: 8.15 MB
Format: Adobe Portable Document Format