A Unified Token and Parameter Compression Pipeline for High-Resolution Vision–Language Models
| dc.contributor.advisor | Ali, Mohamed | |
| dc.contributor.author | Sultana, Tasnia | |
| dc.date.accessioned | 2026-04-20T15:24:23Z | |
| dc.date.issued | 2026-04-20 | |
| dc.date.submitted | 2026 | |
| dc.description | Thesis (Master's)--University of Washington, 2026 | |
| dc.description.abstract | High-resolution vision–language models achieve strong performance on fine-grained visual reasoning tasks, but their deployment remains costly due to large visual token counts and heavy language backbones. This work investigates how to build small, efficient multimodal models while preserving high-resolution reasoning ability. We propose a training-free unified compression pipeline that reduces inefficiency at both the token and parameter levels. At the token level, we introduce HiRED–Merge, which combines attention-guided token budgeting with neighbor-aware, norm-proportional token merging. The method merges only spatially adjacent tokens that survive attention-based selection, preserving local structure and reducing the information loss caused by aggressive token dropping. At the parameter level, we apply GLU-aware structured MLP pruning to the language backbone, removing coupled neuron pairs while maintaining dense computation and the original model structure. At a 20% pruning ratio, a 7B-parameter model shrinks to approximately 6B parameters. Experiments on ScienceQA, TextVQA, DocVQA, ChartQA, and MME show that the pipeline improves throughput, memory efficiency, and scalability while maintaining competitive accuracy. These results support practical deployment of high-resolution vision–language models under limited computational resources. | |
| dc.embargo.lift | 2028-04-09T15:24:23Z | |
| dc.embargo.terms | Restrict to UW for 2 years -- then make Open Access | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.other | Sultana_washington_0250O_29249.pdf | |
| dc.identifier.uri | https://hdl.handle.net/1773/55426 | |
| dc.language.iso | en_US | |
| dc.rights | none | |
| dc.subject | Efficient Multimodal Inference | |
| dc.subject | High-Resolution VLMs | |
| dc.subject | Model Compression | |
| dc.subject | Structured Pruning | |
| dc.subject | Token Merging | |
| dc.subject | Token Pruning | |
| dc.subject | Computer science | |
| dc.subject | Artificial intelligence | |
| dc.subject | Engineering | |
| dc.subject.other | To Be Assigned | |
| dc.title | A Unified Token and Parameter Compression Pipeline for High-Resolution Vision–Language Models | |
| dc.type | Thesis |
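The abstract's parameter-level step removes "coupled neuron pairs" from a gated (GLU-style) MLP while keeping computation dense. The thesis's exact importance criterion is not given in this record; the sketch below is a minimal illustration assuming a LLaMA-style gated MLP and an L2-norm-based score (the function name `glu_aware_prune` and the scoring rule are hypothetical, not taken from the thesis).

```python
import numpy as np

def glu_aware_prune(w_gate, w_up, w_down, ratio=0.2):
    """Structurally prune a fraction of intermediate neurons from a gated MLP.

    w_gate, w_up : (d_ff, d_model) -- row i holds neuron i's gate/up weights
    w_down       : (d_model, d_ff) -- column i holds neuron i's output weights
    Each intermediate neuron i owns a coupled triple (gate row, up row, down
    column); removing whole triples keeps the remaining computation dense and
    the model structure unchanged, only with a smaller d_ff.
    """
    d_ff = w_gate.shape[0]
    # Illustrative importance score: product of the L2 norms of the three
    # coupled weight vectors belonging to each intermediate neuron.
    score = (np.linalg.norm(w_gate, axis=1)
             * np.linalg.norm(w_up, axis=1)
             * np.linalg.norm(w_down, axis=0))
    n_keep = d_ff - int(ratio * d_ff)
    # Keep the highest-scoring neurons, preserving their original order.
    keep = np.sort(np.argsort(score)[-n_keep:])
    return w_gate[keep], w_up[keep], w_down[:, keep]
```

Slicing all three matrices by the same index set is what makes the pruning "GLU-aware": dropping a gate row without its matching up row and down column would leave dangling weights. At a 20% ratio applied to the MLP blocks, this kind of structural removal is consistent with the abstract's reported shrink from roughly 7B to 6B parameters.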
Files
Original bundle
- Name: Sultana_washington_0250O_29249.pdf
- Size: 8.15 MB
- Format: Adobe Portable Document Format
