A Unified Token and Parameter Compression Pipeline for High-Resolution Vision–Language Models
| dc.contributor.advisor | Ali, Mohamed | |
| dc.contributor.author | Sultana, Tasnia | |
| dc.date.accessioned | 2026-04-20T15:24:23Z | |
| dc.date.issued | 2026-04-20 | |
| dc.date.submitted | 2026 | |
| dc.description | Thesis (Master's)--University of Washington, 2026 | |
| dc.description.abstract | High-resolution vision–language models achieve strong performance on fine-grained visual reasoning tasks, but their deployment remains costly due to large visual token counts and heavy language backbones. This work investigates how to build small, efficient multimodal models while preserving high-resolution reasoning ability. We propose a training-free unified compression pipeline that reduces inefficiency at both the token and parameter levels. At the token level, we introduce HiRED–Merge, which combines attention-guided token budgeting with neighbor-aware, norm-proportional token merging. The method merges only spatially adjacent tokens that survive attention-based selection, preserving local structure and reducing the information loss caused by aggressive token dropping. At the parameter level, we apply GLU-aware structured MLP pruning to the language backbone, removing coupled neuron pairs while maintaining dense computation and the original model structure. At a 20% pruning ratio, a 7B-parameter model shrinks to approximately 6B parameters. Experiments on ScienceQA, TextVQA, DocVQA, ChartQA, and MME show that the pipeline improves throughput, memory efficiency, and scalability while maintaining competitive accuracy. These results support practical deployment of high-resolution vision–language models under limited computational resources. | |
| dc.embargo.lift | 2028-04-09T15:24:23Z | |
| dc.embargo.terms | Restrict to UW for 2 years -- then make Open Access | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.other | Sultana_washington_0250O_29249.pdf | |
| dc.identifier.uri | https://hdl.handle.net/1773/55426 | |
| dc.language.iso | en_US | |
| dc.rights | none | |
| dc.subject | Efficient Multimodal Inference | |
| dc.subject | High-Resolution VLMs | |
| dc.subject | Model Compression | |
| dc.subject | Structured Pruning | |
| dc.subject | Token Merging | |
| dc.subject | Token Pruning | |
| dc.subject | Computer science | |
| dc.subject | Artificial intelligence | |
| dc.subject | Engineering | |
| dc.subject.other | To Be Assigned | |
| dc.title | A Unified Token and Parameter Compression Pipeline for High-Resolution Vision–Language Models | |
| dc.type | Thesis |
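The abstract's parameter-level step removes "coupled neuron pairs" from a gated (GLU-style) MLP while keeping computation dense. The thesis's exact importance criterion is not given in this record; the sketch below is a minimal illustration assuming a LLaMA-style gated MLP and an L2-norm-based score (the function name `glu_aware_prune` and the scoring rule are hypothetical, not taken from the thesis).

```python
import numpy as np

def glu_aware_prune(w_gate, w_up, w_down, ratio=0.2):
    """Structurally prune a fraction of intermediate neurons from a gated MLP.

    w_gate, w_up : (d_ff, d_model) -- row i holds neuron i's gate/up weights
    w_down       : (d_model, d_ff) -- column i holds neuron i's output weights
    Each intermediate neuron i owns a coupled triple (gate row, up row, down
    column); removing whole triples keeps the remaining computation dense and
    the model structure unchanged, only with a smaller d_ff.
    """
    d_ff = w_gate.shape[0]
    # Illustrative importance score: product of the L2 norms of the three
    # coupled weight vectors belonging to each intermediate neuron.
    score = (np.linalg.norm(w_gate, axis=1)
             * np.linalg.norm(w_up, axis=1)
             * np.linalg.norm(w_down, axis=0))
    n_keep = d_ff - int(ratio * d_ff)
    # Keep the highest-scoring neurons, preserving their original order.
    keep = np.sort(np.argsort(score)[-n_keep:])
    return w_gate[keep], w_up[keep], w_down[:, keep]
```

Slicing all three matrices by the same index set is what makes the pruning "GLU-aware": dropping a gate row without its matching up row and down column would leave dangling weights. At a 20% ratio applied to the MLP blocks, this kind of structural removal is consistent with the abstract's reported shrink from roughly 7B to 6B parameters.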
Files
Original bundle
- Name: Sultana_washington_0250O_29249.pdf
- Size: 8.15 MB
- Format: Adobe Portable Document Format
