Heterogeneous Model Collaborations in Language Models
| Field | Value |
| --- | --- |
| dc.contributor.advisor | Tsvetkov, Yulia |
| dc.contributor.author | Han, Xiaochuang |
| dc.date.accessioned | 2025-05-12T22:46:30Z |
| dc.date.available | 2025-05-12T22:46:30Z |
| dc.date.issued | 2025-05-12 |
| dc.date.submitted | 2025 |
| dc.description | Thesis (Ph.D.)--University of Washington, 2025 |
| dc.description.abstract | Language is central to human intelligence and has similarly become integral to artificial intelligence, particularly through advances in natural language processing and large language models. Modern LLM development focuses on model scaling: one generic, monolithic architecture is trained at substantially increasing sizes. This is a highly homogeneous process, in which a single type of neural module is applied to every task with the same optimization objective. While scaling LLMs has achieved remarkable success, it remains challenging to achieve swift control over LLM outputs, reduce hallucinations, enable weak-to-strong generalization, and adapt seamlessly to modalities beyond language. In this thesis, I develop novel heterogeneous model collaboration paradigms that address each of these challenges. In the first part, I introduce a diffusion-based generative LLM that collaborates with discriminative text classifiers for nuanced controllability at inference time. In the second part, I present a sampling algorithm built on a collaboration between two contrastively contextualized language models, reducing hallucinations and enhancing faithfulness to the input context. In the third part, I introduce two diffusion LLMs, a large general-purpose model and a much smaller specialized model, that collaborate for strong ensemble performance and more robust weak-to-strong generalization than autoregressive approaches. Finally, I present an autoregressive LLM architecture that learns to generate images and videos through collaboration with non-neural multimodal compression codecs, overcoming modality barriers and expanding the multimodal generative capabilities of LLMs. Overall, this thesis aims to advance NLP and LLMs beyond monolithic model scaling through modular, heterogeneous model collaborations across language and multimodal domains, ultimately contributing to the pursuit of general machine intelligence. |
| dc.embargo.terms | Open Access |
| dc.format.mimetype | application/pdf |
| dc.identifier.other | Han_washington_0250E_27974.pdf |
| dc.identifier.uri | https://hdl.handle.net/1773/52960 |
| dc.language.iso | en_US |
| dc.rights | CC BY-NC-SA |
| dc.subject | Computer science |
| dc.subject | Artificial intelligence |
| dc.subject.other | Computer science and engineering |
| dc.title | Heterogeneous Model Collaborations in Language Models |
| dc.type | Thesis |
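
The second contribution in the abstract, sampling via two contrastively contextualized language models, amounts to a contrastive decoding step: one forward pass conditions on the input context and one does not, and the two next-token distributions are contrasted so that tokens supported by the context are amplified while tokens the model would produce anyway are suppressed. The sketch below illustrates one such decoding step; the function names, the NumPy setup, and the weighting parameter `alpha` are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def contrastive_contextual_logits(logits_with_context: np.ndarray,
                                  logits_without_context: np.ndarray,
                                  alpha: float = 0.5) -> np.ndarray:
    """Contrast two next-token logit vectors over the same vocabulary.

    logits_with_context:    logits from an LM conditioned on the input
                            context plus the query and generated prefix.
    logits_without_context: logits from a second pass (or second LM)
                            conditioned on the query and prefix alone.
    alpha:                  contrast strength (assumed hyperparameter);
                            alpha = 0 recovers ordinary decoding.
    """
    # Amplify tokens the context supports; downweight tokens the model
    # would emit without the context (hallucination-prone probability mass).
    return (1.0 + alpha) * logits_with_context - alpha * logits_without_context

def sample_next_token(logits: np.ndarray, rng: np.random.Generator) -> int:
    """Softmax the adjusted logits and sample one token id."""
    probs = np.exp(logits - logits.max())  # subtract max for stability
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```

At each decoding step both passes share the generated prefix; only the presence of the input context differs, so the subtraction isolates the evidence contributed by the context.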
Files

Original bundle (1 of 1)

- Name: Han_washington_0250E_27974.pdf
- Size: 8.02 MB
- Format: Adobe Portable Document Format
