Heterogeneous Model Collaborations in Language Models
| Field | Value |
| --- | --- |
| dc.contributor.advisor | Tsvetkov, Yulia |
| dc.contributor.author | Han, Xiaochuang |
| dc.date.accessioned | 2025-05-12T22:46:30Z |
| dc.date.available | 2025-05-12T22:46:30Z |
| dc.date.issued | 2025-05-12 |
| dc.date.submitted | 2025 |
| dc.description | Thesis (Ph.D.)--University of Washington, 2025 |
| dc.description.abstract | Language is central to human intelligence and has similarly become integral to artificial intelligence, particularly through advances in natural language processing and large language models. Modern LLM development focuses on model scaling: one generic, monolithic architecture is trained at substantially increasing sizes. This is a highly homogeneous process, in which a single type of neural module is applied to every task with the same optimization objective. While scaling LLMs has achieved remarkable success, it remains challenging to achieve swift control over LLM outputs, reduce hallucinations, enable weak-to-strong generalization, and adapt seamlessly to modalities beyond language. In this thesis, I develop novel heterogeneous model collaboration paradigms that address each of these challenges. In the first part, I introduce a diffusion-based generative LLM that collaborates with discriminative text classifiers for nuanced controllability at inference time. In the second part, I present a sampling algorithm built on a collaboration between two contrastively contextualized language models, reducing hallucinations and enhancing faithfulness to the input context. In the third part, I introduce two diffusion LLMs, a large general-purpose model and a much smaller specialized model, that collaborate for strong ensemble performance and more robust weak-to-strong generalization than autoregressive approaches. Finally, I present an autoregressive LLM architecture that learns to generate images and videos through collaboration with non-neural multimodal compression codecs, overcoming modality barriers and expanding the multimodal generative capabilities of LLMs. Overall, this thesis aims to advance NLP and LLMs beyond monolithic model scaling through modular, heterogeneous model collaborations across language and multimodal domains, ultimately contributing to the pursuit of general machine intelligence. |
| dc.embargo.terms | Open Access |
| dc.format.mimetype | application/pdf |
| dc.identifier.other | Han_washington_0250E_27974.pdf |
| dc.identifier.uri | https://hdl.handle.net/1773/52960 |
| dc.language.iso | en_US |
| dc.rights | CC BY-NC-SA |
| dc.subject | Computer science |
| dc.subject | Artificial intelligence |
| dc.subject.other | Computer science and engineering |
| dc.title | Heterogeneous Model Collaborations in Language Models |
| dc.type | Thesis |
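
The second contribution in the abstract, sampling via two contrastively contextualized language models, amounts to a contrastive decoding step: one forward pass conditions on the input context and one does not, and the two next-token distributions are contrasted so that tokens supported by the context are amplified while tokens the model would produce anyway are suppressed. The sketch below illustrates one such decoding step; the function names, the NumPy setup, and the weighting parameter `alpha` are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def contrastive_contextual_logits(logits_with_context: np.ndarray,
                                  logits_without_context: np.ndarray,
                                  alpha: float = 0.5) -> np.ndarray:
    """Contrast two next-token logit vectors over the same vocabulary.

    logits_with_context:    logits from an LM conditioned on the input
                            context plus the query and generated prefix.
    logits_without_context: logits from a second pass (or second LM)
                            conditioned on the query and prefix alone.
    alpha:                  contrast strength (assumed hyperparameter);
                            alpha = 0 recovers ordinary decoding.
    """
    # Amplify tokens the context supports; downweight tokens the model
    # would emit without the context (hallucination-prone probability mass).
    return (1.0 + alpha) * logits_with_context - alpha * logits_without_context

def sample_next_token(logits: np.ndarray, rng: np.random.Generator) -> int:
    """Softmax the adjusted logits and sample one token id."""
    probs = np.exp(logits - logits.max())  # subtract max for stability
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```

At each decoding step both passes share the generated prefix; only the presence of the input context differs, so the subtraction isolates the evidence contributed by the context.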
Files

Original bundle (1 of 1)

- Name: Han_washington_0250E_27974.pdf
- Size: 8.02 MB
- Format: Adobe Portable Document Format
