Hidden Capabilities and Counterintuitive Limits in Large Language Models

dc.contributor.advisor: Choi, Yejin
dc.contributor.author: West, Peter
dc.date.accessioned: 2024-10-16T03:11:53Z
dc.date.available: 2024-10-16T03:11:53Z
dc.date.issued: 2024-10-16
dc.date.submitted: 2024
dc.description: Thesis (Ph.D.)--University of Washington, 2024
dc.description.abstract: As massive language models like GPT-4 dominate NLP and AI, extreme scale has become a clear and frequent theme for success. My research envisions a world where alternative approaches, efficient methods working on small- to medium-scale models, work alongside extreme-scale models at the forefront of AI. In pursuit of this goal, the work described in this dissertation develops learning and inference algorithms that unlock hidden capabilities in compact language models. In parallel, I describe the underlying nature of model capabilities and the limits that even scale-driven frontier models continue to suffer from. Concretely, this dissertation explores three interconnected threads. First, Decoding-time Algorithms for Unlocking Out-of-the-box Capabilities: I have worked to develop a suite of inference-time algorithms that unlock capabilities in off-the-shelf, compact language models without requiring supervised fine-tuning or extreme scale. In this dissertation, I describe one such algorithm. Next, Symbolic Knowledge Distillation for Compact Expert Models: I study how useful knowledge can be extracted from general LMs and incorporated into efficient expert models. Toward this goal, I introduce Symbolic Knowledge Distillation, a framework for distilling domain- or task-specific knowledge from frontier LMs into smaller, but often better, models for the given domain or task. Finally, Limits of LMs: I investigate the limits of LMs that even extreme scale has yet to overcome, and the ways that model capabilities diverge from the abilities and expectations of humans. In this dissertation, I pose the Generative AI Paradox: despite impressive generation capabilities, strong LMs and other generative models can exhibit much weaker understanding performance than we would expect from a human with the same ability to generate.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: West_washington_0250E_27386.pdf
dc.identifier.uri: https://hdl.handle.net/1773/52458
dc.language.iso: en_US
dc.rights: CC BY
dc.subject: Artificial intelligence
dc.subject: Computer science
dc.subject.other: Computer science and engineering
dc.title: Hidden Capabilities and Counterintuitive Limits in Large Language Models
dc.type: Thesis

Files

Original bundle

Name: West_washington_0250E_27386.pdf
Size: 9.93 MB
Format: Adobe Portable Document Format