Hidden Capabilities and Counterintuitive Limits in Large Language Models

dc.contributor.advisor: Choi, Yejin
dc.contributor.author: West, Peter
dc.date.accessioned: 2024-10-16T03:11:53Z
dc.date.available: 2024-10-16T03:11:53Z
dc.date.issued: 2024-10-16
dc.date.submitted: 2024
dc.description: Thesis (Ph.D.)--University of Washington, 2024
dc.description.abstract: As massive language models like GPT-4 dominate NLP and AI, extreme scale has become a clear and frequent theme for success. My research envisions a world where alternative approaches, efficient methods working on small- to medium-scale models, work alongside extreme-scale models at the forefront of AI. In pursuit of this goal, the work described in this dissertation develops learning and inference algorithms that unlock hidden capabilities in compact language models. In parallel, I describe the underlying nature of model capabilities and the limits that even scale-driven frontier models continue to suffer from. Concretely, this dissertation explores three interconnected threads. First, Decoding-time Algorithms for Unlocking Out-of-the-box Capabilities: I have worked to develop a suite of inference-time algorithms that unlock capabilities in off-the-shelf, compact language models without requiring supervised fine-tuning or extreme scale. In this dissertation, I describe one such algorithm. Next, Symbolic Knowledge Distillation for Compact Expert Models: I study how useful knowledge can be extracted from general LMs and incorporated into efficient expert models. Toward this goal, I introduce Symbolic Knowledge Distillation, a framework for distilling domain- or task-specific knowledge from frontier LMs into smaller, but often better, models for the given domain or task. Finally, Limits of LMs: I investigate the limits of LMs that even extreme scale has yet to overcome, and the ways that model capabilities diverge from the abilities and expectations of humans. In this dissertation, I pose the Generative AI Paradox: despite impressive generation capabilities, strong LMs and other generative models can exhibit much weaker understanding performance than we would expect from a human with the same ability to generate.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: West_washington_0250E_27386.pdf
dc.identifier.uri: https://hdl.handle.net/1773/52458
dc.language.iso: en_US
dc.rights: CC BY
dc.subject: Artificial intelligence
dc.subject: Computer science
dc.subject.other: Computer science and engineering
dc.title: Hidden Capabilities and Counterintuitive Limits in Large Language Models
dc.type: Thesis

Files

Original bundle

Name: West_washington_0250E_27386.pdf
Size: 9.93 MB
Format: Adobe Portable Document Format