Building the Next Generation of Multimodal Models

dc.contributor.advisor: Hajishirzi, Hannaneh
dc.contributor.advisor: Farhadi, Ali
dc.contributor.author: Ilharco, Gabriel
dc.date.accessioned: 2024-04-26T23:19:25Z
dc.date.available: 2024-04-26T23:19:25Z
dc.date.issued: 2024-04-26
dc.date.submitted: 2024
dc.description: Thesis (Ph.D.)--University of Washington, 2024
dc.description.abstract: One of the fundamental goals of machine learning is to create systems capable of processing data from a variety of modalities, such as images and text. I argue that the next generation of multimodal models will be enabled by a deeper understanding of how to design pretraining datasets, and by techniques that offer better control over models after pretraining. Towards the first goal, I introduce a fully open-source benchmark for designing multimodal datasets. This benchmark provides a shared experimental setting for research on dataset curation, allowing researchers to conduct rigorous and controlled experiments. Our experiments highlight the potential of rigorous empirical work on dataset curation: they identify pretraining datasets that outperform existing datasets by a large margin. Towards the second goal, I present multiple techniques for improving models after pretraining. Our fine-tuning techniques improve accuracy without overspecialization and without increasing inference costs. Moreover, I present a modular framework for steering the behavior of trained models, designed to efficiently add or delete capabilities while operating directly within the models’ weight space. Altogether, these new techniques pave the way for the next generation of multimodal models.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Ilharco_washington_0250E_26517.pdf
dc.identifier.uri: http://hdl.handle.net/1773/51333
dc.language.iso: en_US
dc.rights: CC BY
dc.subject: Artificial intelligence
dc.subject.other: Computer science and engineering
dc.title: Building the Next Generation of Multimodal Models
dc.type: Thesis

Files

Name: Ilharco_washington_0250E_26517.pdf
Size: 3.3 MB
Format: Adobe Portable Document Format