Learning Structured Representations of the Visual World

dc.contributor.advisor: Farhadi, Ali
dc.contributor.author: Wallingford, Matthew
dc.date.accessioned: 2025-10-02T16:07:30Z
dc.date.available: 2025-10-02T16:07:30Z
dc.date.issued: 2025-10-02
dc.date.submitted: 2025
dc.description: Thesis (Ph.D.)--University of Washington, 2025
dc.description.abstract: Humans develop complex internal models of the world which allow us to generalize remarkably well to new scenarios and tasks. While deep learning has steadily improved in performance through data and scale, it conspicuously lags behind in its generalization to changing data distributions and transfer across tasks when compared to biological intelligence. We argue that one key element absent from current deep learning systems is this internal model of the world to enable efficient transfer of knowledge to new settings and data. In this work, we investigate how aspects of world models such as compositionality and 3D spatial understanding can be learned from visual data and be used to improve the efficiency and robustness of current machine learning systems. We develop new methods and loss objectives for learning structured representations. We demonstrate how learning from more complex visual data such as video, embodied exploration, and 360° video enables learning more structured world models which improves sample efficiency and spatial understanding. In addition, we explore other directions and develop methods to improve the transfer of knowledge between tasks and robustness to shifting data distributions.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Wallingford_washington_0250E_28577.pdf
dc.identifier.uri: https://hdl.handle.net/1773/53977
dc.language.iso: en_US
dc.rights: CC BY
dc.subject: Artificial Intelligence
dc.subject: Computer Vision
dc.subject: Machine Learning
dc.subject: Representation Learning
dc.subject: Transfer Learning
dc.subject: Artificial intelligence
dc.subject: Computer science
dc.subject.other: Computer science and engineering
dc.title: Learning Structured Representations of the Visual World
dc.type: Thesis

Files

Original bundle

Name: Wallingford_washington_0250E_28577.pdf
Size: 28.36 MB
Format: Adobe Portable Document Format