Acting with Language


Authors

Shridhar, Mohit

Abstract

How can we imbue robots with the ability to achieve arbitrary goals in novel environments? Language provides a natural interface for guiding robots and abstracting away the complexities of the physical world. Previous attempts to guide robots with language often rely on human-designed intermediate representations, such as object detections, categories, poses, and symbolic states. These representations struggle to capture everyday objects such as deformable shirts, coffee beans, ropes, and cherry stems. One alternative that does not require human-designed representations is end-to-end deep learning, which directly maps camera observations to robot actions. While learning approaches are vastly more expressive than traditional methods, they are severely bottlenecked by the scarcity of training data in robotics: collecting enough data to train even a simple policy can take months and does not scale. However, robot data contains spatial symmetries and other structural priors that can be exploited to efficiently learn policies for a wide range of tasks. In this thesis, we present methods for using language to guide robot actions through end-to-end learning. First, we present ALFRED, a large-scale dataset and benchmark for evaluating agents that follow language instructions in partially observable household environments. Next, we introduce CLIPort and PerAct, two language-conditioned manipulation frameworks that aim to replicate in robotics the success of large pre-trained vision and language models. These frameworks use spatial priors to efficiently learn action representations from limited data. Lastly, we discuss ALFWorld, a framework for learning “textual policies” in interactive text games, thereby sidestepping the visual and physical complexities of embodied environments. We conclude with a discussion of counterpoints, limitations, and potential future directions for scaling up robot learning and butler robots.

Description

Thesis (Ph.D.)--University of Washington, 2023
