Approaches for Interactions in Robotics Applications

dc.contributor.advisor: Farhadi, Ali
dc.contributor.advisor: Fox, Dieter
dc.contributor.author: Roh, Junha
dc.date.accessioned: 2022-09-23T20:44:32Z
dc.date.issued: 2022-09-23
dc.date.submitted: 2022
dc.description: Thesis (Ph.D.)--University of Washington, 2022
dc.description.abstract: As more robots are developed and deployed to assist humans, effective interaction with humans and other agents becomes essential. Robots that cooperate with humans must be robust for safety and are often expected to be explainable. Beyond the challenges that general-purpose robots face, communicating with humans requires the ability to understand human instructions and to condition the robot's behavior on them, as well as on its internal states and observations. Additionally, training a model for robotics applications requires costly environments for collecting data or active simulation to generate it, whereas recent large-model approaches collect their data from the web. This makes the training process expensive, especially for models that must use language in specific contexts. In this thesis, we propose methods that produce interpretable results using composable sub-models for interaction in robotics applications. In the first part of the thesis, we propose models for interaction in driving. We develop a model that enables humans to control a vehicle with language instructions such as "turn left and then turn right." The model consists of two sub-models: a high-level policy that translates the language instruction into a sequence of sub-tasks, and a low-level policy that controls the vehicle to accomplish each sub-task. We also propose a model that predicts the future trajectories of agents at a four-way intersection, tackling another important form of interaction for autonomous vehicles. Its first sub-model predicts destinations and a topologically invariant description of the order of execution from reference trajectories. Given this abstract description of the scene, the second sub-model predicts multiple future trajectories. In the second part, we propose visual grounding models for 3D point clouds and RGBD images as essential tasks for robot navigation and human-robot interaction.
The task is to identify the referred object from a language description, either in a reconstructed 3D scene or in a pair of RGBD images. The model for 3D visual grounding extends a large language model into a spatial-language model that identifies the target object. The model for RGBD visual grounding combines a pre-trained 2D visual grounding model with a 3D bounding-box proposal model. Both models leverage the strong generalization performance of large models, achieve results comparable to state-of-the-art methods, and produce interpretable intermediate results.
dc.embargo.lift: 2023-09-23T20:44:32Z
dc.embargo.terms: Restrict to UW for 1 year -- then make Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Roh_washington_0250E_24790.pdf
dc.identifier.uri: http://hdl.handle.net/1773/49316
dc.language.iso: en_US
dc.rights: CC BY-NC-ND
dc.subject: Computer science
dc.subject: Robotics
dc.subject.other: Computer science and engineering
dc.title: Approaches for Interactions in Robotics Applications
dc.type: Thesis

Files

Original bundle

Name: Roh_washington_0250E_24790.pdf
Size: 53.67 MB
Format: Adobe Portable Document Format