Approaches for Interactions in Robotics Applications

dc.contributor.advisor: Farhadi, Ali
dc.contributor.advisor: Fox, Dieter
dc.contributor.author: Roh, Junha
dc.date.accessioned: 2022-09-23T20:44:32Z
dc.date.issued: 2022-09-23
dc.date.submitted: 2022
dc.description: Thesis (Ph.D.)--University of Washington, 2022
dc.description.abstract: As more robots are developed and deployed to assist humans, effective interaction with humans and other agents becomes essential. Robots that cooperate with humans must be robust for safety and are often expected to be explainable. Beyond the challenges that general-purpose robots face, communicating with humans requires the ability to understand human instructions and to condition the robot's behavior on them, as well as on its internal states and observations. Additionally, training a model for robotics applications requires costly environments for collecting data or active simulation to generate it, whereas recent large-model approaches collect their data from the web. This makes the training process expensive, especially for models that must use language in specific contexts. In this thesis, we propose methods that produce interpretable results using composable sub-models for interaction in robotics applications. In the first part of the thesis, we propose models for interaction in driving. We develop a model that enables humans to control a vehicle with language instructions such as "turn left and then turn right." The model consists of two sub-models: a high-level policy that translates the language instruction into a sequence of sub-tasks, and a low-level policy that controls the vehicle to accomplish each sub-task. We also propose a model that predicts the future trajectories of agents at a four-way intersection, tackling another important form of interaction for autonomous vehicles. Its first sub-model predicts destinations and a topologically invariant description of the order of execution from reference trajectories. Given this abstract description of the scene, the second sub-model predicts multiple future trajectories. In the second part, we propose visual grounding models for 3D point clouds and RGBD images as essential tasks for robot navigation and human-robot interaction.
The task is to identify the referred object from a language description, either in a reconstructed 3D scene or in a pair of RGBD images. The model for 3D visual grounding extends a large language model into a spatial-language model that identifies the target object. The model for RGBD visual grounding combines a pre-trained 2D visual grounding model with a 3D bounding-box proposal model. Both models leverage the strong generalization performance of large models, achieve results comparable to state-of-the-art methods, and produce interpretable intermediate results.
dc.embargo.lift: 2023-09-23T20:44:32Z
dc.embargo.terms: Restrict to UW for 1 year -- then make Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Roh_washington_0250E_24790.pdf
dc.identifier.uri: http://hdl.handle.net/1773/49316
dc.language.iso: en_US
dc.rights: CC BY-NC-ND
dc.subject: Computer science
dc.subject: Robotics
dc.subject.other: Computer science and engineering
dc.title: Approaches for Interactions in Robotics Applications
dc.type: Thesis

Files

Original bundle

Name: Roh_washington_0250E_24790.pdf
Size: 53.67 MB
Format: Adobe Portable Document Format