Objects and Actions: Learning Representations for Open-World Robotics

dc.contributor.advisor: Fox, Dieter
dc.contributor.author: Yuan, Wentao
dc.date.accessioned: 2024-10-16T03:12:02Z
dc.date.available: 2024-10-16T03:12:02Z
dc.date.issued: 2024-10-16
dc.date.submitted: 2024
dc.description: Thesis (Ph.D.)--University of Washington, 2024
dc.description.abstract: Advancing robotics involves enabling systems to generalize across diverse and unseen environments, known as "the open world." Traditional approaches rely on state estimators, while modern learning-based methods develop implicit representations to approximate states. Both approaches require well-designed states or representations for effective generalization. This dissertation investigates learning representations that enhance generalization in robotic systems, focusing on objects and actions. First, I introduce SORNet (Spatial Object-Centric Representation Network), a framework for learning object-centric representations from RGB images using canonical object views. SORNet generalizes to unseen objects with different shapes and textures, outperforming existing techniques in tasks like spatial relation classification and task planning for sequential manipulation. Next, I present M2T2, a transformer model that predicts low-level actions for manipulating objects in cluttered scenes. M2T2 reasons about contact points and gripper poses from raw point clouds. Trained on a large-scale synthetic dataset, M2T2 achieves zero-shot sim2real transfer on real robots, surpassing state-of-the-art models in both overall performance and in challenging tasks requiring object re-orientation. Finally, I introduce RoboPoint, a vision-language model that predicts keypoint affordances from language instructions. Using a synthetic data generation pipeline, RoboPoint trains without real-world data collection or human demonstration. It supports applications such as robot navigation, manipulation, and augmented reality, and outperforms existing models in spatial affordance prediction and task success rates. The dissertation concludes with a discussion on challenges and future directions for developing foundational models in robotics, aiming to create versatile systems capable of operating in open-world environments.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Yuan_washington_0250E_27563.pdf
dc.identifier.uri: https://hdl.handle.net/1773/52471
dc.language.iso: en_US
dc.rights: CC BY-SA
dc.subject: Artificial Intelligence
dc.subject: Computer Vision
dc.subject: Foundation Models
dc.subject: Machine Learning
dc.subject: Representation Learning
dc.subject: Robotics
dc.subject: Computer science
dc.subject: Computer engineering
dc.subject.other: Computer science and engineering
dc.title: Objects and Actions: Learning Representations for Open-World Robotics
dc.type: Thesis

Files

Original bundle

Name: Yuan_washington_0250E_27563.pdf
Size: 27.18 MB
Format: Adobe Portable Document Format