Dynamic Object Understanding and Enhanced Grasp Detection: A Dual-Method Approach

Atar, Soofiyan Layakalli

Date

2024-09-09

Abstract

In this thesis, we present a comprehensive approach to advancing suction grasp point detection through several methods. First, we introduce DYNAMO-GRASP, a technique that combines physics-based simulation with data-driven modeling to account for object dynamics during the grasping process. This method substantially improves a robot's ability to handle previously unseen objects and scenarios in real-world settings, achieving a success rate improvement of up to 48% over state-of-the-art (SOTA) methods in challenging real-world tests. Building on this foundation, we extended DYNAMO-GRASP by integrating Google Scanned Objects with RGB channels, which further increased accuracy by 30%. We also explored vision-language model (VLM) methods, but found that they underperformed the enhanced RGB version of DYNAMO-GRASP: despite extensive prompt engineering, they sometimes missed the suction grasp point. We then investigated zero-shot transfer using ChatGPT as the VLM. The culmination of our research is a hybrid model combining DINOv2 and DPT (Dense Prediction Transformer), in which DINOv2 serves as the encoder, DPT as the decoder, and a complex prediction head outputs the affordance map from which grasp points are extracted. This method has achieved the highest performance to date, doubling the accuracy of previous approaches. It also outputs roll and pitch affordance maps, which are used to determine the optimal grasping angles. Validated using simulated data and transferred to real-world applications, this model represents a significant step toward robust and resilient robotic manipulation in intricate real-world situations.
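The abstract states that grasp points are extracted from a predicted affordance map and that separate roll and pitch affordance maps determine the grasping angles. The sketch below illustrates one plausible form of that post-processing step, under stated assumptions: the function name `extract_grasp`, the `(bins, H, W)` layout of the roll and pitch maps, and the angle-bin values are hypothetical and not taken from the thesis. The DINOv2/DPT network that would produce these maps is not shown; synthetic arrays stand in for its outputs.

```python
import numpy as np


def extract_grasp(affordance, roll_maps, pitch_maps, roll_bins_deg, pitch_bins_deg):
    """Hypothetical post-processing: pick a suction point and its roll/pitch angles.

    affordance:     (H, W) suction affordance scores (higher is better).
    roll_maps:      (R, H, W) per-angle-bin roll scores (assumed layout).
    pitch_maps:     (P, H, W) per-angle-bin pitch scores (assumed layout).
    roll_bins_deg:  (R,) angle in degrees represented by each roll bin.
    pitch_bins_deg: (P,) angle in degrees represented by each pitch bin.
    """
    # Grasp point: the pixel with the highest suction affordance.
    row, col = np.unravel_index(np.argmax(affordance), affordance.shape)
    # At that pixel, choose the roll and pitch bins with the highest score.
    roll = roll_bins_deg[int(np.argmax(roll_maps[:, row, col]))]
    pitch = pitch_bins_deg[int(np.argmax(pitch_maps[:, row, col]))]
    return (int(row), int(col)), float(roll), float(pitch)


# Usage with synthetic maps standing in for the network's outputs.
H, W = 4, 5
affordance = np.zeros((H, W))
affordance[2, 3] = 0.9                        # best suction pixel
roll_bins = np.array([-30.0, 0.0, 30.0])      # hypothetical angle bins
pitch_bins = np.array([-15.0, 0.0, 15.0])
roll_maps = np.zeros((3, H, W))
roll_maps[2, 2, 3] = 1.0                      # favour +30 deg roll at (2, 3)
pitch_maps = np.zeros((3, H, W))
pitch_maps[0, 2, 3] = 1.0                     # favour -15 deg pitch at (2, 3)

print(extract_grasp(affordance, roll_maps, pitch_maps, roll_bins, pitch_bins))
# prints ((2, 3), 30.0, -15.0)
```

Reading the angles at the selected pixel, rather than taking a global argmax over each angle map, keeps the orientation tied to the chosen suction point. A real pipeline might also mask invalid pixels or smooth the affordance map before the argmax.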