Broadcasting Support Relations Recursively from Local Dynamics for Object Retrieval in Clutters

1 CFCS, School of CS, PKU; 2 Weiyang College, THU; 3 University of Oxford; 4 National Key Laboratory for Multimedia Information Processing, School of CS, PKU

Abstract

Cluttered objects are everywhere in daily life, from stationery and books scattered across a table to bowls and plates filling a kitchen sink. Retrieving a target object from clutter is an essential yet challenging skill for robots: the target must be manipulated safely without disturbing other objects, which requires the robot to plan a manipulation sequence and first move away, step by step, the objects supported by the target. However, due to the diversity of object configurations (e.g., categories, geometries, locations, and poses) and their combinations in clutter, it is difficult for a robot to accurately infer the support relations between distant objects with various objects in between. In this paper, we study retrieving objects from complicated clutter via a novel method that recursively broadcasts accurate local dynamics to build a support relation graph of the whole scene, which largely reduces the complexity of support relation inference and improves its accuracy. Experiments in both simulation and the real world demonstrate the efficiency and effectiveness of our method.


Directly modelling the support relations between any two objects in clutter is difficult and inaccurate, as the relation between two distant objects can be highly complicated and hard to predict. To tackle this problem, we build the whole support relation graph of the cluttered objects by recursively broadcasting the more accurate local dynamics between adjacent objects, with the assistance of the Retrieval Direction Predictor and the Local Dynamics Predictor.
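The idea of growing a scene-level graph from purely local queries can be illustrated with a minimal sketch. It is not the authors' implementation: the function `broadcast_support`, the callbacks `neighbors` and `is_supported_by`, and the toy scene are all hypothetical stand-ins. In particular, `is_supported_by` is a simple lookup here, standing in for whatever a learned local dynamics model would predict for a pair of adjacent objects.

```python
from typing import Callable, Dict, Hashable, Iterable, Set


def broadcast_support(
    target: Hashable,
    neighbors: Callable[[Hashable], Iterable[Hashable]],
    is_supported_by: Callable[[Hashable, Hashable], bool],
) -> Dict[Hashable, Set[Hashable]]:
    """Map each reached object to the adjacent objects it directly supports.

    Only adjacent pairs are ever queried; relations between distant
    objects emerge from chaining these local answers recursively.
    """
    graph: Dict[Hashable, Set[Hashable]] = {}

    def visit(obj: Hashable) -> None:
        if obj in graph:  # already expanded; also guards against cycles
            return
        graph[obj] = set()
        for other in neighbors(obj):
            # Hypothetical local query: does `other` rest on `obj`?
            if is_supported_by(other, obj):
                graph[obj].add(other)
                visit(other)  # broadcast the relation one step further

    visit(target)
    return graph


# Toy scene (assumed for illustration): spoon on bowl, bowl on plate,
# cup merely next to the plate.
toy_adjacent = {
    "plate": ["bowl", "cup"],
    "bowl": ["plate", "spoon"],
    "spoon": ["bowl"],
    "cup": ["plate"],
}
toy_supported_pairs = {("bowl", "plate"), ("spoon", "bowl")}

toy_graph = broadcast_support(
    "plate",
    neighbors=lambda o: toy_adjacent.get(o, []),
    is_supported_by=lambda a, b: (a, b) in toy_supported_pairs,
)
```

In this toy scene the cup never enters the graph, because it is adjacent to the plate but not supported by it, while the spoon is reached through the bowl without ever being compared with the plate directly.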


(Left) When occlusions are removed, more detailed information is revealed. To eliminate the impact of occlusions, we propose the Graph Adjustment process. (Right) We propose Affordance as grasping guidance. Estimating affordance scores for candidate points requires considering both the object's geometry and its surrounding environment.

This video demonstrates the process of Recursive Broadcast, which starts from the final target and searches along the potential support chain until it reaches the objects on top. In this way, we overcome the difficulty of long-horizon dependency in physical detection and can predict accurate relations.
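Once a support chain has been traced from the target up to the topmost objects, one plausible way to turn it into a manipulation sequence is a post-order traversal, so that everything resting on an object is moved before the object itself. The sketch below assumes the support graph is already available as a plain dictionary; `retrieval_order` and the example scene are hypothetical and are not taken from the paper.

```python
from typing import Dict, Hashable, List, Set


def retrieval_order(
    supports: Dict[Hashable, Set[Hashable]], target: Hashable
) -> List[Hashable]:
    """Return a removal order that ends with `target`.

    `supports[obj]` is the set of objects resting directly on `obj`.
    Every object appears after all objects it supports.
    """
    order: List[Hashable] = []
    seen: Set[Hashable] = set()

    def visit(obj: Hashable) -> None:
        if obj in seen:
            return
        seen.add(obj)
        # Sorting only makes the example deterministic.
        for above in sorted(supports.get(obj, ())):
            visit(above)
        order.append(obj)  # safe to move once everything above is gone

    visit(target)
    return order


# Assumed example: a spoon in a bowl, and the bowl and a fork on a plate.
example_supports = {
    "plate": {"bowl", "fork"},
    "bowl": {"spoon"},
    "spoon": set(),
    "fork": set(),
}
example_order = retrieval_order(example_supports, "plate")
```

For the example scene this yields the spoon, then the bowl, then the fork, and finally the plate. Objects that are not part of the target's support chain never appear in the sequence, so they are left undisturbed.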

Some examples of our real world experiments

A detailed introduction video of our project

BibTeX

@misc{li2024broadcastingsupportrelationsrecursively,
        title={Broadcasting Support Relations Recursively from Local Dynamics for Object Retrieval in Clutters}, 
        author={Yitong Li and Ruihai Wu and Haoran Lu and Chuanruo Ning and Yan Shen and Guanqi Zhan and Hao Dong},
        year={2024},
        eprint={2406.02283},
        archivePrefix={arXiv},
        primaryClass={cs.RO},
        url={https://arxiv.org/abs/2406.02283}, 
}