Garment manipulation (e.g., unfolding, folding and hanging clothes) is essential for future robots to accomplish home-assistant tasks, yet it is highly challenging due to the diversity of garment configurations, geometries and deformations. Although previous works are able to manipulate similarly shaped garments within a certain task, they mostly have to design different policies for different tasks, cannot generalize to garments with diverse geometries, and often rely heavily on human-annotated data. In this paper, we leverage the property that garments in a certain category share similar structures, and learn topological dense (point-level) visual correspondence among garments of the same category under different deformations in a self-supervised manner. The topological correspondence can be easily adapted to functional correspondence to guide manipulation policies for various downstream tasks, with only one-shot or few-shot demonstrations. Experiments on garments in 3 different categories across 3 representative tasks in diverse scenarios, using one or two arms, taking one or more steps, and starting from flat or messy garments, demonstrate the effectiveness of our proposed method.
There exist various types of garments with diverse styles, shapes and deformations. A unified representation and policy are required for manipulation with generalization capability.
We propose to use skeletons, graphs of keypoints and edges, to represent the structural information shared among garments at the category level. While previous works only propose skeletons for rigid objects, we first predict garment skeletons in flat states and then project the skeletons onto different garment deformations.
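A minimal sketch of this representation follows, assuming the general setup described above rather than the paper's exact code: a skeleton is a set of keypoints plus edges, and flat-state keypoints can be carried onto a deformed garment when the simulated mesh preserves vertex ordering across deformations. The names `Skeleton` and `project_to_deformed` are hypothetical.

```python
import numpy as np

class Skeleton:
    """A garment skeleton: keypoints on the flat garment plus connecting edges (hypothetical helper)."""
    def __init__(self, keypoints_xyz: np.ndarray, edges: list[tuple[int, int]]):
        self.keypoints_xyz = keypoints_xyz  # (K, 3) keypoint positions on the flat garment
        self.edges = edges                  # pairs of keypoint indices, e.g. sleeve tip -> shoulder

def project_to_deformed(skeleton: Skeleton,
                        flat_vertices: np.ndarray,      # (N, 3) flat-state mesh vertices
                        deformed_vertices: np.ndarray   # (N, 3) the same vertices after deformation
                        ) -> np.ndarray:
    """Project flat-state keypoints onto a deformed garment.

    Assumes (as is typical in cloth simulation) that vertex ordering is preserved
    across deformations, so a keypoint's nearest flat vertex indexes the same
    material point on the deformed mesh.
    """
    dists = np.linalg.norm(flat_vertices[None, :, :] - skeleton.keypoints_xyz[:, None, :], axis=-1)
    nearest_idx = dists.argmin(axis=1)        # (K,) nearest flat vertex for each keypoint
    return deformed_vertices[nearest_idx]     # (K, 3) keypoint positions on the deformed garment
```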
Our Proposed Learning Framework for Dense Visual Correspondence. (Left) We extract cross-deformation correspondence and cross-object correspondence point pairs using self-play and skeletons respectively, and train the per-point correspondence scores in a contrastive manner, with a coarse-to-fine module refining their quality. (Middle) The learned correspondence demonstrates point-level similarity across different garments in different deformations. (Right) The learned point-level correspondence can facilitate multiple diverse downstream tasks using one-shot or few-shot demonstrations.
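For illustration, a hedged sketch of such contrastive training: given per-point features from a point-cloud backbone and correspondence pairs (cross-deformation pairs from self-play, cross-object pairs from shared skeletons), an InfoNCE-style loss pulls corresponding points together and pushes other points apart. The function name, feature shapes and temperature below are assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def infonce_correspondence_loss(feat_a: torch.Tensor,   # (N, D) per-point features of garment A
                                feat_b: torch.Tensor,   # (M, D) per-point features of garment B
                                pairs: torch.LongTensor, # (P, 2) indices of corresponding points (a_idx, b_idx)
                                temperature: float = 0.07) -> torch.Tensor:
    """Contrastive loss over corresponding point pairs (a sketch, not the official objective).

    Each anchor point in A is attracted to its corresponding point in B; all other
    points in B serve as negatives for that anchor.
    """
    feat_a = F.normalize(feat_a, dim=-1)
    feat_b = F.normalize(feat_b, dim=-1)
    anchors = feat_a[pairs[:, 0]]                   # (P, D) anchor features
    logits = anchors @ feat_b.t() / temperature     # (P, M) similarity to every point in B
    targets = pairs[:, 1]                           # index of the positive point in B for each anchor
    return F.cross_entropy(logits, targets)
```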
Given one (or a few) demonstration garment(s) and the demonstrated manipulation policies for different tasks, for novel unseen garments we can use the learned dense correspondence to select the manipulation points, and thus generate the manipulation policies to accomplish the different tasks.
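One way to realize this transfer (a sketch with hypothetical names, assuming cosine similarity in the learned feature space stands in for the correspondence score): match the demonstration's annotated manipulation point to the novel garment by nearest-neighbor search over per-point features.

```python
import torch
import torch.nn.functional as F

def transfer_manipulation_point(demo_feats: torch.Tensor,   # (N_demo, D) per-point features of the demo garment
                                demo_point_idx: int,         # index of the annotated manipulation point
                                novel_feats: torch.Tensor,   # (N_novel, D) per-point features of the novel garment
                                novel_points: torch.Tensor   # (N_novel, 3) novel garment point cloud
                                ) -> torch.Tensor:
    """Select the point on the novel garment that best corresponds to the demo manipulation point."""
    demo_feat = F.normalize(demo_feats[demo_point_idx], dim=-1)   # (D,)
    novel_feats = F.normalize(novel_feats, dim=-1)                # (N_novel, D)
    scores = novel_feats @ demo_feat                              # (N_novel,) correspondence scores
    return novel_points[scores.argmax()]                          # (3,) manipulation point on the novel garment
```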
Correspondence Guided Manipulation on Different Garment Types and Tasks. From left to right: observation, correspondence, manipulation points (colored points) selected using correspondence to the demonstrations, and the manipulation action.
Our real-world setup includes two Franka Panda robot arms and a Kinect camera. We first align the real-world point cloud with the point cloud in simulation, and then use the learned correspondence to select manipulation points and guide manipulation.
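As an illustration of the alignment step (the exact procedure used in the paper may differ), a rigid ICP registration with Open3D could map the real Kinect point cloud into the simulation frame; the function name and parameters below are assumptions.

```python
import numpy as np
import open3d as o3d

def align_real_to_sim(real_xyz: np.ndarray, sim_xyz: np.ndarray,
                      max_corr_dist: float = 0.02) -> np.ndarray:
    """Rigidly align the real-world garment point cloud to the simulated one via ICP (a sketch).

    Returns a 4x4 transform that maps real-world points into the simulation frame.
    """
    src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(np.asarray(real_xyz, dtype=np.float64)))
    tgt = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(np.asarray(sim_xyz, dtype=np.float64)))
    result = o3d.pipelines.registration.registration_icp(
        src, tgt, max_corr_dist, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation
```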
If you have any questions, please feel free to contact Ruihai Wu at wuruihai_at_pku_edu_cn and Haoran Lu at 2100012904_at_stu_pku_edu_cn.
@InProceedings{Wu_2024_CVPR,
    author    = {Ruihai Wu and Haoran Lu and Yiyan Wang and Yubo Wang and Hao Dong},
    title     = {UniGarmentManip: A Unified Framework for Category-Level Garment Manipulation via Dense Visual Correspondence},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024}
}