SuctionPrompt: Visual-Assisted Robotic Picking with a Suction Cup Using Vision-Language Models and Facile Hardware Design

Tomohiro Motoda; Takahide Kitamura; Ryo Hanai; Yukiyasu Domae

doi:10.20965/jrm.2025.p0374

抄録

The development of large language models and vision-language models (VLMs) has resulted in the increasing use of robotic systems in various fields. However, the effective integration of these models into real-world robotic tasks is a key challenge. We developed a versatile robotic system called SuctionPrompt that utilizes prompting techniques of VLMs combined with 3D detections to perform product-picking tasks in diverse and dynamic environments. Our method highlights the importance of integrating 3D spatial information with adaptive action planning to enable robots to approach and manipulate objects in novel environments. In the validation experiments, the system accurately selected suction points 75.4%, and achieved a 65.0% success rate in picking common items. This study highlights the effectiveness of VLMs in robotic manipulation tasks, even with simple 3D processing.

著者関連情報

この記事は最新の被引用情報を取得できません。

This article is licensed under a Creative Commons [Attribution-NoDerivatives 4.0 International] license (https://creativecommons.org/licenses/by-nd/4.0/).
The journal is fully Open Access under Creative Commons licenses and all articles are free to access at JRM official website.
https://www.fujipress.jp/jrobomech/rb-about/#https://creativecommons.org/licenses/by-nd

お気に入り & アラート

閲覧履歴

創刊号からの全論文のPDFは
JRM 公式サイトで公開中(無料)
doiリンクをクリック！

責任著者(Corresponding author)

ファンド情報

1.助成機関/事業名: National Institute of Advanced Industrial Science and Technology

J-STAGEへの登録はこちら（無料）