Journal of the Robotics Society of Japan
Online ISSN : 1884-7145
Print ISSN : 0289-1824
ISSN-L : 0289-1824
Paper
Proposal of an Echolocation-Based Visual Scene Reconstruction Method and an Investigation of Input Features
岸波 華彦, 糸山 克寿, 西田 健次, 中臺 一博

2022, Vol. 40, No. 4, pp. 351-354

Abstract

This paper addresses the reconstruction of visual scenes based on echolocation, aiming to develop auditory scene understanding for robots and systems. Although scene understanding with cameras and LIDAR has been studied extensively, such sensors are sensitive to changes in lighting conditions and have difficulty detecting invisible materials. Ultrasonic sensors are widely used, but their use is largely limited to distance estimation, and because most ultrasonic power lies in inaudible frequency ranges, they carry an unavoidable risk of unnoticed ultrasonic exposure. To solve these problems, we propose a framework for echolocation-based scene reconstruction (ELSR). ELSR reconstructs a visual scene from transmitted and received audible sound, exploiting a Generative Adversarial Network (GAN) to learn the translation from input sound to a visual scene. Because GANs were originally designed for image input, we carefully considered the differences between image and sound input and propose adding cross-correlation and trigonometric-function-based features to the input audio features. The proposed framework is implemented on top of pix2pix, a kind of conditional GAN, and a new ELSR dataset consisting of 10,800 pairs of input sounds and depth images recorded at 28 indoor locations was created. Experimental results on this dataset show the effectiveness of the proposed ELSR framework and audio features.
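The paper itself does not include source code, but a minimal Python sketch may clarify how the input features described above could be formed. Everything here is an assumption for illustration, not the authors' implementation: the function name elsr_input_features, the STFT parameters, and the log compression are hypothetical. The sketch computes a per-frame cross-spectrum between the transmitted and received audible signals (a frequency-domain cross-correlation) and encodes its phase through cos/sin, one plausible reading of "trigonometric function-based features" that yields bounded, image-like channels suitable for a pix2pix-style network.

```python
import numpy as np

def elsr_input_features(tx: np.ndarray, rx: np.ndarray,
                        n_fft: int = 512, hop: int = 128) -> np.ndarray:
    """Hypothetical feature extraction for ELSR-style scene reconstruction.

    tx: transmitted audible signal, rx: received echo (same sample rate).
    Returns a (3, frames, bins) array: log-compressed cross-correlation
    magnitude plus cos/sin of the cross-spectrum phase.
    """
    def stft(x):
        # Simple framed FFT with a Hann window.
        win = np.hanning(n_fft)
        frames = [x[i:i + n_fft] * win
                  for i in range(0, len(x) - n_fft + 1, hop)]
        return np.fft.rfft(np.asarray(frames), axis=1)

    TX, RX = stft(tx), stft(rx)
    cross = RX * np.conj(TX)          # cross-spectrum per frame and bin
    mag = np.log1p(np.abs(cross))     # compressed cross-correlation magnitude
    phase = np.angle(cross)
    # cos/sin instead of raw phase: continuous and bounded in [-1, 1],
    # avoiding the 2*pi wrap-around discontinuity of the phase itself.
    return np.stack([mag, np.cos(phase), np.sin(phase)])

# Example: with 1 s of transmitted sound and echo at 16 kHz,
# elsr_input_features(tx, rx) yields a (3, 122, 257) image-like tensor.
```

The cos/sin encoding is one common way to hand periodic quantities to a convolutional network; whether the authors use this exact form or another trigonometric feature is not specified in the abstract.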

© 2018 The Robotics Society of Japan