Host: The Japan Society of Mechanical Engineers
Name: [in Japanese]
Date: May 27, 2020 - May 30, 2020
Sensing technology has advanced dramatically in recent years, and a wide variety of sensors are now used in systems such as automated driving and robotics. When humans recognize their environment, information from the five senses is transmitted to the sensory areas of the cerebrum and processed; the processed information is then transmitted to the association areas, where it is fused. A similar fusion mechanism, modeled on the human one, is expected for robot sensor fusion as well. In this study, we propose a system that performs word recognition by fusing visual data from an RGB camera and voice data from a microphone, using a CNN that can automatically extract features. We train the network on the visual and voice data, and evaluate the system in terms of its word recognition rate and the feasibility of sensor fusion by deep learning.
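The abstract does not specify the network architecture, but the described fusion of camera and microphone data can be sketched as a two-branch model whose per-modality features are concatenated before classification. Below is a minimal NumPy illustration of that late-fusion idea; the feature dimensions, the random (untrained) projection weights, and the `extract`/`fuse_and_classify` helpers are all hypothetical stand-ins for the learned CNN branches, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two CNN branches: each maps raw sensor
# data to a fixed-length feature vector. The weights here are random;
# in the proposed system they would be learned during training.
W_visual = rng.standard_normal((64, 32))  # 64-dim image input  -> 32 features
W_audio = rng.standard_normal((40, 32))   # 40-dim audio input  -> 32 features

def extract(x, W):
    """Toy feature extractor: linear projection followed by ReLU."""
    return np.maximum(x @ W, 0.0)

def fuse_and_classify(visual, audio, W_out):
    """Late fusion: concatenate per-modality features, then classify."""
    fused = np.concatenate([extract(visual, W_visual),
                            extract(audio, W_audio)])  # 64-dim fused vector
    logits = fused @ W_out
    e = np.exp(logits - logits.max())
    return e / e.sum()  # softmax over candidate words

n_words = 10  # hypothetical vocabulary size
W_out = rng.standard_normal((32 + 32, n_words))

probs = fuse_and_classify(rng.standard_normal(64),
                          rng.standard_normal(40), W_out)
print(probs.shape)  # one probability per word in the vocabulary
```

The recognized word would be `probs.argmax()`; concatenation is only one fusion strategy, and the actual system may fuse the modalities at a different layer of the CNN.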