IEEJ Transactions on Electronics, Information and Systems (電気学会論文誌C)
Online ISSN : 1348-8155
Print ISSN : 0385-4221
ISSN-L : 0385-4221
<Soft Computing, Learning>
Detection of Driving Scenes Composed of Road Structure, Appearing Objects, and Operations Based on Multimodal Deep Learning
橋本 幸二郎, 柳原 大地

2024, Vol. 144, No. 10, pp. 985-996

Abstract

In this paper, we propose a method to detect driving scenes in which a driver's cognitive function can be evaluated. The method defines assessable scenes as those composed of three elements: road structure, appearing objects, and operations, and detects scenes matching arbitrarily specified combinations of these elements. Multimodal deep learning is used because the detection targets are composed of multiple information sources and it is difficult to specify useful feature vectors in advance. Existing studies often use an intermediate-fusion model structure, but such models have been reported to be difficult to tune and to fail to learn inter-modality relationships when the modalities differ in how much information they carry. We therefore propose a new model structure that incorporates an attention mechanism into a late-fusion model. This model evaluates each modality constituting a scene individually before producing the final detection result, giving a structure with high readability as to how the detection results are obtained. In experiments, the proposed method is compared with the intermediate-fusion model structure used in existing research, and improvements in both recall and precision are confirmed.
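To make the late-fusion-with-attention idea from the abstract concrete, the sketch below scores each modality with its own (hypothetical, untrained) head and then combines the per-modality scores through softmax attention weights. All names, dimensions, and weight values are illustrative assumptions, not the authors' actual architecture; the point is only that both the per-modality scores and the attention weights remain individually inspectable, which is the readability property the paper emphasizes.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(z - z.max())
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def late_fusion_attention(features, heads, attn):
    """Hypothetical late-fusion sketch.
    features: list of per-modality feature vectors.
    heads, attn: per-modality weight vectors (made-up stand-ins
    for trained detection heads and attention projections)."""
    # Late fusion: each modality is scored independently first...
    scores = np.array([sigmoid(w @ f) for w, f in zip(heads, features)])
    # ...then an attention mechanism weights the modalities.
    weights = softmax(np.array([a @ f for a, f in zip(attn, features)]))
    # Final detection score = attention-weighted sum of modality scores,
    # so both `scores` and `weights` stay readable for each modality.
    fused = float(weights @ scores)
    return fused, scores, weights

# Toy usage with three made-up modalities (values are illustrative).
features = [np.array([1.0, 0.0]), np.array([0.0, 1.0, 1.0]), np.array([1.0, 1.0])]
heads = [np.array([0.5, -0.5]), np.array([0.2, 0.2, 0.2]), np.array([0.1, 0.3])]
attn = [np.array([0.3, 0.1]), np.array([0.0, 0.4, 0.4]), np.array([0.2, 0.2])]
fused, scores, weights = late_fusion_attention(features, heads, attn)
```

Because the fused score is a convex combination of per-modality sigmoid scores, it always lies between the smallest and largest modality score, and the attention weights show which modality drove the decision.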

© 2024 The Institute of Electrical Engineers of Japan