Data Augmentation for Caption Generation by Cropping and Masking Using an Attention Mechanism

Kiyohiko Iwamura; Jun Younes Louhi Kasahara; Alessandro Moro; Atsushi Yamashita; Hajime Asama

doi:10.2493/jjspe.86.904

Abstract

Automatic image captioning has important applications, such as describing visual content for the visually impaired. Most approaches use deep learning and have achieved remarkable results. However, some issues remain unresolved. One of them is overfitting of the trained model to specific images, usually caused by a limited training dataset size. To enlarge the training dataset in such scenarios, previous studies have proposed data augmentation using random cropping or masking. However, those methods do not specifically target overfitted regions in images and may therefore remove areas that are needed to generate captions, lowering performance. In this study, we propose a novel data augmentation method that uses attention to specifically target the image regions subject to overfitting. Experimental results show that the proposed method enables the generation of better image captions.
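
The abstract does not spell out how attention selects the regions to crop or mask, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes a coarse attention grid laid over the image, picks the cell with the largest weight, and then either masks that region or crops to it. The function names (`peak_cell`, `cell_box`, `mask_region`, `crop_region`), the grid-to-pixel mapping, and the choice of the peak cell are all hypothetical; which regions the proposed method actually selects, and whether it crops to or away from them, is not stated here.

```python
from typing import List, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def peak_cell(attention: Grid) -> Tuple[int, int]:
    """Return (row, col) of the largest attention weight (assumed selection rule)."""
    best = (0, 0)
    for r, row in enumerate(attention):
        for c, weight in enumerate(row):
            if weight > attention[best[0]][best[1]]:
                best = (r, c)
    return best


def cell_box(cell: Tuple[int, int], grid_shape: Tuple[int, int],
             image_shape: Tuple[int, int]) -> Box:
    """Map one attention-grid cell to the pixel box it covers."""
    (r, c), (gh, gw), (h, w) = cell, grid_shape, image_shape
    return (r * h // gh, c * w // gw, (r + 1) * h // gh, (c + 1) * w // gw)


def mask_region(image: Grid, box: Box, fill: float = 0.0) -> Grid:
    """Return a copy of the image with the pixels inside the box set to `fill`."""
    top, left, bottom, right = box
    return [
        [fill if top <= y < bottom and left <= x < right else px
         for x, px in enumerate(row)]
        for y, row in enumerate(image)
    ]


def crop_region(image: Grid, box: Box) -> Grid:
    """Return only the pixels inside the box."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


# Toy data: a 4x4 single-channel image and a 2x2 attention grid.
image = [[float(y * 4 + x + 1) for x in range(4)] for y in range(4)]
attention = [[0.1, 0.6], [0.2, 0.1]]

box = cell_box(peak_cell(attention), (2, 2), (4, 4))
masked = mask_region(image, box)
cropped = crop_region(image, box)
```

In a real captioning pipeline the attention weights would come from the trained model itself, and the augmented images would be added to the training set alongside their original captions.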
