Journal of the Japan Society for Precision Engineering
Online ISSN : 1882-675X
Print ISSN : 0912-0289
ISSN-L : 0912-0289
Paper
Image Captioning with Data Augmentation Using Cropping and Mask Based on Attention Image
Kiyohiko IWAMURA, Jun Younes LOUHI KASAHARA, Alessandro MORO, Atsushi YAMASHITA, Hajime ASAMA
JOURNAL FREE ACCESS

2020 Volume 86 Issue 11 Pages 904-910

Abstract

Automatic image captioning has many important applications, such as describing visual content for the visually impaired. Most approaches use deep learning and have achieved remarkable results. However, some issues remain unresolved. One of them is overfitting of the trained model to specific images, usually caused by a limited training dataset size. To enlarge the training dataset in such scenarios, previous studies proposed data augmentation using random cropping or masking. However, these methods do not specifically target overfitted regions in images and may therefore remove areas that are needed to generate captions, which lowers performance. In this study, we propose a novel data augmentation method that uses attention to specifically target the image regions subject to overfitting. Experimental results show that the proposed method enables the generation of better image captions.
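
The idea of attention-guided cropping and masking can be illustrated with a minimal sketch. The abstract does not say how regions are chosen or how the attention image is computed, so everything below is an assumption made for illustration: the attention map is taken to be already resized to the image resolution, the target region is taken to be the square window with the largest attention sum, and the names peak_window, mask_window and crop_window are hypothetical. It is not the authors' implementation.

```python
from typing import List, Tuple

Grid = List[List[float]]  # 2-D grid of values (single-channel image or attention map)


def peak_window(attention: Grid, size: int) -> Tuple[int, int]:
    """Return the top-left (row, col) of the size x size window
    whose attention values sum to the largest total.
    ASSUMPTION: "largest attention sum" stands in for the region the
    model relies on most; the paper's actual criterion is not given."""
    rows, cols = len(attention), len(attention[0])
    best_total, best_pos = float("-inf"), (0, 0)
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            total = sum(
                attention[r + i][c + j]
                for i in range(size)
                for j in range(size)
            )
            if total > best_total:
                best_total, best_pos = total, (r, c)
    return best_pos


def mask_window(image: Grid, top: int, left: int, size: int,
                fill: float = 0.0) -> Grid:
    """Return a copy of the image with the chosen window set to `fill`."""
    out = [row[:] for row in image]  # copy so the original stays intact
    for r in range(top, top + size):
        for c in range(left, left + size):
            out[r][c] = fill
    return out


def crop_window(image: Grid, top: int, left: int, size: int) -> Grid:
    """Return only the chosen window of the image."""
    return [row[left:left + size] for row in image[top:top + size]]


# Toy 4 x 4 example: attention is concentrated in the centre.
attention = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.8, 0.0],
    [0.0, 0.7, 0.9, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
image = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]

top, left = peak_window(attention, size=2)
masked = mask_window(image, top, left, size=2)   # augmented sample 1
cropped = crop_window(image, top, left, size=2)  # augmented sample 2
```

In contrast to random cropping or masking, the window here is selected from the attention map, so the augmented samples are tied to the regions the captioning model attends to. Each augmented image would be paired with the original caption during training, which is again an assumption rather than something the abstract states.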

© 2020 The Japan Society for Precision Engineering