IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532

This article has an officially published version. Please refer to the officially published version, and cite that version when citing this article.

UTStyleCap4K: Generating Image Captions with Sentimental Styles
Chi ZHANG, Li TAO, Toshihiko YAMASAKI
Journal: Free Access, Advance online publication

Article ID: 2024EDP7036

Abstract

Stylized image captioning is the task of generating image captions with a particular description style, such as a positive or negative sentiment. Deep learning models have recently achieved strong performance on this task, but they still lack descriptive accuracy and diversity, and they are hampered by the small size and low descriptiveness of existing datasets. In this paper, we introduce a new dataset, UTStyleCap4K, which contains 4,644 images with three positive and three negative captions for every image (27,864 captions in total), collected via a crowdsourcing service. Experimental results show that our dataset is accurate in meaning and sentiment, diverse in how styles are expressed, and less similar to its base dataset, MSCOCO, than existing stylized image captioning datasets are. We train multiple models on our dataset to establish baselines. We also propose a new Bidirectional Encoder Representations from Transformers (BERT) based model, StyleCapBERT, which controls the length and style of generated captions simultaneously by introducing length and style information into the embeddings of caption words. Experimental results show that our model can generate captions in three sentimental styles (positive, factual, and negative) at the same time and achieves the best performance on our dataset.
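The abstract describes injecting length and style information into the caption-word embeddings. A minimal sketch of one plausible reading, summing a style embedding and a length-bucket embedding into each token embedding, analogous to how BERT sums token, segment, and position embeddings, is shown below. All sizes, the style indices, and the length-bucketing scheme are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: vocabulary, number of styles, length buckets, embedding dim.
vocab_size, n_styles, n_length_buckets, d = 100, 3, 5, 8

word_emb = rng.normal(size=(vocab_size, d))
style_emb = rng.normal(size=(n_styles, d))         # e.g. 0=positive, 1=factual, 2=negative
length_emb = rng.normal(size=(n_length_buckets, d))  # target caption length, bucketed

def embed(token_ids, style_id, length_bucket):
    """Sum word, style, and length embeddings for each token,
    mirroring BERT's token + segment + position embedding sum."""
    tokens = word_emb[np.asarray(token_ids)]       # (seq_len, d)
    return tokens + style_emb[style_id] + length_emb[length_bucket]

x = embed([5, 17, 42], style_id=0, length_bucket=2)
print(x.shape)  # (3, 8)
```

In this reading, the same decoder can be steered toward any of the three styles at inference time simply by swapping the style index, without retraining separate models.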

© 2024 The Institute of Electronics, Information and Communication Engineers