2019, Vol. 85, No. 12, pp. 1151–1156
The purpose of this research is to develop vision-driven image captioning technology capable of generating not only simple descriptions of a situation but also refined expressions such as jokes. The proposed method consists of three phases: collection, joke, and assessment. In the "collection" phase, we collect 270,000 images and 5,000,000 funny captions (jokes) from the Japanese joke website "Bokete" and build a joke database named BoketeDB. In the "joke" phase, we adopt a step function that changes a weight according to the evaluation of each caption in BoketeDB; the function outputs a Funny Score representing that evaluation. We use the Funny Score to tune the parameters of a conventional CNN-LSTM model during training. In the "assessment" phase, we prepare two ways of evaluating the machine-generated jokes. One is to collect questionnaires about the generated jokes from 121 unspecified subjects. The other is to post the generated jokes directly to Bokete, where they are evaluated by people browsing the website. We have verified that the proposed method is superior to the conventional image captioning method at generating jokes.
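The weighting scheme in the "joke" phase can be sketched as follows. This is a minimal illustration, not the paper's implementation: the step threshold, the weight values, and the function names (`funny_score`, `weighted_caption_loss`) are all assumptions introduced here for clarity. The idea is that a caption's audience rating on Bokete is mapped through a step function to a Funny Score, which then scales that caption's contribution to the training loss.

```python
# Hedged sketch of a Funny-Score-style step function (assumed values).
# The actual threshold and weights used in the paper may differ.

def funny_score(stars: int, threshold: int = 100) -> float:
    """Step function: captions rated at or above `threshold` stars
    receive full weight; lower-rated captions receive a reduced weight.
    Both weight levels here (1.0 and 0.1) are illustrative assumptions."""
    return 1.0 if stars >= threshold else 0.1

def weighted_caption_loss(base_loss: float, stars: int) -> float:
    """Scale a caption's per-example training loss (e.g. the CNN-LSTM's
    cross-entropy) by its Funny Score, so highly rated jokes influence
    the model parameters more strongly during training."""
    return funny_score(stars) * base_loss
```

In a real training loop, `weighted_caption_loss` would wrap the per-caption loss before backpropagation, so that gradients from well-rated jokes dominate parameter updates.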