A Study of the Behavior of a Deep Learning Model That Generates Images from Language

藤山 千紘; 小林 一郎

doi:10.11517/pjsai.JSAI2019.0_2L1J902

Abstract

In this study, we analyze the computational mechanism and the structure of the feature representation space of a deep neural text-to-image generative model. This is a foundational step toward constructing artificial general intelligence that reflects the mechanisms of human intelligence. First, we examine whether the model can encode captions and generate valid images when the input captions have no word boundaries. Qualitative and quantitative evaluations show that the model can generate compelling images, but that its computational mechanism does not acquire units of meaning. Second, we analyze semantic compositionality in the embedding space. Our experimental result suggests that semantic compositionality appears among words indicating positions.
