IEEJ Transactions on Electronics, Information and Systems
Online ISSN : 1348-8155
Print ISSN : 0385-4221
ISSN-L : 0385-4221
<Speech and Image Processing, Recognition>
Introduction of Facial Landmark Information in a Cartoon-like Human Image Translation Model
Yuto Yamamoto, Michifumi Yoshioka, Katsufumi Inoue

2025 Volume 145 Issue 11 Pages 1012-1021

Abstract

Unpaired image-to-image translation is a task with expected applications in scientific simulation and other domains. In particular, the translation of selfie photos into anime-style images is expected to be applied in the production of animation, manga, and games. CycleGAN was the first model to achieve unpaired image-to-image translation, and many methods building on it have since been proposed. However, these methods often struggle with geometric transformations, which makes selfie-to-anime translation difficult when it requires large geometric changes around the eyes and nose. In this study, we focus on a CycleGAN-based approach that performs the translation with an intuitive and simple architecture, and we introduce a mechanism that assists geometric transformation into UVCGAN, a model that incorporates the Vision Transformer into CycleGAN. The goal is to improve selfie-to-anime translation by generating more natural-looking images. To this end, landmark images are extracted from both the original and the reconstructed images when computing the cycle-consistency loss, and the difference between them is added as an additional term in the loss function. The proposed method produces more natural-looking images than the conventional method, as eye sizes and facial contours are better aligned.
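As a rough illustration of the landmark-based term described in the abstract, the following sketch shows how a landmark-image difference could be added alongside the standard cycle-consistency loss in PyTorch. This is not the paper's implementation: the landmark extractor `landmark_heatmap`, the generators `G_AB`/`G_BA`, and the weight `lambda_lm` are assumed names for illustration only.

# Minimal sketch (assumptions): `landmark_heatmap` renders facial-landmark
# images from a batch of face images; G_AB / G_BA are the two generators as
# in CycleGAN / UVCGAN. Names and weights are illustrative, not from the paper.
import torch
import torch.nn.functional as F

def cycle_losses(real_A, G_AB, G_BA, landmark_heatmap,
                 lambda_cyc=10.0, lambda_lm=1.0):
    """Cycle-consistency loss augmented with a landmark-image difference term."""
    fake_B = G_AB(real_A)   # A -> B (e.g., selfie -> anime)
    rec_A = G_BA(fake_B)    # B -> A reconstruction

    # Standard cycle-consistency loss: L1 between original and reconstruction.
    loss_cyc = F.l1_loss(rec_A, real_A)

    # Landmark term: compare landmark images of the original and reconstructed faces.
    lm_real = landmark_heatmap(real_A)
    lm_rec = landmark_heatmap(rec_A)
    loss_lm = F.l1_loss(lm_rec, lm_real)

    return lambda_cyc * loss_cyc + lambda_lm * loss_lm

In this sketch the landmark term penalizes reconstructions whose facial geometry (eye positions, contour) drifts from the input, which is the role the abstract attributes to the added loss component.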

© 2025 by the Institute of Electrical Engineers of Japan