IEEJ Transactions on Electronics, Information and Systems
Online ISSN : 1348-8155
Print ISSN : 0385-4221
ISSN-L : 0385-4221
<Speech and Image Processing, Recognition>
Introduction of Facial Landmark Information in a Cartoon-like Human Image Translation Model
Yuto Yamamoto, Michifumi Yoshioka, Katsufumi Inoue

2025 Volume 145 Issue 11 Pages 1012-1021

Abstract

Unpaired image-to-image translation is a task with expected applications in scientific simulation and other domains. In particular, the translation of selfie photos into anime-style images is expected to be applied in the production of animation, manga, and games. CycleGAN was the first model to achieve unpaired image-to-image translation, and many methods building on it have since been proposed. However, these methods often struggle with geometric transformations, which makes selfie-to-anime translation difficult when it requires large geometric changes around the eyes and nose. In this study, we focus on a CycleGAN-based approach that performs the translation with an intuitive and simple architecture, and we introduce a mechanism that assists geometric transformation into UVCGAN, a model that incorporates the Vision Transformer into CycleGAN. The goal is to improve selfie-to-anime translation by generating more natural-looking images. To this end, landmark images are extracted from both the original and the reconstructed images when computing the cycle-consistency loss, and the difference between them is added as an additional term in the loss function. The proposed method produces more natural-looking images than the conventional method, as eye sizes and facial contours are better aligned.
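As a rough illustration of the landmark-based term described in the abstract, the following sketch shows how a landmark-image difference could be added alongside the standard cycle-consistency loss in PyTorch. This is not the paper's implementation: the landmark extractor `landmark_heatmap`, the generators `G_AB`/`G_BA`, and the weight `lambda_lm` are assumed names for illustration only.

# Minimal sketch (assumptions): `landmark_heatmap` renders facial-landmark
# images from a batch of face images; G_AB / G_BA are the two generators as
# in CycleGAN / UVCGAN. Names and weights are illustrative, not from the paper.
import torch
import torch.nn.functional as F

def cycle_losses(real_A, G_AB, G_BA, landmark_heatmap,
                 lambda_cyc=10.0, lambda_lm=1.0):
    """Cycle-consistency loss augmented with a landmark-image difference term."""
    fake_B = G_AB(real_A)   # A -> B (e.g., selfie -> anime)
    rec_A = G_BA(fake_B)    # B -> A reconstruction

    # Standard cycle-consistency loss: L1 between original and reconstruction.
    loss_cyc = F.l1_loss(rec_A, real_A)

    # Landmark term: compare landmark images of the original and reconstructed faces.
    lm_real = landmark_heatmap(real_A)
    lm_rec = landmark_heatmap(rec_A)
    loss_lm = F.l1_loss(lm_rec, lm_real)

    return lambda_cyc * loss_cyc + lambda_lm * loss_lm

In this sketch the landmark term penalizes reconstructions whose facial geometry (eye positions, contour) drifts from the input, which is the role the abstract attributes to the added loss component.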

© 2025 by the Institute of Electrical Engineers of Japan