2025 Volume 145 Issue 11 Pages 1012-1021
Unpaired image-to-image translation is expected to find applications in scientific simulation and other fields. In particular, converting selfie photos into anime-style images is expected to be useful in the production of animation, manga, and games. CycleGAN was the first model to achieve unpaired image-to-image translation, and many methods building on it have since been proposed. However, these methods often struggle with geometric transformations, which makes selfie-to-anime translation difficult because it involves large geometric changes to the eyes and nose. In this study, we focus on CycleGAN-based methods, which perform the translation with an intuitive and simple architecture, and introduce a mechanism that assists geometric transformation into UVCGAN, a model that incorporates the Vision Transformer into CycleGAN. The goal is to improve selfie-to-anime translation by generating more natural-looking images. To this end, landmark images of the original and reconstructed images are obtained when the cycle-consistency loss is computed, and the difference between them is also included in the loss function. The proposed method produces more natural-looking images than the conventional method because the eye sizes and facial contours are aligned.
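The loss modification described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the L1 distance for the landmark term, the weight `lambda_lm`, and the `landmark_fn` callable are assumptions introduced here for illustration, and `toy_landmark_image` is only a stand-in for a real facial-landmark detector.

```python
import numpy as np


def l1(a, b):
    """Mean absolute difference between two arrays of the same shape."""
    return float(np.mean(np.abs(a - b)))


def cycle_loss_with_landmarks(x, x_rec, landmark_fn, lambda_lm=1.0):
    """Cycle-consistency loss plus a landmark-image difference term.

    x           : original image as an array
    x_rec       : reconstructed image (translated and translated back), same shape
    landmark_fn : hypothetical callable mapping an image to a landmark image
    lambda_lm   : assumed weight of the landmark term (not given in the abstract)
    """
    pixel_term = l1(x, x_rec)  # usual CycleGAN-style L1 cycle term
    landmark_term = l1(landmark_fn(x), landmark_fn(x_rec))  # assumed L1 as well
    return pixel_term + lambda_lm * landmark_term


def toy_landmark_image(img, threshold=0.5):
    """Stand-in for a landmark detector: marks pixels above threshold as 1.0."""
    return (img > threshold).astype(np.float64)


# Tiny 2x2 example: the reconstruction loses one bright pixel.
x = np.array([[0.0, 1.0], [1.0, 0.0]])
x_rec = np.array([[0.0, 1.0], [0.0, 0.0]])

loss = cycle_loss_with_landmarks(x, x_rec, toy_landmark_image, lambda_lm=2.0)
perfect = cycle_loss_with_landmarks(x, x, toy_landmark_image, lambda_lm=2.0)
```

In this toy case the pixel term and the landmark term are both 0.25, so with `lambda_lm=2.0` the combined loss is 0.75, and a perfect reconstruction gives 0. In an actual training setup the landmark extractor would need to be differentiable, or otherwise handled, for the extra term to influence the generators; the abstract does not say how this is done.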
The Transactions of the Institute of Electrical Engineers of Japan. C
The Journal of the Institute of Electrical Engineers of Japan