Journal of the Japan Society for Precision Engineering
Online ISSN : 1882-675X
Print ISSN : 0912-0289
ISSN-L : 0912-0289
Paper
Improving Perceptual Loss with CLIP for Super-Resolution
Go OHTANI, Hirokatsu KATAOKA, Yoshimitsu AOKI
JOURNAL FREE ACCESS

2024 Volume 90 Issue 2 Pages 217-223

Abstract

Perceptual loss, computed with a VGG network pre-trained on ImageNet, has been widely used in super-resolution tasks because it enables the generation of photo-realistic images. However, grid-like artifacts have been reported to appear frequently in the generated images. To address this problem, we hypothesize that large-scale pre-trained models can contribute significantly to super-resolution across diverse scenes. In particular, models trained jointly on images and language show a strong ability to comprehend complex scenes, which may improve super-resolution performance. This paper therefore proposes a new perceptual loss that uses Contrastive Language-Image Pre-training (CLIP) based on a Vision Transformer (ViT) in place of the VGG network. The results demonstrate that the proposed perceptual loss can generate photo-realistic images without grid-like artifacts.
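
As a rough illustration of the idea, a perceptual loss compares two images in the feature space of a frozen encoder rather than pixel by pixel, so replacing VGG with a CLIP ViT amounts to swapping the encoder. The sketch below is not the authors' implementation: the abstract does not state the distance metric or which layers are used, so the mean squared distance, the names `patch_mean_features` and `perceptual_loss`, and the toy patch-averaging encoder are all assumptions chosen to keep the example self-contained.

```python
from typing import Callable, List

Image = List[List[float]]   # grayscale image as rows of pixel values
Features = List[float]      # flat feature vector


def patch_mean_features(img: Image, patch: int = 2) -> Features:
    """Toy stand-in for a frozen encoder (hypothetical, NOT CLIP or VGG).

    Returns the mean of each non-overlapping patch. A real setup would
    return activations of a pre-trained network instead; this placeholder
    only mimics the "split into patches" input stage of a ViT.
    """
    h, w = len(img), len(img[0])
    feats = []
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            block = [img[top + i][left + j]
                     for i in range(patch) for j in range(patch)]
            feats.append(sum(block) / len(block))
    return feats


def perceptual_loss(encoder: Callable[[Image], Features],
                    sr: Image, hr: Image) -> float:
    """Mean squared distance between encoder features (assumed metric)."""
    f_sr, f_hr = encoder(sr), encoder(hr)
    if len(f_sr) != len(f_hr):
        raise ValueError("feature vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(f_sr, f_hr)) / len(f_sr)


# Ground-truth high-resolution image and a super-resolved output whose
# bottom-right patch is wrong.
hr = [[0.0, 0.0, 1.0, 1.0],
      [0.0, 0.0, 1.0, 1.0],
      [1.0, 1.0, 0.0, 0.0],
      [1.0, 1.0, 0.0, 0.0]]
sr = [[0.0, 0.0, 1.0, 1.0],
      [0.0, 0.0, 1.0, 1.0],
      [1.0, 1.0, 0.0, 0.0],
      [1.0, 1.0, 1.0, 1.0]]

identical = perceptual_loss(patch_mean_features, hr, hr)
degraded = perceptual_loss(patch_mean_features, sr, hr)
```

Because the encoder is passed in as an argument, the same loss function would work unchanged with features from a VGG layer or a CLIP image encoder; only the callable differs.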

© 2024 The Japan Society for Precision Engineering