2025, Vol. 91, No. 2, pp. 156-162
The proliferation of deepfake technology, which uses deep learning algorithms to manipulate facial features, attributes, and expressions in images, has raised significant concern. Consequently, a growing body of research aims to identify images synthesized by deepfake algorithms. Although Vision Transformer-based methods have shown strong performance in image recognition, recent studies suggest that they underperform convolutional neural network-based techniques in deepfake detection. This study proposes a high-precision deepfake detection approach that employs the Wavelet Vision Transformer together with self-supervised learning. The Wavelet Vision Transformer is well suited to capturing the high-frequency components of images, which are particularly relevant to deepfake detection. By combining it with self-supervised learning, a form of representation learning, our method enables precise detection of manipulation artifacts in deepfake images and thereby achieves high detection accuracy.
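To illustrate what "high-frequency components" means in a wavelet setting, the following minimal sketch computes a single-level 2D Haar wavelet decomposition in plain Python. It is not the Wavelet Vision Transformer described in this study: the choice of the Haar wavelet, the function name `haar_dwt2_level1`, and the sub-band names are assumptions made purely for illustration.

```python
def haar_dwt2_level1(img):
    """Single-level 2D Haar transform of a grayscale image.

    `img` is a list of rows with even height and width (an assumption
    of this sketch). Returns four sub-bands, each half the size of the
    input in both directions:
      ll  - local average (low-frequency content)
      dx  - left-minus-right difference (responds to vertical edges)
      dy  - top-minus-bottom difference (responds to horizontal edges)
      dxy - diagonal difference
    """
    height, width = len(img), len(img[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must both be even")

    ll, dx, dy, dxy = [], [], [], []
    for i in range(0, height, 2):
        ll_row, dx_row, dy_row, dxy_row = [], [], [], []
        for j in range(0, width, 2):
            # One 2x2 block: a b
            #                c d
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            # The factor 1/2 makes the transform orthonormal,
            # so the total signal energy is preserved.
            ll_row.append((a + b + c + d) / 2)
            dx_row.append((a - b + c - d) / 2)
            dy_row.append((a + b - c - d) / 2)
            dxy_row.append((a - b - c + d) / 2)
        ll.append(ll_row)
        dx.append(dx_row)
        dy.append(dy_row)
        dxy.append(dxy_row)
    return ll, dx, dy, dxy


def band_energy(band):
    """Sum of squared coefficients in one sub-band."""
    return sum(v * v for row in band for v in row)


if __name__ == "__main__":
    # Hypothetical 4x4 patch: smooth on the left, sharp edge on the right.
    patch = [
        [10, 10, 10, 50],
        [10, 10, 10, 50],
        [10, 10, 10, 50],
        [10, 10, 10, 50],
    ]
    ll, dx, dy, dxy = haar_dwt2_level1(patch)
    high = band_energy(dx) + band_energy(dy) + band_energy(dxy)
    print("low-frequency energy:", band_energy(ll))
    print("high-frequency energy:", high)
```

In a smooth region the three difference sub-bands are zero, while an abrupt intensity change inside a block yields large coefficients. Under the assumptions above, this is the kind of fine-grained signal in which manipulation artifacts are expected to show up.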