2025 Volume 29 Issue 3 Pages 659-667
Video anomaly detection is crucial for intelligent surveillance, yet the scarcity and diversity of abnormal events pose significant challenges for supervised methods. This paper presents an unsupervised framework that integrates graph attention networks (GATs) and Transformer architectures, combining masked autoencoders (MAEs) with self-distillation training. GATs model spatial and inter-frame relationships, while Transformers capture long-range temporal dependencies, overcoming limitations of conventional MAE and self-distillation approaches. The model is trained in two stages: first, a lightweight MAE combined with a GAT-Transformer fusion constructs a knowledge-distillation module; second, the student autoencoder is optimized by integrating a graph convolutional autoencoder and a classification head that identifies synthetic anomalies. We evaluate the proposed method on three representative datasets (ShanghaiTech Campus, UBnormal, and UCSD Ped2) and achieve promising results.
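To make the graph-attention component concrete, below is a minimal NumPy sketch of a single-head GAT update over node features (e.g., frame or patch embeddings). This is illustrative only: the paper's actual GAT configuration, number of heads, feature dimensions, and graph construction are not specified in the abstract, so all shapes and names here are assumptions.

```python
import numpy as np

def gat_layer(X, A, W, a, leaky=0.2):
    """Single-head graph attention update (illustrative sketch).

    X : (N, F)  node features, e.g. per-frame or per-patch embeddings
    A : (N, N)  adjacency matrix with self-loops (1 = edge, 0 = no edge)
    W : (F, Fo) learned linear projection
    a : (2*Fo,) learned attention vector, split over [h_i || h_j]
    """
    H = X @ W                                    # project node features
    Fo = H.shape[1]
    # attention logits e_ij = LeakyReLU(a^T [h_i || h_j]), computed
    # as a sum of per-source and per-target contributions
    e = (H @ a[:Fo])[:, None] + (H @ a[Fo:])[None, :]
    e = np.where(e > 0, e, leaky * e)            # LeakyReLU
    e = np.where(A > 0, e, -1e9)                 # mask non-neighbors
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)    # softmax over neighbors
    return alpha @ H                             # attention-weighted aggregation
```

A node connected only to itself simply recovers its own projected feature, since its softmax collapses onto the self-loop; nodes with neighbors mix their projected features according to the learned attention weights.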