IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Online ISSN : 1745-1337
Print ISSN : 0916-8508
MST-Adapter: Multi-scaled Spatio-Temporal Adapter for Parameter-Efficient Image-to-Video Transfer Learning
Chenrui CHANG, Tongwei LU, Feng YAO
Journal, free access, advance online publication

Article ID: 2024EAP1080

Abstract

Large-scale image pre-trained models have recently demonstrated strong representation capabilities for spatial information. Prior works apply these models to video action recognition through full fine-tuning, which is computationally expensive and resource-intensive. To reduce this cost, some studies have shifted their focus to parameter-efficient fine-tuning methods. However, existing parameter-efficient methods do not explore the multi-scale information in videos. In this work, we propose the Multi-scale Spatio-Temporal Adapter (MST-Adapter) for parameter-efficient image-to-video transfer learning. By freezing the pre-trained model and inserting lightweight adapters, only a small number of parameters need to be updated, which makes training highly efficient. Extensive experiments on two video action recognition benchmarks show that our method learns high-quality spatio-temporal video representations and achieves performance competitive with, or better than, prior works.
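The abstract does not describe the adapter's internals, so the following is only a hypothetical sketch of the general idea it names: a frozen image backbone plus small trainable adapters that mix frame features along time at several temporal scales. It assumes a common bottleneck layout (down-projection, parallel depthwise temporal convolutions, up-projection) on a ViT-B-style backbone of width 768. The names `AdapterConfig`, `adapter_params`, `depthwise_temporal_conv`, and `multi_scale_mix`, the bottleneck width 192, the kernel sizes (3, 5, 7), one adapter per block over 12 blocks, and averaging as the fusion rule are all assumptions for illustration, not the published MST-Adapter.

```python
from dataclasses import dataclass
from typing import List, Sequence

Frames = List[List[float]]  # T frames x C channels


@dataclass
class AdapterConfig:
    """Hypothetical hyperparameters; none of these values come from the paper."""
    dim: int = 768                            # token width of an assumed ViT-B-style backbone
    bottleneck: int = 192                     # reduced width inside the adapter (assumed)
    kernel_sizes: Sequence[int] = (3, 5, 7)   # temporal scales (assumed)
    num_layers: int = 12                      # one adapter per transformer block (assumed)


def adapter_params(cfg: AdapterConfig) -> int:
    """Trainable weights + biases of one assumed adapter."""
    down = cfg.dim * cfg.bottleneck + cfg.bottleneck           # linear dim -> bottleneck
    convs = sum(cfg.bottleneck * k + cfg.bottleneck            # depthwise conv per scale
                for k in cfg.kernel_sizes)
    up = cfg.bottleneck * cfg.dim + cfg.dim                    # linear bottleneck -> dim
    return down + convs + up


def total_trainable(cfg: AdapterConfig) -> int:
    """Only adapter weights are trained; the backbone stays frozen."""
    return cfg.num_layers * adapter_params(cfg)


def depthwise_temporal_conv(frames: Frames, weights: List[List[float]]) -> Frames:
    """'Same'-padded 1-D convolution along time, applied independently per channel.

    weights has one odd-length kernel per channel (C x k).
    """
    T, C = len(frames), len(frames[0])
    k = len(weights[0])
    pad = k // 2
    out = [[0.0] * C for _ in range(T)]
    for t in range(T):
        for c in range(C):
            acc = 0.0
            for j in range(k):
                s = t + j - pad
                if 0 <= s < T:  # zero padding outside the clip
                    acc += weights[c][j] * frames[s][c]
            out[t][c] = acc
    return out


def multi_scale_mix(frames: Frames, weights_per_scale: List[List[List[float]]]) -> Frames:
    """Run one temporal conv per scale and average the branches (assumed fusion rule)."""
    branches = [depthwise_temporal_conv(frames, w) for w in weights_per_scale]
    T, C = len(frames), len(frames[0])
    return [[sum(b[t][c] for b in branches) / len(branches) for c in range(C)]
            for t in range(T)]


if __name__ == "__main__":
    cfg = AdapterConfig()
    print(adapter_params(cfg), total_trainable(cfg))

    # Toy clip: 8 frames, 2 channels, with uniform averaging kernels at each scale.
    clip = [[1.0, 2.0] for _ in range(8)]
    kernels = [[[1.0 / k] * k for _ in range(2)] for k in cfg.kernel_sizes]
    print(multi_scale_mix(clip, kernels)[3])
```

With these assumed settings, each adapter adds 299,328 trainable parameters and twelve of them add 3,591,936, while every backbone weight stays frozen. The toy clip shows the multi-scale mixing: frames far from the clip boundaries keep their values under the averaging kernels, while edge frames are attenuated by the zero padding, and each kernel size sees a different temporal neighbourhood.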

© 2024 The Institute of Electronics, Information and Communication Engineers