Real-Time Video Matting Based on RVM and Mobile ViT

Chengyu WU; Jiangshan QIN; Xiangyang LI; Ao ZHAN; Zhengqiang WANG

doi:10.1587/transinf.2023EDL8071

Abstract

Real-time matting is a challenging research topic in deep learning. Conventional CNN (Convolutional Neural Network) approaches tend to misjudge foreground and background semantics and produce blurry matte edges, because a CNN's limited receptive field restricts its attention to global context. We propose a real-time matting approach called RMViT (Real-time Matting with Vision Transformer), which uses a Transformer structure, attention, and content-aware guidance to address these issues. Semantic accuracy improves substantially because the model establishes global context and long-range pixel dependencies. Experiments show that our approach achieves a reduction of more than 30% in error metrics compared with existing real-time matting approaches.
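
The abstract does not name the error metrics behind the reported reduction. As a minimal sketch, assuming two metrics commonly reported for alpha mattes (mean absolute difference and mean squared error), the relative reduction between a baseline and an improved prediction could be computed as follows. The function names and the toy alpha values are hypothetical and chosen for illustration only; they are not results from the paper.

```python
# Illustrative sketch only: MAD and MSE are assumed here as typical matting
# error metrics; the abstract does not state which metrics were used.

def mad(pred, truth):
    """Mean absolute difference between two alpha mattes (flat lists in [0, 1])."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)

def mse(pred, truth):
    """Mean squared error between two alpha mattes (flat lists in [0, 1])."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)

def relative_reduction(baseline_error, new_error):
    """Fraction by which new_error is lower than baseline_error."""
    return (baseline_error - new_error) / baseline_error

# Toy alpha values (hypothetical, not data from the paper).
truth = [0.0, 0.0, 0.5, 1.0, 1.0]
baseline = [0.2, 0.0, 0.3, 0.8, 1.0]
improved = [0.1, 0.0, 0.4, 0.9, 1.0]

mad_reduction = relative_reduction(mad(baseline, truth), mad(improved, truth))
mse_reduction = relative_reduction(mse(baseline, truth), mse(improved, truth))
print(f"MAD reduction: {mad_reduction:.0%}, MSE reduction: {mse_reduction:.0%}")
```

Under this reading, a reduction of more than 30% means the relative reduction exceeds 0.3 for the metric in question. Because MSE squares each per-pixel error, the same pair of predictions can show different reductions under the two metrics.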
