IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Video Object Segmentation via Adaptive Multi-feature Fusion of Foreground and Background
Xuejun LI, Yuan ZONG, Jie ZHU, Cheng LU, Chuangao TANG
Journal, free access, advance publication

Article ID: 2025EDL8031

Abstract

Semi-supervised video object segmentation (SVOS) is a challenging task that uses the annotated object mask of the initial frame to segment the target objects in subsequent frames. Recent VOS methods combine matching-based transductive inference with online inductive learning to capture more precise spatiotemporal information and thereby improve segmentation accuracy. However, although these methods strengthen feature extraction, they do not adequately fuse the different features they extract, so those features are used inefficiently. To address this inefficiency of feature fusion in SVOS, this letter proposes an adaptive multi-feature fusion method. The method introduces a Foreground-Background Multi-feature Encoder to enhance feature diversity and a Multi-feature Fusion Module to dynamically integrate spatiotemporal cues from both the foreground and the background. For each segmentation target, a Feature Fusion Reader autonomously selects and adaptively fuses multiple foreground and background features, optimizing how the features work together and significantly improving target-specific fusion efficiency. Extensive experiments on DAVIS 2017 and the large-scale YouTube-VOS 2018 and 2019 datasets show that the proposed method achieves state-of-the-art performance.
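The abstract does not say how the Feature Fusion Reader weights the features it selects. The sketch below is therefore only an illustration of what "adaptively fusing multiple features per target" can mean. It substitutes a common generic scheme, a softmax-gated weighted sum, and should not be read as the letter's actual module. Each candidate feature vector (for example, one foreground and one background feature) is scored against a target descriptor. The scores are normalized with a softmax, and the features are summed with those weights. The names `adaptive_fuse` and `target` and the dot-product scoring rule are all hypothetical assumptions.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(features: Sequence[Sequence[float]],
                  target: Sequence[float]) -> List[float]:
    """Fuse K same-length feature vectors with weights chosen per target.

    Hypothetical scheme (not the letter's method): score each feature by its
    dot product with a target descriptor, softmax-normalize the scores, and
    return the weighted sum of the features.
    """
    if not features:
        raise ValueError("need at least one feature")
    dim = len(target)
    if any(len(f) != dim for f in features):
        raise ValueError("all features must match the target dimension")
    # Relevance of each feature to the current target.
    scores = [sum(a * b for a, b in zip(f, target)) for f in features]
    # Per-target fusion weights that sum to 1.
    weights = softmax(scores)
    fused = [0.0] * dim
    for w, f in zip(weights, features):
        for i, v in enumerate(f):
            fused[i] += w * v
    return fused


if __name__ == "__main__":
    foreground = [1.0, 0.0]  # toy foreground feature
    background = [0.0, 1.0]  # toy background feature
    # A target descriptor that resembles the foreground favors that feature.
    print(adaptive_fuse([foreground, background], [1.0, 0.0]))
```

Because the weights depend on the target descriptor, a different target leads to a different mix of the same foreground and background features. That per-target behavior is the property the abstract attributes to adaptive fusion, shown here in its simplest possible form.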

© 2025 The Institute of Electronics, Information and Communication Engineers