IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Structural Relation Multi-class Token Transformer for Weakly Supervised Semantic Segmentation
Dingjie PENG, Wataru KAMEYAMA
Journal, free access, advance online publication

Article ID: 2024EDP7125

Abstract

Weakly Supervised Semantic Segmentation (WSSS) aims to train models to identify and delineate objects within an image using limited training data such as image-level labels. While recent works mainly focus on exploring class-specific knowledge to improve the quality of class activation maps, we contend that relying solely on this approach within a non-hierarchical architecture fails to adequately capture the structural relationships within images. Drawing inspiration from fully supervised semantic segmentation designs, which use hierarchical multi-scale feature maps for predicting the dense masks, we propose a novel architecture that integrates a Structural Relation Multi-class Token Transformer (SR-MCT) with WSSS. This model employs multi-scale structural tokens, generated by a Spatial Prior Module (SPM), which interact not only with patch tokens to encode structural relations, but also with multi-class tokens to integrate class-specific knowledge into complex structural embeddings. The proposed Structural Relation Multi-class Token Attention effectively builds long-range dependencies among structural tokens, patch tokens, and multi-class tokens simultaneously. Experimental results and ablation studies on PASCAL VOC 2012 and MS COCO 2014 demonstrate that our proposed SR-MCT can enhance baseline performance and outperform other state-of-the-art methods.
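The abstract gives no equations for the proposed attention, so the following is only a minimal sketch of the general idea of attending over structural, patch, and multi-class tokens at once. It is not the authors' implementation. It assumes a single attention head, identity query/key/value projections, and hypothetical token counts and dimensions. All function and variable names are illustrative.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def joint_self_attention(class_tokens, patch_tokens, struct_tokens):
    """Single-head self-attention over the concatenation of three token groups.

    Assumption for illustration: queries, keys, and values are the tokens
    themselves (identity projections); a real model would learn projections.
    """
    tokens = class_tokens + patch_tokens + struct_tokens
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    # One attention row per token; each row covers every token in every group.
    weights = [
        softmax([scale * sum(a * b for a, b in zip(q, k)) for k in tokens])
        for q in tokens
    ]
    # Each output token is the attention-weighted average of all tokens.
    outputs = [
        [sum(w * v[d] for w, v in zip(row, tokens)) for d in range(dim)]
        for row in weights
    ]
    return outputs, weights


def random_tokens(count, dim, rng):
    """Hypothetical stand-in for learned or encoded token embeddings."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
dim = 8                                       # hypothetical embedding size
class_tokens = random_tokens(3, dim, rng)     # hypothetical: 3 classes
patch_tokens = random_tokens(16, dim, rng)    # hypothetical: 4x4 patch grid
struct_tokens = random_tokens(5, dim, rng)    # hypothetical: 5 structural tokens

outputs, weights = joint_self_attention(class_tokens, patch_tokens, struct_tokens)

# Slice of row 0: how the first class token attends to the 16 patch tokens.
class0_to_patches = weights[0][3:19]
```

Because the three groups share one attention matrix, every row of `weights` contains entries for class, patch, and structural tokens alike, which is one simple way to model dependencies among all of them in a single step.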

© 2024 The Institute of Electronics, Information and Communication Engineers