IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Online ISSN : 1745-1337
Print ISSN : 0916-8508
Regular Section
Shape-Aware Convolution with Convolutional Kernel Attention for RGB-D Image Semantic Segmentation
Kun ZHOU, Zejun ZHANG, Xu TANG, Wen XU, Jianxiao XIE, Changbing TANG
JOURNAL FREE ACCESS

2025 Volume E108.A Issue 2 Pages 140-148

Abstract

RGB-D semantic segmentation has attracted increasing attention over the past few years. The depth feature encodes both the shape of a local geometry and its base (whereabouts) in a larger context. RGB and depth images can be concatenated and fed into a single network model, which avoids additional computation but introduces distracting information because the two inputs come from different modalities. To address this problem, we propose a Shape-aware Convolutional layer with Convolutional Kernel Attention (CKA-ShapeConv), which reduces the distracting information by using each unique input feature to rectify the kernels. Instead of using a single convolution kernel, we aggregate N parallel convolution kernels based on input-dependent attention. Specifically, four sets of attention weights are first calculated from each input feature map. Next, the N parallel convolution kernels are weighted and aggregated along different dimensions, which makes the generated convolution kernel better able to capture semantic information from the input feature map and reduces interference between RGB and depth features. The aggregated convolution kernel is then decomposed into two components, base and shape; two new learnable weights are introduced to cooperate with them independently; and finally a convolution is applied to the re-weighted combination of the two components. The base and shape components effectively capture the semantic and shape information of regions, respectively. Moreover, our CKA-ShapeConv layer can be easily integrated into most existing backbone models with only a small amount of additional computation. Experiments on the NYUDv2 and SUN RGB-D datasets show that the proposed CKA-ShapeConv layer effectively improves the performance of backbone models.
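The kernel pipeline described in the abstract (attention-weighted aggregation of N kernels, then a base/shape split with learnable re-weighting) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. Several things here are assumptions, because the abstract does not specify them: the four attention sets are taken to act on the kernel index, output channel, input channel, and spatial position; the base is taken to be the spatial mean of each kernel patch and the shape the residual; and the learnable weights are modeled as a scalar `w_base` and a square matrix `w_shape`. All function and parameter names are hypothetical, and the attention values are passed in directly rather than computed from an input feature map.

```python
def aggregate_kernels(kernels, a_kernel, a_out, a_in, a_spatial):
    """Blend N kernels, each indexed [out][in][spatial], into one kernel.

    Every entry is scaled by four attention values (assumed dimensions):
    kernel index n, output channel o, input channel i, spatial position s.
    """
    n_k = len(kernels)
    c_out, c_in = len(kernels[0]), len(kernels[0][0])
    n_s = len(kernels[0][0][0])
    return [[[sum(a_kernel[n] * a_out[o] * a_in[i] * a_spatial[s]
                  * kernels[n][o][i][s] for n in range(n_k))
              for s in range(n_s)]
             for i in range(c_in)]
            for o in range(c_out)]


def split_base_shape(patch):
    """Split a flattened spatial patch into base (mean) and shape (residual).

    Assumption: the abstract names the two components but not their formulas.
    """
    base = sum(patch) / len(patch)
    shape = [v - base for v in patch]
    return base, shape


def reweight(kernel, w_base, w_shape):
    """Recombine base and shape of every [out][in] patch.

    w_base (scalar) and w_shape (S x S matrix) stand in for the two
    learnable weights; their real form is not given in the abstract.
    """
    result = []
    for out_channel in kernel:
        row = []
        for patch in out_channel:
            base, shape = split_base_shape(patch)
            size = len(shape)
            mixed = [sum(w_shape[r][c] * shape[c] for c in range(size))
                     for r in range(size)]
            row.append([w_base * base + m for m in mixed])
        result.append(row)
    return result


# Two kernels, one output and one input channel, 2x2 support flattened to 4.
kernels = [
    [[[1.0, 2.0, 3.0, 4.0]]],
    [[[4.0, 3.0, 2.0, 1.0]]],
]
aggregated = aggregate_kernels(
    kernels,
    a_kernel=[0.75, 0.25],   # would come from the input feature map
    a_out=[1.0],
    a_in=[1.0],
    a_spatial=[1.0, 1.0, 1.0, 1.0],
)
identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
final_kernel = reweight(aggregated, w_base=1.0, w_shape=identity)
print(aggregated)  # [[[1.75, 2.25, 2.75, 3.25]]]
```

With `w_base = 1` and an identity `w_shape`, the re-weighted kernel equals the aggregated kernel, so under these assumptions the layer starts out behaving like the plain attention-aggregated convolution and departs from it only as the two weights are learned. The resulting kernel would then be used in an ordinary convolution, which is omitted here.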

© 2025 The Institute of Electronics, Information and Communication Engineers