IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
TEFFDConv: An Improved Approach to Enhance Temporal Localization in Sound Event Detection
Xichang CAI, Jingxuan CHEN, Ziyi LIU, Menglong WU, HongYang GUO, Xuejing SUN
Journal, Free Access, Advance Online Publication

Article ID: 2024EDL8085

Abstract

In recent years, convolutional recurrent neural networks (CRNNs) have achieved notable success in sound event detection (SED) by combining the strengths of convolutional neural networks (CNNs) and recurrent neural networks (RNNs). However, existing models remain limited in the temporal dimension, which leads to suboptimal temporal localization accuracy in SED. To address this issue, we propose a model called Temporal Enhanced Full-Frequency Dynamic Convolution (TEFFDConv). The model combines temporal and frequency attention mechanisms with full-dynamic convolution, enhancing its ability to localize sound events at the frame level. Experimental results show that the proposed model significantly improves PSDS1, CB-F1, and IB-F1, a notable advance over similar methods. PSDS2 also improves over most methods. These results indicate that the proposed method improves temporal localization while also performing better in event classification.

© 2025 The Institute of Electronics, Information and Communication Engineers