IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
TEFFDConv: An Improved Approach to Enhance Temporal Localization in Sound Event Detection
Xichang CAI, Jingxuan CHEN, Ziyi LIU, Menglong WU, HongYang GUO, Xuejing SUN
JOURNAL FREE ACCESS Advance online publication

Article ID: 2024EDL8085

Abstract

In recent years, convolutional recurrent neural networks (CRNNs) have achieved notable success in sound event detection (SED) by combining the strengths of convolutional neural networks (CNNs) and recurrent neural networks (RNNs). However, existing models still face limitations in the temporal dimension, resulting in suboptimal temporal localization accuracy. To address this issue, we designed a model called Temporal Enhanced Full-Frequency Dynamic Convolution (TEFFDConv). The model combines temporal and frequency attention mechanisms with full-dynamic convolution, enhancing its ability to localize sound events at the frame level. Experimental results show that the proposed model significantly improves PSDS1, CB-F1, and IB-F1 compared with similar methods, and that its PSDS2 also improves over most methods. These results indicate that the proposed method enhances temporal localization while also achieving better event classification performance.

© 2025 The Institute of Electronics, Information and Communication Engineers