IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Online ISSN : 1745-1337
Print ISSN : 0916-8508
Monaural Speech Enhancement with Attention Augmented Dual-Path CRN And Short Time Discrete Cosine Transform
Lin ZHOU, Tongjia YAN, Mingyang LI, Ao LI
JOURNAL FREE ACCESS Advance online publication

Article ID: 2025EAP1011

Abstract

In recent years, the performance of phase-aware speech enhancement neural networks has steadily improved. However, processing complex-valued Short-Time Fourier Transform (STFT) spectrograms requires complex-valued operations and phase estimation, which increases both the complexity and the parameter count of the model. To address this, we build on DCTCRN and use real-valued Short-Time Discrete Cosine Transform (STDCT) spectrograms as input features, which avoids explicit phase estimation and the modeling of amplitude-phase relationships. To strengthen the skip connections without adding parameters, we incorporate the SimAM attention mechanism. In addition, we insert dual-path RNN modules between the encoder and decoder to capture long-range dependencies along both the time and frequency dimensions, and we adopt Hardtanh as the new scaling function. Comparative experiments and ablation studies confirm the effectiveness of the STDCT spectrograms, the attention mechanism, and the Hardtanh scaling function. Our approach compares favorably with recent speech enhancement models on objective performance metrics while maintaining a relatively low parameter count, thus raising the performance ceiling of the DCTCRN series of models.
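
To illustrate why an STDCT representation sidesteps phase estimation, the following minimal sketch frames a signal and applies an orthonormal DCT-II to each frame, so every coefficient is a real number. It is not the authors' implementation: the function names (`dct2`, `idct2`, `stdct`, `hardtanh`), the frame length, the hop size, the rectangular (no-op) window, and the [-1, 1] clipping range are all assumptions chosen for illustration.

```python
import math

def dct2(frame):
    """Orthonormal DCT-II of a real sequence (O(N^2), illustration only)."""
    n = len(frame)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(frame))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct2(coeffs):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = coeffs[0] * math.sqrt(1.0 / n)
        s += sum(math.sqrt(2.0 / n) * coeffs[k]
                 * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out

def stdct(signal, frame_len=8, hop=4):
    """Split the signal into overlapping frames and apply dct2 to each.

    Assumption: no analysis window is applied (rectangular window).
    """
    return [dct2(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, hop)]

def hardtanh(x, lo=-1.0, hi=1.0):
    """Piecewise-linear clipping; the [-1, 1] range is an assumed default."""
    return max(lo, min(hi, x))

# Toy input: a short sinusoid (hypothetical test signal).
signal = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
spec = stdct(signal)       # list of frames, each a list of real coefficients
recon = idct2(spec[0])     # inverse transform of the first frame
```

Because every entry of `spec` is a plain real number, a network operating on it can predict a single real-valued mask per coefficient instead of separate magnitude and phase terms. A bounded function such as `hardtanh` is one way to keep such a mask within a fixed range; how the scaling function is applied inside the model is not specified in the abstract.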

© 2025 The Institute of Electronics, Information and Communication Engineers