IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Monaural Speech Enhancement Based on Multi-Resolution Feature Analysis
Yuewei ZHANG, Huanbin ZOU, Jie ZHU
Journal (free access), advance publication

Article ID: 2024EDL8099

Abstract

Multi-resolution spectrum feature analysis has demonstrated superior performance over traditional single-resolution methods in speech enhancement. However, previous multi-resolution methods typically make only limited use of multi-resolution features, and some suffer from high model complexity. In this paper, we propose a lighter-weight method that fully leverages multi-resolution spectrum features. Our approach is based on a convolutional recurrent network (CRN) and employs a low-complexity multi-resolution spectrum fusion (MRSF) block to process and fuse noisy spectral information at multiple resolutions. We also improve the existing encoder-decoder structure so that the model extracts and analyzes multi-resolution features more effectively. Furthermore, we adopt the short-time discrete cosine transform (STDCT) for the time-frequency transformation; because the STDCT of a real signal is real-valued, this avoids the phase estimation problem. To optimize the model, we design a multi-resolution STDCT loss function. Experiments demonstrate that the proposed multi-resolution STDCT-based CRN (MRCRN) achieves excellent performance and outperforms current advanced systems.
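To illustrate why an STDCT front end sidesteps phase estimation, and what a multi-resolution STDCT loss could look like, here is a minimal NumPy sketch. It is not the authors' implementation: the orthonormal DCT-II, the Hamming window, the (frame length, hop) pairs and the plain L1 distance are all assumptions, since the abstract does not specify them, and the function names (`stdct`, `istdct`, `multi_resolution_stdct_loss`) are hypothetical. The point it shows is that every coefficient is real, so a network can predict the target coefficients directly and the waveform is recovered by an inverse DCT plus overlap-add, with no phase to estimate.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis of shape (n, n); each row is one basis function."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)  # rescale the DC row so that c @ c.T == identity
    return c


def _frame_indices(num_frames: int, frame_len: int, hop: int) -> np.ndarray:
    """Sample indices covered by each frame, shape (num_frames, frame_len)."""
    return np.arange(frame_len)[None, :] + hop * np.arange(num_frames)[:, None]


def stdct(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Short-time DCT: windowed frames -> real coefficients (num_frames, frame_len)."""
    window = np.hamming(frame_len)  # assumed window; nonzero at both ends
    num_frames = 1 + (len(x) - frame_len) // hop
    frames = x[_frame_indices(num_frames, frame_len, hop)] * window
    return frames @ dct_matrix(frame_len).T


def istdct(coeffs: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Inverse STDCT via weighted overlap-add; only real arithmetic, no phase."""
    window = np.hamming(frame_len)
    num_frames = coeffs.shape[0]
    num_samples = (num_frames - 1) * hop + frame_len
    frames = coeffs @ dct_matrix(frame_len)  # orthonormal basis: inverse = transpose
    idx = _frame_indices(num_frames, frame_len, hop)
    y = np.zeros(num_samples)
    norm = np.zeros(num_samples)
    np.add.at(y, idx, frames * window)   # accumulate synthesis-windowed frames
    np.add.at(norm, idx, window ** 2)    # accumulate analysis * synthesis window
    covered = norm > 1e-8
    y[covered] /= norm[covered]
    return y


def multi_resolution_stdct_loss(est, ref, resolutions=((256, 64), (512, 128), (1024, 256))):
    """Hypothetical loss: mean L1 distance of STDCT coefficients, averaged over resolutions."""
    total = 0.0
    for frame_len, hop in resolutions:
        total += np.mean(np.abs(stdct(est, frame_len, hop) - stdct(ref, frame_len, hop)))
    return total / len(resolutions)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(512 + 15 * 128)
    noisy = clean + 0.1 * rng.standard_normal(clean.shape)
    coeffs = stdct(clean, 512, 128)
    print(coeffs.dtype, coeffs.shape)  # float64 (16, 512)
    print(np.allclose(istdct(coeffs, 512, 128), clean))  # True
    print(multi_resolution_stdct_loss(noisy, clean))
```

Comparing the signals at several frame lengths means that errors visible only at fine time resolution (short frames) or only at fine frequency resolution (long frames) both contribute to the loss. The actual MRCRN loss may weight the resolutions differently or add further terms.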

© 2025 The Institute of Electronics, Information and Communication Engineers