Host: The Japanese Society for Artificial Intelligence
Name : The 98th SIG-SLUD
Number : 98
Location : [in Japanese]
Date : September 03, 2023 - September 04, 2023
Pages 55-58
Although machine learning-based speech enhancement has been reported to be effective to some degree against non-stationary noise and to outperform statistical methods such as spectral subtraction based on the Minimum Mean Square Error (MMSE) criterion, its performance is limited by the resolution of the Discrete Fourier Transform (DFT), which is commonly used to analyze the input signal. This limitation is particularly pronounced in noisy environments with strong non-stationarity. To address this problem, we propose analyzing the input signal with two DFTs of different sizes for machine learning-based speech enhancement. As a result, we achieved an improvement of up to 2.6 dB in average Segmental Signal-to-Noise Ratio (Seg.SNR) across ten different noise environments when the input SNR was 0 dB.
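As an illustration of analyzing one signal with two DFT sizes, the following minimal sketch computes a short-frame and a long-frame magnitude spectrum for each time step and concatenates them into one feature vector. It is not the authors' implementation: the frame sizes (256 and 1024 samples), the hop size, the Hann window, the start-aligned framing, and the function names `stft_magnitude` and `dual_resolution_features` are all assumptions chosen for this example, since the abstract does not specify them.

```python
import numpy as np

def stft_magnitude(signal, frame_size, hop_size):
    """Magnitude spectra of Hann-windowed frames of length frame_size."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop_size
    frames = np.stack([
        signal[i * hop_size : i * hop_size + frame_size] * window
        for i in range(n_frames)
    ])
    # rfft keeps the frame_size // 2 + 1 non-negative frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1))

def dual_resolution_features(signal, short_size=256, long_size=1024, hop_size=128):
    """Concatenate short-DFT and long-DFT magnitudes frame by frame.

    The sizes are hypothetical defaults. Frames of both sizes start at the
    same sample, and the short-frame sequence is truncated so that both
    sequences have the same number of frames.
    """
    short_mag = stft_magnitude(signal, short_size, hop_size)  # finer time resolution
    long_mag = stft_magnitude(signal, long_size, hop_size)    # finer frequency resolution
    n_frames = min(len(short_mag), len(long_mag))
    return np.concatenate([short_mag[:n_frames], long_mag[:n_frames]], axis=1)

# Usage: one second of a 1 kHz tone sampled at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)
features = dual_resolution_features(tone)
print(features.shape)
```

The short DFT has a bin spacing of fs / 256 and the long DFT a bin spacing of fs / 1024, so the same tone falls in different bin indices in the two halves of the feature vector. How the two analyses are actually combined and fed to the enhancement model is not stated in the abstract; concatenation is only one plausible choice.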