Improving the Performance of Speech Enhancement Using Machine Learning

鈴木 青龍; 藤岡 豊太; 永田 仁史

doi:10.11517/jsaislud.98.0_55

Abstract

Machine learning-based speech enhancement has been reported to be effective to some degree against non-stationary noise and to outperform statistical methods such as spectral subtraction based on the Minimum Mean Square Error (MMSE). However, its performance is limited by the resolution of the Discrete Fourier Transform (DFT), which is commonly used to analyze the input signal. This limitation is particularly prominent in noisy environments with strong non-stationarity. To address this problem, we propose analyzing the input signal with two DFTs of different sizes for machine learning-based speech enhancement. As a result, we achieved an improvement of up to 2.6 dB in average Segmental Signal-to-Noise Ratio (Seg.SNR) across ten different noise environments when the input signal SNR was 0 dB.
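
The idea of analyzing one signal with two DFT sizes, and the Seg.SNR metric used to report the result, can be illustrated with a minimal NumPy sketch. The abstract does not state the DFT sizes, the hop length, how the two analyses are combined, or the exact Seg.SNR definition, so everything below is an assumption for illustration: the function names, the sizes 256 and 1024, the shared hop of 128 samples, the concatenation of magnitude spectra into one feature vector, and the common convention of clamping per-frame SNR to the range -10 dB to 35 dB. It is not the authors' implementation.

```python
import numpy as np


def stft_mag(x, n_fft, hop):
    """Magnitude spectrogram using a Hann window of length n_fft."""
    window = np.hanning(n_fft)
    # Pad by half a frame on each side so that frame i is centred on sample
    # i * hop; this keeps different DFT sizes aligned in time.
    padded = np.pad(x, (n_fft // 2, n_fft // 2))
    n_frames = 1 + (len(padded) - n_fft) // hop
    frames = np.stack(
        [padded[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def dual_resolution_features(x, n_short=256, n_long=1024, hop=128):
    """Concatenate short-DFT and long-DFT magnitudes frame by frame.

    The short DFT gives finer time resolution and the long DFT gives finer
    frequency resolution. Sizes and the concatenation are assumptions.
    """
    short = stft_mag(x, n_short, hop)
    long_ = stft_mag(x, n_long, hop)
    n = min(len(short), len(long_))
    return np.concatenate([short[:n], long_[:n]], axis=1)


def segmental_snr(clean, estimate, frame_len=256, lo=-10.0, hi=35.0):
    """Mean of per-frame SNR in dB, clamped to [lo, hi] (assumed convention)."""
    n = (min(len(clean), len(estimate)) // frame_len) * frame_len
    c = clean[:n].reshape(-1, frame_len)
    e = estimate[:n].reshape(-1, frame_len)
    eps = 1e-12  # avoids division by zero and log of zero
    signal = np.sum(c**2, axis=1) + eps
    error = np.sum((c - e) ** 2, axis=1) + eps
    snr_db = 10.0 * np.log10(signal / error)
    return float(np.mean(np.clip(snr_db, lo, hi)))
```

In this sketch each frame's feature vector has 129 + 513 bins, which a learned enhancement model could take as input in place of a single-resolution spectrum. Sharing one hop length between the two analyses is a design choice made here so that the frames line up one to one.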
