JSAI Technical Report, SIG-SLUD
Online ISSN : 2436-4576
Print ISSN : 0918-5682
98th (Sep.2023)

Improving the performance of speech enhancement using machine learning
Seiryu SUZUKI, Toyota FUJIOKA, Yoshifumi NAGATA
CONFERENCE PROCEEDINGS FREE ACCESS

Pages 55-58

Abstract

Machine learning-based speech enhancement has been reported to be reasonably effective against non-stationary noise and to outperform statistical methods such as spectral subtraction based on the Minimum Mean Square Error (MMSE). However, its performance is limited by the resolution of the Discrete Fourier Transform (DFT), which is commonly used to analyze the input signal. This limitation is particularly prominent in noisy environments with strong non-stationarity. To address this problem, we propose analyzing the input signal with two DFTs of different sizes for machine learning-based speech enhancement. As a result, we achieved an improvement of up to 2.6 dB in average Segmental Signal-to-Noise Ratio (Seg.SNR) across ten different noise environments when the input SNR was 0 dB.
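
The two ideas named in the abstract, analysis with two DFT sizes and evaluation by Seg.SNR, can be illustrated with a minimal sketch. This is not the authors' implementation: the DFT sizes (256 and 1024), the hop of 128 samples, the Hann window, the concatenation of the two magnitude spectra into one feature vector, and the clipping of per-frame SNR to the range -10 dB to 35 dB are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np


def stft_mag(x, n_fft, hop):
    """Magnitude spectrogram with a Hann window, frames centred at t * hop."""
    # Zero-pad by half a window on each side so that frame t is centred on
    # sample t * hop regardless of n_fft (keeps both resolutions aligned).
    padded = np.pad(x, n_fft // 2, mode="constant")
    n_frames = 1 + (len(padded) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack(
        [padded[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft // 2 + 1)


def dual_resolution_features(x, n_short=256, n_long=1024, hop=128):
    """Concatenate short-DFT and long-DFT magnitudes frame by frame.

    The short DFT gives finer time resolution, the long DFT finer frequency
    resolution. Sizes and the concatenation scheme are assumptions.
    """
    short = stft_mag(x, n_short, hop)
    long_ = stft_mag(x, n_long, hop)
    return np.concatenate([short, long_], axis=1)


def segmental_snr(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0):
    """Mean of per-frame SNRs in dB, each clipped to [lo, hi].

    The clipping range is a common convention, assumed here; the source
    does not state the exact definition it uses.
    """
    eps = 1e-12  # avoids division by zero and log of zero
    n_frames = len(clean) // frame_len
    snrs = []
    for i in range(n_frames):
        s = clean[i * frame_len : (i + 1) * frame_len]
        e = s - enhanced[i * frame_len : (i + 1) * frame_len]
        snr = 10.0 * np.log10((np.sum(s**2) + eps) / (np.sum(e**2) + eps))
        snrs.append(np.clip(snr, lo, hi))
    return float(np.mean(snrs))
```

In such a setup, the output of `dual_resolution_features` would serve as the input to a learned enhancement model, and `segmental_snr` would compare the model's time-domain output against the clean reference.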

© 2023 The Japanese Society for Artificial Intelligence