2020, Volume 24, Issue 4, pp. 175-178
Bone-conducted (BC) speech has a significant advantage as a solution for speech communication in extremely noisy environments because it is stable against surrounding noise. However, the quality and intelligibility of BC speech are degraded, and restoring them is difficult. To solve this problem, we propose a method for restoring BC speech that combines a linear prediction (LP) model using line spectral frequencies (LSFs) with a long short-term memory (LSTM) model. We evaluated the method using three objective measures: log-spectrum distortion, LP coefficient distance, and perceptual evaluation of speech quality (PESQ). On all three measures, our method outperforms the previous method, which used a simple recurrent network. The results also show that the model yields speech of better quality when the LP gain is estimated more accurately.
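As an illustration of the first objective measure, the following is a minimal sketch of one common way to compute a log-spectrum distortion between a reference and a restored signal. The abstract does not give the exact formula used in the paper, so the definition here (root-mean-square difference of the power spectra in dB per frame, averaged over frames) is an assumption, and the function name `log_spectral_distortion` and its parameters are hypothetical.

```python
import numpy as np


def log_spectral_distortion(ref_power, est_power, eps=1e-12):
    """Mean per-frame RMS difference (in dB) between two power spectra.

    Assumed definition, not taken from the paper. Both inputs are arrays
    of shape (frames, bins) holding non-negative power values.
    """
    ref_power = np.asarray(ref_power, dtype=float)
    est_power = np.asarray(est_power, dtype=float)
    if ref_power.shape != est_power.shape:
        raise ValueError("spectra must have the same shape")

    # Convert power to dB; eps guards against log10(0).
    ref_db = 10.0 * np.log10(np.maximum(ref_power, eps))
    est_db = 10.0 * np.log10(np.maximum(est_power, eps))

    # RMS over frequency bins for each frame, then average over frames.
    per_frame = np.sqrt(np.mean((ref_db - est_db) ** 2, axis=-1))
    return float(np.mean(per_frame))


# Toy example: the estimate has ten times the reference power in every bin,
# which is a constant 10 dB offset.
ref = np.ones((3, 8))
est = 10.0 * ref
lsd_same = log_spectral_distortion(ref, ref)
lsd_offset = log_spectral_distortion(ref, est)
```

Under this definition, lower values mean the restored spectrum is closer to the reference. Identical spectra give a distortion of 0 dB, and a uniform tenfold power offset gives 10 dB.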