Investigation of Speech Enhancement for Specific Speaker's Voice from Multi-Speaker Environment by Using Image Data and U-Net

リン ケン; 渡邉 大河; 橋爪 裕貴; 長谷川 啓介; 宮崎 剛; 田中 博

doi:10.11371/wiieej.22.03.0_78

Reports of the 303rd Technical Conference of the Institute of Image Electronics Engineers of Japan

Session ID : 22-03-14


Conference information

Host: The Institute of Image Electronics Engineers of Japan

Name : Reports of the 303rd Technical Conference of the Institute of Image Electronics Engineers of Japan

Number : 303


Date : February 21, 2023 - February 22, 2023

Investigation of Speech Enhancement for Specific Speaker’s Voice from Multi-Speaker Environment by Using Image Data and U-Net

*Jian LIN, Taiga WATANABE, Yuki HASHIZUME, Keisuke HASEGAWA, Tsuyoshi MIYAZAKI, Hiroshi TANAKA


Keywords: Spectrum Image, Speech Enhancement, U-Net


Abstract

A method has been proposed that converts noisy speech data into images and removes the noise using U-Net, a fully convolutional network. The authors have previously used this method in experiments to remove various types of noise in addition to human speech, and good results were obtained in all experiments. This study considers two scenarios: enhancing a specific person's voice during a meeting so that it can be recorded, and enhancing the voice of an emergency announcer or evacuation guide so that it can be converted into text and conveyed to hearing-impaired people. The authors prepared multiple training datasets and created a model that enhances a specific speaker's speech in a mixture of multiple (up to 6) speakers. The results confirm that the speech of a specific person in mixed voice data can be enhanced by regenerating the voice.
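The pipeline described in the abstract (speech to image, image-to-image processing, image back to speech) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the frame length, hop size, 80 dB dynamic range, 8-bit image depth, or how the voice is regenerated, so all of these are choices made here for illustration. In particular, reusing the phase of the mixed signal for reconstruction is an assumption. The `model` argument is a hypothetical stand-in for a trained U-Net and defaults to the identity.

```python
import numpy as np

def stft(signal, frame_len=256, hop=64):
    """Hann-windowed short-time Fourier transform, shape (frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, frame_len=256, hop=64):
    """Inverse STFT by weighted overlap-add with the same Hann window."""
    window = np.hanning(frame_len)
    frames = np.fft.irfft(spec, n=frame_len, axis=1)
    out = np.zeros((spec.shape[0] - 1) * hop + frame_len)
    norm = np.zeros_like(out)
    for i in range(spec.shape[0]):
        out[i * hop:i * hop + frame_len] += frames[i] * window
        norm[i * hop:i * hop + frame_len] += window ** 2
    return out / np.maximum(norm, 1e-8)

def to_image(spec, db_range=80.0):
    """Map log-magnitude to an 8-bit grayscale image (assumed encoding)."""
    mag_db = 20.0 * np.log10(np.abs(spec) + 1e-10)
    top = mag_db.max()
    scaled = np.clip((mag_db - (top - db_range)) / db_range, 0.0, 1.0)
    return np.round(scaled * 255).astype(np.uint8), top

def from_image(image, top, phase, db_range=80.0):
    """Undo to_image and attach a phase to obtain a complex spectrum."""
    mag_db = image.astype(np.float64) / 255.0 * db_range + (top - db_range)
    return 10.0 ** (mag_db / 20.0) * np.exp(1j * phase)

def enhance_waveform(mixture, model=lambda img: img):
    """Speech -> spectrum image -> model -> speech.

    `model` is a hypothetical placeholder for a trained U-Net that maps a
    mixed-speech image to an image of the target speaker only.
    """
    spec = stft(mixture)
    image, top = to_image(spec)
    enhanced = model(image)
    # Assumption: the phase of the mixture is reused for reconstruction.
    restored = istft(from_image(enhanced, top, np.angle(spec)))
    return restored, image
```

With the identity placeholder, the round trip returns the input waveform up to the quantization error of the 8-bit image, which gives a quick check that the image conversion itself loses little information before any network is involved.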
