Reports of the Technical Conference of the Institute of Image Electronics Engineers of Japan
Online ISSN : 2758-9218
Print ISSN : 0285-3957
Reports of the 303rd Technical Conference of the Institute of Image Electronics Engineers of Japan
Session ID : 22-03-14

Investigation of Speech Enhancement for Specific Speaker’s Voice from Multi-Speaker Environment by Using Image Data and U-Net
*Jian LIN, Taiga WATANABE, Yuki HASHIZUME, Keisuke HASEGAWA, Tsuyoshi MIYAZAKI, Hiroshi TANAKA

Abstract
A method has been proposed that converts noisy speech data into images and removes the noise using U-Net, a fully convolutional network. Using this method, the authors have previously conducted experiments on removing various types of noise, including human speech, and obtained good results in all of them. In this study, we consider emphasizing a specific person's voice: for example, to record that person's speech during a meeting, or to emphasize the voice of an emergency announcer or evacuation guide so that it can be converted into text and conveyed to hearing-impaired people. We prepared multiple training datasets and built a speech enhancement model that emphasizes a specific speaker's speech in mixtures of up to six speakers. By regenerating the voice from the model output, we confirmed that the speech of a specific person in mixed voice data can be enhanced.
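The abstract does not say how speech is rendered as an image or what the U-Net outputs. One common arrangement, assumed here and not confirmed by the paper, is to treat a normalized log-magnitude spectrogram as a grayscale image, have the network predict a time-frequency mask, and regenerate the waveform with the mixture's phase. The minimal NumPy sketch below shows that pipeline end to end. The STFT settings (`N_FFT`, `HOP`) and the helpers `stft`, `istft`, `to_image`, `from_image` and `enhance` are hypothetical; an oracle ideal ratio mask stands in for the trained U-Net, and two sine tones stand in for the target and a competing speaker.

```python
import numpy as np

N_FFT, HOP = 512, 128  # assumed STFT settings, not taken from the paper


def _window(n_fft=N_FFT):
    # Periodic Hann window
    return np.hanning(n_fft + 1)[:-1]


def stft(x, n_fft=N_FFT, hop=HOP):
    """Complex spectrogram with shape (n_fft // 2 + 1, n_frames)."""
    w = _window(n_fft)
    x = np.pad(x, n_fft // 2, mode="reflect")  # centre the first frame on sample 0
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * w for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T


def istft(spec, length, n_fft=N_FFT, hop=HOP):
    """Weighted overlap-add inverse of stft()."""
    w = _window(n_fft)
    frames = np.fft.irfft(spec.T, n=n_fft, axis=1)
    out_len = n_fft + hop * (frames.shape[0] - 1)
    y, wsum = np.zeros(out_len), np.zeros(out_len)
    for i, frame in enumerate(frames):
        y[i * hop:i * hop + n_fft] += frame * w
        wsum[i * hop:i * hop + n_fft] += w ** 2
    covered = wsum > 1e-8
    y[covered] /= wsum[covered]
    start = n_fft // 2
    return y[start:start + length]


def to_image(mag, eps=1e-8):
    """Log-magnitude spectrogram scaled to [0, 1], i.e. a grayscale 'image'."""
    log_mag = np.log(mag + eps)
    lo, hi = log_mag.min(), log_mag.max()
    return (log_mag - lo) / (hi - lo + eps), (lo, hi)


def from_image(img, scale, eps=1e-8):
    """Undo to_image() to recover the linear magnitude."""
    lo, hi = scale
    return np.exp(img * (hi - lo + eps) + lo) - eps


def enhance(noisy, mask_model):
    """mask_model maps an image to a mask in [0, 1]; a stand-in for a trained U-Net."""
    spec = stft(noisy)
    img, scale = to_image(np.abs(spec))
    mask = mask_model(img)
    mag = from_image(img, scale) * mask
    # Reuse the noisy phase when regenerating the waveform (a common simplification)
    return istft(mag * np.exp(1j * np.angle(spec)), len(noisy))


# Usage: two tones stand in for the target speaker and a competing speaker
sr = 16000
t = np.arange(sr) / sr
target = np.sin(2 * np.pi * 440 * t)
other = np.sin(2 * np.pi * 3000 * t)
noisy = target + other

# Oracle ideal ratio mask, used here only in place of a trained U-Net
S, N = np.abs(stft(target)), np.abs(stft(other))


def oracle(img):
    return S / (S + N + 1e-8)


enhanced = enhance(noisy, oracle)
print(np.linalg.norm(noisy - target), np.linalg.norm(enhanced - target))
```

Predicting a bounded mask rather than raw pixel values is only one design choice; a U-Net could equally output the clean spectrogram image directly, which `from_image` would then convert back to a magnitude.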
© 2023 by The Institute of Image Electronics Engineers of Japan