ITE Transactions on Media Technology and Applications
Online ISSN : 2186-7364
ISSN-L : 2186-7364
Regular Section
[Paper] DLF-based Speech Segment Detection and Its Application to Audio Noise Removal for Video Conferences
Kazuto Sasaki, Takahiro Ogawa, Sho Takahashi, Miki Haseyama

2016 Volume 4 Issue 1 Pages 68-77


This paper presents a new decision-level fusion (DLF)-based speech segment detection method and its application to audio noise removal for video conferences. The proposed method calculates visual features from the video sequences and audio features from the audio signals obtained in video conferences. The visual features are extracted from the mouth regions of participants, and the audio features are attribution degrees of the speech class. Support Vector Machine (SVM)-based classification is performed separately on each kind of feature; each SVM classifier performs two-class classification into speech and non-speech segments to realize speech segment detection. The detection results obtained from the visual and audio features are then integrated by DLF based on Supervised Learning from Multiple Experts, which takes the accuracy of each detection result into account to obtain the final detection results. Finally, noise information is extracted from the audio signals in the non-speech segments detected by our method and used to realize accurate audio noise removal in the speech segments.
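
The flow described in the abstract (two per-modality detectors, a fusion step that accounts for each detector's accuracy, and noise estimation from the detected non-speech segments) can be illustrated with a minimal sketch. It is not the paper's method: the accuracy-weighted vote below is a simple stand-in for Supervised Learning from Multiple Experts, the per-frame labels stand in for the outputs of the two SVM classifiers, and the names `fuse_decisions` and `noise_power` as well as all input values are hypothetical.

```python
import math


def fuse_decisions(visual, audio, acc_visual, acc_audio):
    """Accuracy-weighted vote over per-frame labels (1 = speech, 0 = non-speech).

    Assumption: each detector is weighted by the log-odds of its validation
    accuracy, which must lie strictly between 0 and 1. This is a stand-in
    for the paper's fusion rule, not a reproduction of it.
    """
    w_v = math.log(acc_visual / (1.0 - acc_visual))
    w_a = math.log(acc_audio / (1.0 - acc_audio))
    fused = []
    for v, a in zip(visual, audio):
        # Map labels {0, 1} to votes {-1, +1} and sum the weighted votes.
        score = w_v * (2 * v - 1) + w_a * (2 * a - 1)
        # A tie (score == 0) is treated as non-speech.
        fused.append(1 if score > 0 else 0)
    return fused


def noise_power(frames, labels):
    """Mean per-sample power over the frames labelled non-speech (0)."""
    samples = [x for frame, lab in zip(frames, labels) if lab == 0 for x in frame]
    if not samples:
        return 0.0
    return sum(x * x for x in samples) / len(samples)


# Hypothetical per-frame decisions of the visual and audio detectors.
visual = [1, 1, 0, 0]
audio = [1, 0, 1, 0]
fused = fuse_decisions(visual, audio, acc_visual=0.9, acc_audio=0.7)

# Hypothetical audio frames (two samples each) aligned with the labels.
frames = [[0.5, -0.5], [0.4, 0.4], [0.1, -0.1], [0.1, 0.1]]
estimate = noise_power(frames, fused)
```

When the two detectors disagree, the more accurate one decides the frame in this sketch. The resulting noise power estimate is the kind of quantity a noise removal step could then use in the speech segments; the removal step itself is not sketched here.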

© 2016 The Institute of Image Information and Television Engineers