ITE Transactions on Media Technology and Applications
Online ISSN : 2186-7364
ISSN-L : 2186-7364
Regular Section
[Paper] DLF-based Speech Segment Detection and Its Application to Audio Noise Removal for Video Conferences
Kazuto Sasaki, Takahiro Ogawa, Sho Takahashi, Miki Haseyama

2016, Vol. 4, No. 1, pp. 68-77

Abstract

This paper presents a new speech segment detection method based on decision-level fusion (DLF) and applies it to audio noise removal for video conferences. The proposed method computes visual features from the video sequences and audio features from the audio signals captured during a video conference. Features extracted from the participants' mouth regions serve as the visual features, and degrees of attribution to the speech class serve as the audio features. Support Vector Machine (SVM)-based classification is performed separately on each kind of feature, classifying each segment as speech or non-speech. The detection results obtained from the visual and audio features are then combined by DLF based on Supervised Learning from Multiple Experts, which takes the accuracy of each individual detection result into account when producing the final detection results. Finally, noise information is extracted from the audio signals in the non-speech segments detected by the method and used to remove audio noise accurately from the speech segments.
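The pipeline described above (per-modality speech/non-speech decisions, accuracy-aware fusion, then noise estimation from non-speech segments) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names Supervised Learning from Multiple Experts for the fusion step, whereas this sketch swaps in a simpler supervised stand-in that measures each detector's sensitivity and specificity on labelled training segments and combines decisions by summing log-likelihood ratios. The abstract also does not say which noise removal method is used, so basic magnitude spectral subtraction is assumed here. The function names (`estimate_rates`, `fuse`, `spectral_subtraction`) and the precomputed per-frame magnitude spectra are hypothetical, and the SVM outputs are taken as given 0/1 decisions.

```python
import math
from typing import List, Sequence, Tuple

EPS = 1e-3  # keep rates away from 0 and 1 so the log-ratios stay finite


def _clip(x: float) -> float:
    return min(max(x, EPS), 1 - EPS)


def estimate_rates(pred: Sequence[int], truth: Sequence[int]) -> Tuple[float, float]:
    """Hypothetical helper: sensitivity P(pred=1 | speech) and specificity
    P(pred=0 | non-speech) of one detector on labelled training segments."""
    pos = [p for p, t in zip(pred, truth) if t == 1]
    neg = [p for p, t in zip(pred, truth) if t == 0]
    sens = sum(pos) / len(pos)
    spec = sum(1 - p for p in neg) / len(neg)
    return _clip(sens), _clip(spec)


def fuse(decisions: Sequence[int], rates: Sequence[Tuple[float, float]],
         prior: float = 0.5) -> int:
    """Combine speech (1) / non-speech (0) decisions for one segment by summing
    log-likelihood ratios, assuming detectors err independently given the
    true class. More reliable detectors get larger weights automatically."""
    llr = math.log(prior / (1 - prior))
    for d, (sens, spec) in zip(decisions, rates):
        if d == 1:
            llr += math.log(sens / (1 - spec))
        else:
            llr += math.log((1 - sens) / spec)
    return 1 if llr > 0 else 0


def spectral_subtraction(frames: List[List[float]], is_speech: Sequence[int],
                         floor: float = 0.01) -> List[List[float]]:
    """Assumed noise removal: average the magnitude spectra of frames labelled
    non-speech as the noise estimate, subtract it from every frame, and keep
    a small spectral floor to avoid negative magnitudes."""
    noise_frames = [f for f, s in zip(frames, is_speech) if s == 0]
    n_bins = len(frames[0])
    noise = [sum(f[b] for f in noise_frames) / len(noise_frames) for b in range(n_bins)]
    return [[max(f[b] - noise[b], floor * f[b]) for b in range(n_bins)] for f in frames]


if __name__ == "__main__":
    truth = [1, 1, 1, 1, 0, 0, 0, 0]
    visual = [1, 1, 1, 0, 0, 0, 0, 0]  # misses some speech, no false alarms
    audio = [1, 1, 1, 1, 1, 1, 0, 0]   # catches all speech, some false alarms
    rates = [estimate_rates(visual, truth), estimate_rates(audio, truth)]
    print(fuse([0, 1], rates))  # audio-only "speech" is outweighed by visual
```

In this toy example the audio detector fires on non-speech noise, so when only the audio detector reports speech the fused decision is non-speech, while a visual-only detection is kept. The LLR weighting is what makes the fusion depend on each detector's measured accuracy rather than a plain majority vote.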

© 2016 The Institute of Image Information and Television Engineers