混ざった声を聞き分ける最新技術：音源分離と目的音声抽出

中谷 智広; 池下 林太郎; デルクロア マーク; 落合 翼; 加茂 直之; 荒木 章子

doi:10.1587/essfr.18.4_267

Abstract

In this paper, the latest advancements in source separation and target speech extraction technologies are reviewed. The former technology separates individual sounds from an acoustic signal recorded with multiple voices and other sounds, and the latter one extracts only the speech of the desired speaker. These technologies make speech more understandable for humans and contribute to improving downstream speech applications. Two important approaches are discussed: signal-model-based and neural-network-based methods. Then detailed explanations of representative techniques in the approaches, blind source separation in reverberant environments, and target speech extraction based on voice features are provided. Finally, the future prospects of this technological field are discussed.

Content from these authors

Favorites & Alerts

Add to favorites
Additional info alert
Citation alert
Authentication alert

Corresponding author

Register with J-STAGE for free!