Acoustical Science and Technology
Online ISSN : 1347-5177
Print ISSN : 1346-3969
ISSN-L : 0369-4232
Target sound information extraction: Speech and audio processing with neural networks conditioned on target clues
Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato, Naohiro Tawara, Tomohiro Nakatani, Shoko Araki
Journal, Open Access, Advance online publication

Article ID: e24.124

Abstract

This paper gives an overview of neural target sound information extraction (TSIE), the task of extracting desired information about a sound source from an observed sound mixture, given clues about the target source. TSIE is a general framework that covers applications such as target speech/sound extraction (TSE), personalized voice activity detection (PVAD), and target-speaker automatic speech recognition (TS-ASR). We formalize the ideas of TSIE and show how it can be implemented, using TSE, PVAD, and TS-ASR as examples. We conclude with a discussion of potential future research directions.

© 2025 by The Acoustical Society of Japan

This article is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International license.
https://creativecommons.org/licenses/by-nd/4.0/