Acoustical Science and Technology
Online ISSN : 1347-5177
Print ISSN : 1346-3969
ISSN-L : 0369-4232
INVITED REVIEW
Target sound information extraction: Speech and audio processing with neural networks conditioned on target clues
Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato, Naohiro Tawara, Tomohiro Nakatani, Shoko Araki
JOURNAL OPEN ACCESS

2025 Volume 46 Issue 3 Pages 197-209

Abstract

This paper overviews neural target sound information extraction (TSIE), the task of extracting desired information about a sound source from an observed sound mixture, given clues about the target source. TSIE is a general framework that covers various applications, such as target speech/sound extraction (TSE), personalized voice activity detection (PVAD), and target speaker automatic speech recognition (TS-ASR). We formalize the ideas of TSIE and show, through examples such as TSE, PVAD, and TS-ASR, how it can be implemented. We conclude the paper with a discussion of potential future research directions.

© 2025 by The Acoustical Society of Japan

This article is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International license.
https://creativecommons.org/licenses/by-nd/4.0/