Acoustical Science and Technology
Online ISSN : 1347-5177
Print ISSN : 1346-3969
ISSN-L : 0369-4232
Language-queried target speech extraction using para-linguistic and non-linguistic prompts
Kentaro Seki, Nobutaka Ito, Kazuki Yamauchi, Yuki Okamoto, Kouei Yamaoka, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
JOURNAL OPEN ACCESS Advance online publication

Article ID: e25.27

Abstract

This paper proposes a new language-queried target speech extraction (TSE) task called para-linguistic and non-linguistic text prompts-based TSE (PNTP-TSE), which uses text prompts that describe para-linguistic and non-linguistic information. This framework addresses the limitations of conventional TSE methods, such as privacy concerns in voiceprint-based systems and dependency on dedicated microphone arrays or video cameras. To support this framework, we construct and provide a new dataset, PromptTSE, which is specifically designed to facilitate various types of language-queried TSE, including PNTP-TSE. We develop a baseline method for PNTP-TSE and conduct experimental evaluations. The experimental results show that PNTP-TSE overcomes the performance degradation issue of voiceprint-based systems caused by the gap in speaking style between enrollment speech and target speech.

© 2025 by The Acoustical Society of Japan

This article is licensed under a Creative Commons [Attribution-NoDerivatives 4.0 International] license.
https://creativecommons.org/licenses/by-nd/4.0/