2022 Volume 28 Issue 1 Pages 1-13
In this paper, we propose a query-by-example spoken term detection (QbE-STD) method for detecting keywords in speech databases of zero-resource languages. The proposed method represents speech with phonetic posteriorgrams (PPGs) obtained from models trained on multiple resource-rich languages, and it combines these multilingual PPGs. Keywords are detected using dynamic time warping (DTW). We examined three ways of combining multiple languages: concatenating the PPGs of the individual languages (PPG_CONC), combining the resources of all languages to compute a single multilingual PPG (PPG_ALL), and multi-task training of the PPG model on multiple languages (PPG_DIV). We conducted QbE-STD experiments on Kaqchikel speech. PPG-based detection outperformed the method based on conventional speech features (MFCCs), and using multiple languages improved detection further.
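As an illustration of the detection step, the following is a minimal sketch and not the authors' implementation. It assumes the PPGs have already been computed as per-frame posterior vectors. It also uses the negative log inner product as the frame distance, a choice commonly paired with posteriorgrams, and a subsequence DTW, so that the query can match anywhere inside an utterance. Only the concatenation scheme (PPG_CONC) is shown, because PPG_ALL and PPG_DIV differ in how the PPG model is trained rather than in how detection is run. The function names `concat_ppgs`, `frame_distance`, and `subsequence_dtw` are hypothetical. The 1/L rescaling of concatenated frames is also an assumption of this sketch.

```python
import math
from typing import List, Sequence, Tuple

Frame = List[float]  # one posterior vector (probabilities over phone classes)


def concat_ppgs(ppgs_per_language: Sequence[Sequence[Frame]]) -> List[Frame]:
    """Concatenate per-language PPGs frame by frame (PPG_CONC-style).

    Each concatenated frame is scaled by 1/L (L = number of languages) so it
    still sums to 1; this rescaling is an assumption of the sketch.
    """
    num_langs = len(ppgs_per_language)
    num_frames = len(ppgs_per_language[0])
    if any(len(ppg) != num_frames for ppg in ppgs_per_language):
        raise ValueError("all PPGs must cover the same frames")
    combined = []
    for t in range(num_frames):
        frame: Frame = []
        for ppg in ppgs_per_language:
            frame.extend(p / num_langs for p in ppg[t])
        combined.append(frame)
    return combined


def frame_distance(a: Frame, b: Frame, eps: float = 1e-10) -> float:
    """-log inner product: small when two frames put mass on the same phones."""
    dot = sum(x * y for x, y in zip(a, b))
    return -math.log(max(dot, eps))


def subsequence_dtw(query: Sequence[Frame],
                    utterance: Sequence[Frame]) -> Tuple[float, int]:
    """Align the whole query against any span of the utterance.

    Returns (accumulated cost divided by query length, end frame of best match).
    """
    n, m = len(query), len(utterance)
    cost = [[math.inf] * m for _ in range(n)]
    # The first query frame may start at any utterance frame (free start).
    for j in range(m):
        cost[0][j] = frame_distance(query[0], utterance[j])
    for i in range(1, n):
        for j in range(m):
            best_prev = cost[i - 1][j]  # vertical step
            if j > 0:
                # diagonal and horizontal steps
                best_prev = min(best_prev, cost[i - 1][j - 1], cost[i][j - 1])
            cost[i][j] = best_prev + frame_distance(query[i], utterance[j])
    # The last query frame may end at any utterance frame (free end).
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end] / n, end


if __name__ == "__main__":
    A, B = [0.9, 0.1], [0.1, 0.9]  # toy posteriors for two phone classes
    score, end = subsequence_dtw([A, B], [B, B, A, B, A])
    print(end)  # 3: the query "A B" best matches utterance frames 2-3
```

In a full system, the query and each utterance would first be passed through `concat_ppgs` (or replaced by a PPG_ALL or PPG_DIV posteriorgram), and utterances would be ranked by the returned score. Normalizing by query length keeps scores comparable across queries of different durations. Recovering the start frame would require an additional backtracking pass, which is omitted here.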