IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Special Section on Robust Speech Processing in Realistic Environments
Building an Effective Speech Corpus by Utilizing Statistical Multidimensional Scaling Method
Goshu NAGINOMakoto SHOZAKAITomoki TODAHiroshi SARUWATARIKiyohiro SHIKANO
Author information
JOURNAL FREE ACCESS

2008 Volume E91.D Issue 3 Pages 607-614

Details
Abstract
This paper proposes a technique for building an effective speech corpus with lower cost by utilizing a statistical multidimensional scaling method. The statistical multidimensional scaling method visualizes multiple HMM acoustic models into two-dimensional space. At first, a small number of voice samples per speaker is collected; speaker adapted acoustic models trained with collected utterances, are mapped into two-dimensional space by utilizing the statistical multidimensional scaling method. Next, speakers located in the periphery of the distribution, in a plotted map are selected; a speech corpus is built by collecting enough voice samples for the selected speakers. In an experiment for building an isolated-word speech corpus, the performance of an acoustic model trained with 200 selected speakers was equivalent to that of an acoustic model trained with 533 non-selected speakers. It means that a cost reduction of more than 62% was achieved. In an experiment for building a continuous word speech corpus, the performance of an acoustic model trained with 500 selected speakers was equivalent to that of an acoustic model trained with 1179 non-selected speakers. It means that a cost reduction of more than 57% was achieved.
Content from these authors
© 2008 The Institute of Electronics, Information and Communication Engineers
Previous article Next article
feedback
Top