Acoustical Science and Technology
Online ISSN : 1347-5177
Print ISSN : 1346-3969
ISSN-L : 0369-4232
PAPERS
Speech recognition based on statistical models including multiple phonetic decision trees
Sayaka Shiota, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
JOURNAL FREE ACCESS

2011 Volume 32 Issue 6 Pages 236-243

Abstract
We propose a speech recognition technique using multiple model structures. In context-dependent modeling, decision-tree-based context clustering is applied to find an appropriate parameter-tying structure. However, context clustering is usually performed on the basis of unreliable statistics of hidden Markov model (HMM) state sequences, because estimating reliable state sequences requires an appropriate model structure, which cannot be obtained prior to context clustering. Therefore, context clustering and the estimation of state sequences essentially cannot be performed independently. To overcome this problem, we propose a technique for optimizing state sequences based on an annealing process using multiple decision trees. In this technique, a new likelihood function is defined in order to treat multiple model structures, and the deterministic annealing expectation maximization (DAEM) algorithm is used as the training algorithm. Experimental results on continuous phoneme recognition show that the proposed method, using only two decision trees, achieved a relative error reduction of about 11.1% over the conventional method.
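To illustrate the annealing idea behind deterministic annealing expectation maximization, the following is a minimal sketch on a toy problem, not the method of the paper. It fits the means of a one-dimensional Gaussian mixture rather than HMMs with multiple decision trees. The equal mixture weights, the fixed unit variance, the temperature schedule `betas`, and the function names `tempered_posteriors` and `daem_fit_means` are all assumptions made for illustration. The only ingredient taken from the general algorithm is that the E-step posterior is computed from the joint likelihood raised to a power beta, with beta increased gradually to 1.

```python
import math


def tempered_posteriors(x, means, beta, var=1.0):
    """E-step for one sample: posterior proportional to N(x | mean_k, var) ** beta.

    Assumption for this sketch: equal mixture weights and a fixed, shared
    variance, so the weight and normalization terms cancel across components.
    """
    scores = [-0.5 * beta * (x - m) ** 2 / var for m in means]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def daem_fit_means(data, init_means, betas=(0.1, 0.3, 0.6, 1.0),
                   iters_per_beta=10, var=1.0):
    """Toy deterministic annealing EM that estimates only component means.

    beta is the inverse temperature: small values flatten the posteriors,
    and beta = 1 recovers the ordinary EM update.
    """
    means = list(init_means)
    for beta in betas:  # hypothetical annealing schedule
        for _ in range(iters_per_beta):
            post = [tempered_posteriors(x, means, beta, var) for x in data]
            for k in range(len(means)):
                occupancy = sum(p[k] for p in post)
                if occupancy > 0.0:
                    # M-step: posterior-weighted average of the samples
                    means[k] = sum(p[k] * x for p, x in zip(post, data)) / occupancy
    return means


if __name__ == "__main__":
    samples = [-2.2, -2.1, -2.0, -1.9, -1.8, 2.8, 2.9, 3.0, 3.1, 3.2]
    print(sorted(daem_fit_means(samples, [-0.5, 0.5])))
```

At low beta the two components stay close to the overall data mean, and they separate toward the two clusters as beta rises. This gradual sharpening is what makes the procedure less sensitive to the initial estimates than ordinary EM.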
© 2011 by The Acoustical Society of Japan