Abstract
One of the greatest difficulties in automatic speech recognition (ASR) is handling variations in speech signals caused by nonlinguistic information, such as the speaker's age and gender. Various methods have been proposed to compensate for these variations, and one of them is speech structure. Speech structure, which extracts only contrastive features and discards absolute features, has been proven mathematically to be transform-invariant and shown experimentally to be very robust to nonlinguistic variations. Although conventional speech structure extracts local and distant contrastive features, it does not explicitly extract the dynamic features that are supposed to exist within the contrastive features. In this paper, we reformulate speech structure based on the trajectory hidden Markov model (HMM) and derive trajectory structure (TSR), in which dynamic and contrastive features can be defined and used in ASR. We carry out an n-best rescoring experiment on isolated word recognition using trajectory structure and obtain a 28.5% relative decrease in the word error rate.