Abstract
This paper presents a between-word distance (BWD) calculation in a symbolic domain and discusses its typical application: estimating the degree of speech recognition difficulty for given word sets. The first part of the paper describes a method for the distance calculation that employs dynamic programming (DP) matching on subphonemic segment sequences to take phonemic-context-dependent characteristics into account. To test the usefulness of the method, two types of word sets are composed using a distance-based clustering technique. Vocabularies of one type have dense sample distributions, while the others have sparse sample distributions in a BWD sense. Speaker-independent word recognition is examined for these word sets using a common phone-HMM-based speech recognition technique. We compare the recognition results with the statistical characteristics of the individual word sets and present criteria for the relative order of recognition difficulty of given word sets. One criterion, which uses the between-word distance distributions of n-nearest-neighbor words, provides a reasonable index of recognition difficulty.
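
As a rough illustration of the two ideas summarized above, the following sketch computes a DP-matching distance between two symbol sequences and then averages, over a vocabulary, each word's distance to its n nearest neighbors. It is not the method of the paper: the unit insertion, deletion, and substitution costs, the use of plain phoneme-like symbols instead of subphonemic segments, and the names `dp_distance` and `mean_nearest_neighbor_distance` are all assumptions made for illustration.

```python
from typing import Callable, Dict, Optional, Sequence


def dp_distance(
    a: Sequence[str],
    b: Sequence[str],
    sub_cost: Optional[Callable[[str, str], float]] = None,
    indel_cost: float = 1.0,
) -> float:
    """DP-matching distance between two symbol sequences.

    Assumption: unit costs by default. A context-dependent cost
    function could be passed in through `sub_cost`.
    """
    if sub_cost is None:
        sub_cost = lambda x, y: 0.0 if x == y else 1.0
    # prev[j] holds the cost of aligning the processed prefix of a with b[:j].
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * indel_cost]
        for j in range(1, len(b) + 1):
            curr.append(
                min(
                    prev[j] + indel_cost,        # delete a[i-1]
                    curr[j - 1] + indel_cost,    # insert b[j-1]
                    prev[j - 1] + sub_cost(a[i - 1], b[j - 1]),  # match/substitute
                )
            )
        prev = curr
    return prev[-1]


def mean_nearest_neighbor_distance(
    vocab: Dict[str, Sequence[str]], n: int = 1
) -> float:
    """Average, over all words, of the mean distance to the n nearest words.

    Hypothetical index: smaller values suggest a denser, and presumably
    more confusable, vocabulary.
    """
    if len(vocab) < 2:
        raise ValueError("need at least two words")
    total = 0.0
    for word, seq in vocab.items():
        dists = sorted(
            dp_distance(seq, other) for w, other in vocab.items() if w != word
        )
        nearest = dists[:n]
        total += sum(nearest) / len(nearest)
    return total / len(vocab)


# Toy vocabularies (made-up symbol strings, not data from the paper).
dense = {"kata": list("kata"), "kasa": list("kasa"), "kaba": list("kaba")}
sparse = {"kata": list("kata"), "mizu": list("mizu"), "sore": list("sore")}
dense_index = mean_nearest_neighbor_distance(dense, n=1)
sparse_index = mean_nearest_neighbor_distance(sparse, n=1)
```

With these toy sets, the dense vocabulary yields a smaller index than the sparse one, which is the kind of ordering that such a criterion is meant to capture.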