This paper presents an unsupervised speaker adaptation method from short utterances. The code spectra for the templates are adapted to those of an input speaker by interpolating the estimated speaker-difference vectors for given typical spectra. These difference vectors are estimated so as to minimize the fuzzy objective function for the adapted reference codebook under some constraints. The fuzziness (F) and constraint parameters are examined using SPLIT-based word recognition tests with 28-word vocabulary and reference patterns from a male speaker. Using 1.8 s training samples, the results show that the proposed method with F=1.5 gives a 9.0% higher recognition rate for the four male speakers than the minimum VQ distortion method (F=1.0). For 20 male speakers, this method improves the average recognition rate from 92.5% without adaptation to 97.5% using 3.6 s training samples. Furthermore, a sequential adaptation scheme attains an average recognition rate of 97.4% using test speech itself for adaptation.
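
The role of the fuzziness F can be illustrated with a minimal sketch. It assumes the standard fuzzy c-means membership form with squared Euclidean distance, which the abstract does not confirm as the exact formulation used; the function name `fuzzy_memberships` and the toy codebook are hypothetical. With F=1.0 the memberships collapse to a hard nearest-codeword assignment, which corresponds to the minimum VQ distortion case.

```python
def fuzzy_memberships(x, codebook, F=1.5, eps=1e-12):
    """Membership of vector x in each codeword (fuzzy c-means form, assumed).

    F is the fuzziness exponent. F <= 1.0 is treated as hard VQ:
    the nearest codeword gets membership 1 and all others get 0.
    """
    # Squared Euclidean distance from x to every codeword.
    d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in codebook]

    if F <= 1.0:
        nearest = min(range(len(d2)), key=d2.__getitem__)
        return [1.0 if i == nearest else 0.0 for i in range(len(d2))]

    # If x coincides with one or more codewords, share membership among them.
    hits = [i for i, d in enumerate(d2) if d < eps]
    if hits:
        return [1.0 / len(hits) if i in hits else 0.0 for i in range(len(d2))]

    # u_i is proportional to (1 / d_i^2) ** (1 / (F - 1)), normalized to sum to 1.
    p = 1.0 / (F - 1.0)
    inv = [(1.0 / d) ** p for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


# Toy example: one 2-D input vector and two codewords (illustrative values only).
x = [0.0, 0.0]
codebook = [[1.0, 0.0], [2.0, 0.0]]
soft = fuzzy_memberships(x, codebook, F=1.5)
hard = fuzzy_memberships(x, codebook, F=1.0)
```

In this sketch a larger F spreads each input vector's weight over more codewords, so a short adaptation utterance can influence codewords it would never be assigned to under hard VQ.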