In this paper, we propose a new framework that employs whole-word hidden Markov models (HMMs) with a relaxed algorithm for likelihood calculation. Although voice activity detection (VAD) is essential for a sophisticated voice interface, no perfect VAD has been proposed yet, and an imperfect VAD frequently produces misdetections. Because the conventional likelihood calculation, which imposes a restriction on the edge states, assumes that VAD works well, it cannot achieve sufficient recognition accuracy when misdetections occur. The proposed method improves performance on misdetected speech within the whole-word HMM framework by relaxing this restriction on the likelihood calculation in the edge states. To verify its effectiveness, we carried out recognition experiments on artificially shortened segments. With the conventional method, the average recognition rates were 91.81%, 91.50%, 89.56%, and 85.13% under the 3, 4, 5, and 6 states/phoneme conditions, respectively. The proposed method achieved 91.88%, 92.64%, 92.74%, and 93.14% under the same conditions, a gain that grows from 0.07 points at 3 states/phoneme to 8.01 points at 6 states/phoneme. Furthermore, we confirmed the effectiveness of the proposed method through experiments combined with CENSREC-1-C.
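The abstract does not spell out how the edge-state restriction is relaxed. As a minimal sketch of one plausible reading, the Python code below scores a left-to-right whole-word HMM with a log-domain Viterbi pass. The conventional score forces the path to start in the first state and end in the last state, while the relaxed score lets it start in any of the first `edge_width` states and end in any of the last `edge_width` states. The function names, the `edge_width` parameter, the no-skip topology, and the toy scores are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

NEG_INF = -np.inf


def viterbi_log_likelihood(log_emit, log_self, log_next, entry_states, exit_states):
    """Best-path log-likelihood of a left-to-right HMM without skip transitions.

    log_emit: (T, N) array of per-frame, per-state log emission scores.
    log_self, log_next: (N,) log probabilities of staying / moving to the next state.
    entry_states, exit_states: states in which a path is allowed to start / end.
    """
    T, N = log_emit.shape
    delta = np.full(N, NEG_INF)
    for s in entry_states:
        delta[s] = log_emit[0, s]
    for t in range(1, T):
        stay = delta + log_self
        move = np.full(N, NEG_INF)
        move[1:] = delta[:-1] + log_next[:-1]
        delta = np.maximum(stay, move) + log_emit[t]
    return max(delta[s] for s in exit_states)


def conventional_score(log_emit, log_self, log_next):
    """Path must start in the first state and end in the last state."""
    n_states = log_emit.shape[1]
    return viterbi_log_likelihood(log_emit, log_self, log_next, [0], [n_states - 1])


def relaxed_score(log_emit, log_self, log_next, edge_width):
    """Hypothetical relaxation: any of the first/last `edge_width` states may be an edge."""
    n_states = log_emit.shape[1]
    entry = range(0, min(edge_width, n_states))
    exit_ = range(max(n_states - edge_width, 0), n_states)
    return viterbi_log_likelihood(log_emit, log_self, log_next, entry, exit_)


def toy_truncated_segment():
    """Toy 6-state word whose first two states were cut off by a misdetection."""
    n_states = 6
    matched_state = [2, 2, 3, 4, 5, 5]  # state that each observed frame fits best
    log_emit = np.full((len(matched_state), n_states), -10.0)
    for t, s in enumerate(matched_state):
        log_emit[t, s] = -1.0
    log_self = np.full(n_states, np.log(0.5))
    log_next = np.full(n_states, np.log(0.5))
    return log_emit, log_self, log_next


if __name__ == "__main__":
    log_emit, log_self, log_next = toy_truncated_segment()
    print("conventional:", conventional_score(log_emit, log_self, log_next))
    print("relaxed:     ", relaxed_score(log_emit, log_self, log_next, edge_width=3))
```

Because the relaxed search considers every path the conventional search considers plus additional ones, its score can never be lower. On the toy segment, whose beginning is missing, the conventional path is forced through poorly matching states, whereas the relaxed path can start in the third state.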