Abstract
To overcome the lack of a theoretical basis for the fundamental word-spotting-based approach to recognizing natural, spontaneous speech utterances, we propose a novel spotter (spotting system) design method called Minimum Error Classification of Keyword-sequences (MECK). The key concept of the method is to formalize the entire spotting process as a trainable functional form whose design objective is the classification accuracy of keyword-sequences (strings of prescribed keyword categories). The resulting MECK procedure allows spotters to be designed efficiently, using only pairs of utterances and their corresponding phonemic transcriptions (no hand-segmented labels are required), and in a mathematically proven way that is consistent with minimizing the keyword-sequence classification error. MECK is quite general and can be applied to any reasonable spotter structure. This paper presents implementation details specifically for a prototype-based spotter and demonstrates the utility of the MECK-trained spotter in several Japanese keyword-spotting tasks.