Lexical Learning from Continuous Speech Based on Statistical Model Selection

Ryo Taguchi; Naoto Iwahashi; Kotaro Funakoshi; Mikio Nakano; Takashi Nose; Tsuneo Nitta

doi:10.1527/tjsai.25.549

Original Paper

2010, Volume 25, Issue 4, pp. 549-559

Abstract

This paper proposes a method for the unsupervised learning of lexicons from pairs consisting of a spoken utterance and an object that serves as its meaning, under the condition that no a priori linguistic knowledge other than acoustic models of Japanese phonemes is used. The main problems are segmenting spoken utterances into words and learning the phoneme sequences of those words. To obtain a lexicon, a statistical model representing the joint probability of an utterance and an object is learned based on the minimum description length (MDL) principle. The model consists of three parts: a word list in which each word is represented by a phoneme sequence, a word-bigram model, and a word-meaning model. By learning these parts alternately, the method acquires as words those units of phoneme sequences that are acoustically, grammatically, and semantically appropriate and that cover all utterances. Experimental results show that the model can acquire the phoneme sequences of object words with about 83.6% accuracy.
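
The role of the MDL principle in choosing a word list can be illustrated with a toy sketch. It is not the authors' model: it assumes a unigram word model instead of the word-bigram model, omits the word-meaning and acoustic parts entirely, treats each character of a romanized string as one phoneme, and uses greedy longest-match segmentation. The utterances, candidate lexicons, and function names are all hypothetical.

```python
import math
from collections import Counter


def segment(utterance, lexicon):
    """Greedy longest-match segmentation (an assumption of this sketch).

    Symbols not covered by any lexicon entry become one-symbol words.
    """
    words, i = [], 0
    max_len = max(map(len, lexicon), default=1)
    while i < len(utterance):
        for n in range(min(max_len, len(utterance) - i), 0, -1):
            if utterance[i:i + n] in lexicon or n == 1:
                words.append(utterance[i:i + n])
                i += n
                break
    return words


def description_length(utterances, lexicon, alphabet_size):
    """Two-part code length in bits: model cost plus data cost."""
    tokens = [w for u in utterances for w in segment(u, lexicon)]
    counts = Counter(tokens)
    total = len(tokens)
    # Data cost: negative log-likelihood of the segmented corpus
    # under a maximum-likelihood unigram word model.
    data_bits = -sum(c * math.log2(c / total) for c in counts.values())
    # Model cost: spell out every word type that is actually used,
    # one symbol at a time, plus an end-of-word marker.
    bits_per_symbol = math.log2(alphabet_size + 1)
    model_bits = sum((len(w) + 1) * bits_per_symbol for w in counts)
    return model_bits + data_bits


# Hypothetical toy corpus: romanized strings stand in for phoneme sequences.
utterances = ["akaihako", "aoihako", "akaimaru", "aoimaru"] * 5
alphabet_size = len(set("".join(utterances)))

candidates = {
    "single phonemes": set(),
    "word-like units": {"akai", "aoi", "hako", "maru"},
    "whole utterances": set(utterances),
}

scores = {
    name: description_length(utterances, lexicon, alphabet_size)
    for name, lexicon in candidates.items()
}
best = min(scores, key=scores.get)
print(best)
```

On this toy corpus the word-like units give the shortest total code length: single phonemes make the data expensive to encode, while whole utterances make the word list itself expensive. That trade-off is the intuition behind MDL-based model selection.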
