Abstract
We address the problem of automatically transcribing Japanese orthographic words into symbols representing their pronunciations. Such a function is necessary for commercial continuous speech recognition systems, since there is a constant need to create new recognition lexica for new applications or purposes. Simple look-up schemes are inadequate for Japanese, while methods based on morphological analysis require in-depth linguistic knowledge and development effort. In this paper, we propose a statistical approach based on an N-gram language model. We assume that the pronunciation of a character depends only on the previous one or two characters and their pronunciations. Given an orthographic word, our method outputs the most likely phonetic transcription. Our approach outperforms the public-domain conversion tool KAKASI on ten out of twelve test sets.
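As a rough illustration of the idea only, the following sketch scores each (character, pronunciation) pair conditioned on the previous pair and picks the best sequence with a Viterbi search. It is not the system described in this paper: it uses a history of one character (a bigram), and the toy aligned data, the romanized readings, the add-alpha smoothing, and the names `train`, `log_prob`, and `transcribe` are all assumptions made for the example. It also assumes that each training word is already aligned character by character with its reading.

```python
import math
from collections import defaultdict

# Hypothetical start-of-word unit; a unit is a (character, reading) pair.
BOS = ("<s>", "<s>")

# Toy, hand-aligned training words (assumed data, for illustration only).
TOY_ALIGNED = [
    [("日", "ni"), ("本", "hon")],
    [("本", "hon"), ("日", "jitsu")],
    [("日", "nichi"), ("曜", "you")],
    [("山", "yama"), ("本", "moto")],
]


def train(aligned_words):
    """Count unit bigrams and collect the readings seen for each character."""
    bigram = defaultdict(int)    # (previous unit, unit) -> count
    context = defaultdict(int)   # previous unit -> count
    readings = defaultdict(set)  # character -> readings seen in training
    for word in aligned_words:
        prev = BOS
        for unit in word:
            bigram[(prev, unit)] += 1
            context[prev] += 1
            readings[unit[0]].add(unit[1])
            prev = unit
    return bigram, context, readings


def log_prob(model, prev, unit, alpha=0.1):
    """Add-alpha smoothed log P(unit | prev); the smoothing is an assumption."""
    bigram, context, readings = model
    vocab = sum(len(r) for r in readings.values())
    numerator = bigram.get((prev, unit), 0) + alpha
    denominator = context.get(prev, 0) + alpha * vocab
    return math.log(numerator / denominator)


def transcribe(model, word):
    """Viterbi search for the most likely reading sequence of `word`."""
    readings = model[2]
    best = {BOS: (0.0, [])}  # last unit -> (log score, readings so far)
    for ch in word:
        if ch not in readings:
            raise KeyError(f"no reading seen in training for {ch!r}")
        nxt = {}
        for reading in sorted(readings[ch]):
            unit = (ch, reading)
            score, path = max(
                ((s + log_prob(model, prev, unit), p)
                 for prev, (s, p) in best.items()),
                key=lambda t: t[0],
            )
            nxt[unit] = (score, path + [reading])
        best = nxt
    return max(best.values(), key=lambda t: t[0])[1]


model = train(TOY_ALIGNED)
print(transcribe(model, "日本"))
```

Conditioning on the previous (character, pronunciation) pair is what lets the same character receive different readings in different contexts. Extending the history to two previous pairs would turn this into a trigram model at the cost of sparser counts.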