Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
A Statistical Approach to Automatic Phonetic Transcription of Japanese Orthographic Words
Wei-Bin Chang, Sachiko Morishita
JOURNAL FREE ACCESS

2003 Volume 10 Issue 4 Pages 55-63

Abstract
We address the problem of automatically transcribing Japanese orthographic words into symbols that represent their pronunciations. Commercial continuous speech recognition systems need this capability because new recognition lexica must constantly be created for new applications and purposes. Simple look-up schemes are inadequate for Japanese, while methods based on morphological analysis require in-depth linguistic knowledge and considerable development effort. In this paper, we propose a statistical approach based on an N-gram language model. We assume that the pronunciation of a character depends only on the preceding one or two characters and their pronunciations. Given an orthographic word, our method outputs the most likely phonetic transcription. We show that our approach outperforms the public-domain conversion tool KAKASI on ten out of twelve test sets.
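The idea of choosing the most likely transcription under an N-gram model can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a bigram model (one preceding character) over joint (character, reading) units, add-alpha smoothing, a Viterbi search, and training words that are already aligned character by character. The function names (`train`, `transcribe`), the smoothing constant, and the tiny training set are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict

# Hypothetical start-of-word unit; a unit is a (character, reading) pair.
BOS = ("<s>", "<s>")


def train(aligned_words):
    """Count bigrams over (character, reading) units.

    aligned_words: list of words, each a list of (character, reading) pairs.
    Returns (bigram counts, candidate readings per character, vocabulary size).
    """
    bigram = defaultdict(lambda: defaultdict(int))
    readings = defaultdict(set)
    vocab = set()
    for word in aligned_words:
        prev = BOS
        for unit in word:
            bigram[prev][unit] += 1
            readings[unit[0]].add(unit[1])
            vocab.add(unit)
            prev = unit
    return bigram, readings, len(vocab)


def log_prob(bigram, vocab_size, prev, unit, alpha=0.1):
    """Add-alpha smoothed log P(unit | prev); alpha is an assumed constant."""
    counts = bigram.get(prev, {})
    total = sum(counts.values())
    return math.log((counts.get(unit, 0) + alpha) / (total + alpha * vocab_size))


def transcribe(word, model):
    """Viterbi search for the most likely reading sequence of `word`."""
    bigram, readings, vocab_size = model
    # best maps the last unit of a path to (log score, readings so far).
    best = {BOS: (0.0, [])}
    for ch in word:
        candidates = sorted(readings.get(ch, ()))
        if not candidates:
            raise KeyError(f"no known reading for {ch!r}")
        new_best = {}
        for reading in candidates:
            unit = (ch, reading)
            new_best[unit] = max(
                (
                    (score + log_prob(bigram, vocab_size, prev, unit), path + [reading])
                    for prev, (score, path) in best.items()
                ),
                key=lambda t: t[0],
            )
        best = new_best
    _, path = max(best.values(), key=lambda t: t[0])
    return "".join(path)


# Toy, hand-aligned training data (illustrative only, not from the paper).
TRAINING = [
    [("東", "とう"), ("京", "きょう")],
    [("京", "きょう"), ("都", "と")],
    [("東", "ひがし")],
    [("都", "みやこ")],
    [("東", "ひがし"), ("口", "ぐち")],
]

model = train(TRAINING)
print(transcribe("東京都", model))
print(transcribe("東口", model))
```

In this toy setting the reading chosen for 東 depends on its context: the standalone reading is more frequent at the start of a word, but the bigram statistics favour a different reading when 京 follows. Extending the context to two preceding units would correspond to the "one or two characters" assumption stated in the abstract.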
© The Association for Natural Language Processing