Kana-to-kanji conversion involves two kinds of processing: detecting word boundaries in non-segmented kana strings, and selecting candidate kanji-kana words. Conversion methods can accordingly be classified into two main types. One performs the two processes simultaneously (Method-A). The other performs them sequentially (Method-B): it first detects the boundaries of kana words using a Markov chain model of kana words, then converts the kana words to kanji-kana words and selects the most likely candidates using a Markov chain model of kanji-kana words. This paper evaluates both types of kana-to-kanji conversion method (Method-A and Method-B) using a 2nd-order Markov chain model of words. In experiments using statistical data from a daily Japanese newspaper, Method-A and Method-B are evaluated in terms of conversion accuracy, conversion processing time, and memory capacity. The results show that Method-B is superior to Method-A in conversion processing time and memory capacity, and that it is effective for kana-to-kanji conversion of bunsetsu.
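
The sequential flow of Method-B can be illustrated with a minimal sketch. It is not the authors' implementation: the vocabulary, the candidate table, and every log-probability are invented toy values, the function names (`segment`, `convert`, `make_scorer`) are hypothetical, and a first-order (bigram) chain stands in for the 2nd-order model of the paper to keep the example short.

```python
BOS = "<s>"  # hypothetical sentence-start symbol

def make_scorer(bigrams, floor=-10.0):
    """Return score(prev, word) = toy log P(word | prev), with a flat floor for unseen pairs."""
    def score(prev, word):
        return bigrams.get((prev, word), floor)
    return score

def segment(kana, vocab, score):
    """Stage 1: best split of a kana string into kana words (Viterbi over (position, last word))."""
    best = {(0, BOS): (0.0, None)}  # state -> (log-prob, back-pointer)
    for i in range(len(kana)):
        # Snapshot the states ending at position i; they are final by now.
        states = [(k, v) for k, v in best.items() if k[0] == i]
        for (pos, prev), (lp, _) in states:
            for j in range(i + 1, len(kana) + 1):
                word = kana[i:j]
                if word in vocab:
                    cand = lp + score(prev, word)
                    if (j, word) not in best or cand > best[(j, word)][0]:
                        best[(j, word)] = (cand, (pos, prev))
    finals = [(v[0], k) for k, v in best.items() if k[0] == len(kana)]
    if not finals:
        return None  # no segmentation covers the whole string
    _, key = max(finals)
    words = []
    while key != (0, BOS):
        words.append(key[1])
        key = best[key][1]
    return words[::-1]

def convert(kana_words, candidates, score):
    """Stage 2: choose one kanji-kana candidate per kana word (Viterbi over candidates)."""
    paths = {BOS: (0.0, [])}  # last word -> (log-prob, sequence so far)
    for kw in kana_words:
        new_paths = {}
        for cand in candidates.get(kw, [kw]):
            lp, seq = max(
                ((plp + score(prev, cand), pseq) for prev, (plp, pseq) in paths.items()),
                key=lambda t: t[0],
            )
            new_paths[cand] = (lp, seq + [cand])
        paths = new_paths
    return max(paths.values(), key=lambda t: t[0])[1]

# Toy models; all values are assumptions made up for this example.
KANA_VOCAB = {"わたし", "わた", "し", "は", "がっこう", "に", "いく"}
KANA_BIGRAMS = {
    (BOS, "わたし"): -1.0, ("わたし", "は"): -0.5, ("は", "がっこう"): -1.5,
    ("がっこう", "に"): -0.7, ("に", "いく"): -1.0,
    (BOS, "わた"): -4.0, ("わた", "し"): -4.0, ("し", "は"): -3.0,
}
CANDIDATES = {
    "わたし": ["私"], "は": ["は", "歯"], "がっこう": ["学校"],
    "に": ["に", "二"], "いく": ["行く", "逝く"],
}
KANJI_BIGRAMS = {
    (BOS, "私"): -1.0, ("私", "は"): -0.5, ("私", "歯"): -6.0,
    ("は", "学校"): -1.5, ("学校", "に"): -0.7, ("学校", "二"): -5.0,
    ("に", "行く"): -1.0, ("に", "逝く"): -6.0,
}

kana_words = segment("わたしはがっこうにいく", KANA_VOCAB, make_scorer(KANA_BIGRAMS))
result = "".join(convert(kana_words, CANDIDATES, make_scorer(KANJI_BIGRAMS)))
print(kana_words, result)
```

Because the second stage only sees the single segmentation chosen by the first, it searches a much smaller space than a joint search over boundaries and candidates would, which is consistent with the time and memory advantage reported for Method-B.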