2022 Volume 29 Issue 2 Pages 367-394
This paper describes the morphological analysis of unsegmented Hiragana strings. Hiragana strings are known to be more ambiguous than Kanji-Kana mixed strings. Several morphological analysis methods targeting Hiragana strings have been developed, but most have not achieved sufficient accuracy. One prior method is more accurate than a well-known conventional morphological analysis tool for Kanji-Kana mixed strings, but it requires a considerable amount of analysis time. Aiming for high-accuracy analysis of unsegmented Hiragana strings at a practical speed, we propose a sequential morphological analysis method using a recurrent neural network (RNN) and logistic regression. To speed up the analysis, the proposed method sequentially estimates, at each character boundary, whether a word boundary is present, and estimates morpheme information for each word. To improve accuracy, the proposed method integrates two estimates of word boundaries and morpheme information: one based on local information, obtained by logistic regression, and one based on global information, obtained by the RNN. The experimental results confirmed that the proposed method was more than 100 times faster than the prior method and achieved higher analysis accuracy.
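As a rough illustration of the sequential boundary decision described above, the following Python sketch walks over each character boundary once and combines a local score with a global score. It is not the authors' implementation. The function names, the linear interpolation with `weight`, the fixed `threshold`, and the toy scorers are all assumptions made for illustration; the abstract does not state how the two estimates are integrated, and the morpheme-information step is omitted here.

```python
from typing import Callable, List

# Hypothetical scorer type: given the full string and a boundary index i,
# return the estimated probability that a word boundary lies between
# text[i-1] and text[i]. In the paper these roles are played by logistic
# regression (local information) and an RNN (global information).
BoundaryScorer = Callable[[str, int], float]


def segment(
    text: str,
    local_scorer: BoundaryScorer,
    global_scorer: BoundaryScorer,
    weight: float = 0.5,      # assumed interpolation weight, not from the paper
    threshold: float = 0.5,   # assumed decision threshold, not from the paper
) -> List[str]:
    """Greedy left-to-right segmentation: one decision per character boundary."""
    words: List[str] = []
    start = 0
    for i in range(1, len(text)):
        # Combine the two estimates by linear interpolation (an assumption).
        p = weight * local_scorer(text, i) + (1.0 - weight) * global_scorer(text, i)
        if p >= threshold:
            words.append(text[start:i])
            start = i
    if text:
        words.append(text[start:])
    return words


# Toy stand-ins for trained models, for demonstration only.
def toy_local(text: str, i: int) -> float:
    # Pretend a boundary is likely right after a particle-like character.
    return 0.9 if text[i - 1] in "はがを" else 0.1


def toy_global(text: str, i: int) -> float:
    # An uninformative global score.
    return 0.5


if __name__ == "__main__":
    print(segment("わたしはねこがすき", toy_local, toy_global))
```

Because each character boundary is visited once and decided immediately, the running time of this sketch grows linearly with the string length (given constant-time scorers), which matches the abstract's motivation for a sequential design. The toy scorers attach particles to the preceding word, so the output is not a linguistically correct segmentation.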