This paper demonstrates experimentally that our method is effective for kana-kanji conversion of Japanese derivative words. In conventional natural language processing, morphological analysis precedes syntactic analysis. A derivative word, however, requires semantic analysis first, because it has an internal structure composed of a noun and a suffix while also acting as a single word. Our method therefore uses a probabilistic context-free grammar (PCFG) that consists of a full-size thesaurus, a large number of examples generalized to every intermediate level, a part-of-speech-level rule, and word-level lexicalized rules. In addition, we weight the example frequencies to prioritize high-density regions, and we choose the best learning condition by examining the characteristic curves under various conditions. Some earlier studies concluded that applying a thesaurus to syntactic analysis is not very effective. That result, however, appears to stem from insufficient training data and improper use of generalization. With a sufficient number of examples and optimal generalization, our method achieves an accuracy of over 95%, higher than previously thought possible.
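To make the idea of generalizing examples to every intermediate level concrete, the following is a minimal sketch, not the paper's PCFG. It assumes the levels are thesaurus categories and replaces the grammar with a plain back-off over relative frequencies. The toy thesaurus, the category labels, the example counts, and the function names `generalizations`, `train`, and `choose_suffix` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy thesaurus: each noun maps to its path from root to leaf.
THESAURUS = {
    "音楽": ("thing", "art", "音楽"),        # music
    "美術": ("thing", "art", "美術"),        # fine art
    "彫刻": ("thing", "art", "彫刻"),        # sculpture (unseen in training)
    "機械": ("thing", "artifact", "機械"),   # machine
    "電子": ("thing", "artifact", "電子"),   # electron
}

# Hypothetical training examples: (noun, suffix kanji, observed frequency).
# Both suffixes are read "ka", so kana input alone cannot tell them apart.
EXAMPLES = [
    ("音楽", "家", 3),  # 音楽家, musician
    ("美術", "家", 2),  # 美術家, artist
    ("機械", "化", 4),  # 機械化, mechanization
    ("電子", "化", 1),  # 電子化, digitization
]


def generalizations(noun):
    """Return thesaurus nodes for a noun, most specific first."""
    path = THESAURUS[noun]
    return [path[:k] for k in range(len(path), 0, -1)]


def train(examples):
    """Credit every example to the word itself and to each ancestor category."""
    counts = defaultdict(float)
    for noun, suffix, freq in examples:
        for node in generalizations(noun):
            counts[(node, suffix)] += freq
    return counts


def choose_suffix(counts, noun, candidates):
    """Pick a suffix using the most specific level that has any evidence."""
    for node in generalizations(noun):
        scores = {s: counts.get((node, s), 0.0) for s in candidates}
        total = sum(scores.values())
        if total > 0:
            best = max(candidates, key=lambda s: scores[s])
            return best, scores[best] / total
    # No evidence at any level: fall back to the first candidate.
    return candidates[0], 0.0


counts = train(EXAMPLES)
print(choose_suffix(counts, "彫刻", ["家", "化"]))
print(choose_suffix(counts, "機械", ["家", "化"]))
```

In this sketch the noun 彫刻 never occurs in the training examples, so the choice between 家 and 化 is made at the shared "art" category, where only 家 has been observed. The frequency weighting toward high-density regions described in the abstract is not modeled here.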