A Method for Detecting Homophone Errors in Compound Words Using Character Cooccurrence, and Its Evaluation

奥 雅博; 松岡 浩司

doi:10.5715/jnlp.4.3_83

Abstract

Most Japanese texts are produced with Japanese word processors. As Japanese texts consist of phonograms (KANA) and ideograms (KANJI), Japanese word processors always use KANA-KANJI conversion, in which KANA sequences input through the keyboard are converted into KANA-KANJI sequences. Therefore, Japanese texts suffer from homophone errors caused by erroneous KANA-KANJI conversion. A homophone error occurs when a KANA sequence is converted into a wrong word that has the same reading. Detecting homophone errors is an important topic in Japanese text revision support systems. We have already proposed a high-performance method for handling Japanese homophone errors in compound nouns, used in REVISE. The method, however, has some drawbacks. To compensate for these drawbacks, this paper describes a method for detecting Japanese homophone errors in compound nouns that uses character cooccurrence. Character cooccurrence can be easily collected from existing texts without any analysis. Therefore, this method can be used in a Japanese revision support system as a complementary method for handling Japanese homophone errors in compound nouns. Moreover, as this method depends only on character cooccurrence, it can be applied not only to homophone errors but also to other types of errors, such as character deletion.
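
The abstract does not state the detection rule itself, so the following is only a minimal sketch of the general idea, under stated assumptions: character cooccurrence is taken to mean adjacent character pairs counted from raw text, and a compound is flagged when it contains a pair seen fewer than a threshold number of times. The function names, the threshold `min_count`, and the toy corpus are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def collect_bigrams(corpus):
    """Count adjacent character pairs in raw text (no morphological analysis)."""
    counts = Counter()
    for text in corpus:
        for a, b in zip(text, text[1:]):
            counts[a + b] += 1
    return counts


def suspicious_pairs(compound, counts, min_count=1):
    """Return adjacent pairs in `compound` seen fewer than `min_count` times.

    Assumption: a rare or unseen pair hints at an error. The paper's actual
    decision rule may differ.
    """
    return [
        a + b
        for a, b in zip(compound, compound[1:])
        if counts[a + b] < min_count
    ]


# Hypothetical toy corpus; a real system would use a large body of existing text.
corpus = ["機械学習の研究", "学習機会の提供", "機械翻訳の評価"]
counts = collect_bigrams(corpus)

# 機械 (machine) and 機会 (opportunity) share the reading "kikai".
print(suspicious_pairs("機械翻訳", counts))  # intended compound
print(suspicious_pairs("機会翻訳", counts))  # homophone error
```

In this sketch the erroneous compound is flagged because the pair spanning the word boundary never occurs in the corpus, even though both constituent words do. The same check would also react to other character-level errors, such as a deleted character, since it looks only at character pairs.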
