Bunsetsu Identification by Sequential Application of Multiple Decision Lists

白木 伸征; 梅村 祥之; 原田 義久

doi:10.5715/jnlp.7.4_229

Abstract

Today's information-oriented society increasingly calls for car multimedia systems, and such systems require speech recognition and speech synthesis. We aimed to improve bunsetsu identification, which is important for both. Conventional bunsetsu identification methods fall into two types: those that use handmade rules and those that use machine learning. The former achieves high accuracy but has drawbacks, particularly for car multimedia systems: it is inflexible because it requires fixed inputs, and maintaining the identification rules takes considerable effort because every rule is written by hand. The latter is robust against these problems, but the algorithms become much more complex in order to improve accuracy, which is itself a problem for car multimedia systems. We therefore propose a new method that applies multiple decision lists sequentially. The decision list method is very simple, but on its own its accuracy is not very high, so we use not a single decision list but several, applied one after another. We ran experiments on the Kyoto University Corpus, using 10,000 sentences as a training corpus and 10,000 sentences as a test corpus. The resulting accuracy was 99.38%.
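The abstract does not say which features the decision lists use, how the lists are learned, or how one list hands over to the next. As a rough illustration only, the following Python sketch assumes that each decision list is an ordered set of rules over the part-of-speech pair around a candidate boundary, that a list abstains when no rule matches, and that an abstention passes the decision to the next list. The rules, tag names, and function names are all hypothetical and hand-written here; in the paper the lists are trained from a corpus.

```python
from typing import Callable, List, Optional, Tuple

# Assumed feature: (POS of the left morpheme, POS of the right morpheme).
Context = Tuple[str, str]
# A rule pairs a test on the context with a decision (True = bunsetsu boundary).
Rule = Tuple[Callable[[Context], bool], bool]
DecisionList = List[Rule]


def apply_list(rules: DecisionList, ctx: Context) -> Optional[bool]:
    """Return the decision of the first matching rule, or None to abstain."""
    for test, decision in rules:
        if test(ctx):
            return decision
    return None


def apply_sequentially(lists: List[DecisionList], ctx: Context,
                       default: bool = True) -> bool:
    """Try each decision list in order; fall back to a default if all abstain."""
    for rules in lists:
        decision = apply_list(rules, ctx)
        if decision is not None:
            return decision
    return default


def segment(morphemes: List[Tuple[str, str]],
            lists: List[DecisionList]) -> List[List[str]]:
    """Group (surface, POS) morphemes into bunsetsu chunks."""
    if not morphemes:
        return []
    chunks = [[morphemes[0][0]]]
    for (_, left_pos), (surface, right_pos) in zip(morphemes, morphemes[1:]):
        if apply_sequentially(lists, (left_pos, right_pos)):
            chunks.append([surface])      # boundary: start a new bunsetsu
        else:
            chunks[-1].append(surface)    # no boundary: extend the current one
    return chunks


# Hypothetical toy rules, not taken from the paper.
LIST_1: DecisionList = [
    (lambda c: c[1] == "particle", False),   # a particle attaches to the left
    (lambda c: c[1] == "auxiliary", False),  # so does an auxiliary verb
]
LIST_2: DecisionList = [
    (lambda c: c[0] == "particle", True),    # content word after a particle
    (lambda c: c == ("noun", "noun"), False),  # treat noun-noun as a compound
]
LISTS = [LIST_1, LIST_2]

# Romanized toy sentence: "watashi wa hon wo yomu" (I read a book).
SENTENCE = [("watashi", "noun"), ("wa", "particle"), ("hon", "noun"),
            ("wo", "particle"), ("yomu", "verb")]

if __name__ == "__main__":
    print(segment(SENTENCE, LISTS))
    # [['watashi', 'wa'], ['hon', 'wo'], ['yomu']]
```

Each list stays as simple as a single decision list, and the chain only adds a loop over lists, which is the kind of low complexity the abstract argues for; whether the authors chain their lists in this abstain-and-fall-through way is an assumption of this sketch.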
