IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Special Section on Natural Language Processing and its Applications
Joint Chinese Word Segmentation and POS Tagging Using an Error-Driven Word-Character Hybrid Model
Canasai KRUENGKRAI, Kiyotaka UCHIMOTO, Jun'ichi KAZAMA, Yiou WANG, Kentaro TORISAWA, Hitoshi ISAHARA
JOURNAL FREE ACCESS

2009 Volume E92.D Issue 12 Pages 2298-2305

Abstract
In this paper, we present a discriminative word-character hybrid model for joint Chinese word segmentation and POS tagging. Our word-character hybrid model offers high performance since it can handle both known and unknown words. We describe strategies that yield a good balance between learning the characteristics of known and unknown words, and we propose an error-driven policy that delivers such a balance by acquiring examples of unknown words from particular errors in a training corpus. We describe an efficient framework for training our model based on the Margin Infused Relaxed Algorithm (MIRA), evaluate our approach on the Penn Chinese Treebank, and show that it achieves superior performance compared to the state-of-the-art approaches reported in the literature.
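For readers unfamiliar with MIRA, the following is a minimal sketch of a single-constraint (1-best) MIRA update over sparse feature vectors. It is an illustration only: the abstract does not specify the training setup (for example, how many candidate outputs contribute constraints or which loss function is used), so the function name `mira_update`, the feature keys, the loss value, and the clipping constant `C` are all assumptions made for this example.

```python
from collections import Counter


def dot(weights, feats):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())


def mira_update(weights, gold_feats, pred_feats, loss, C=1.0):
    """Apply one 1-best MIRA step in place and return the step size tau.

    Hypothetical helper, not the authors' implementation.
    weights    -- dict mapping feature name to weight (modified in place)
    gold_feats -- feature counts of the reference analysis
    pred_feats -- feature counts of the current best prediction
    loss       -- non-negative cost of the prediction against the reference
    C          -- upper bound on the step size (assumed hyperparameter)
    """
    # Difference vector f(gold) - f(pred), dropping features that cancel out.
    diff = Counter(gold_feats)
    diff.subtract(pred_feats)
    diff = {k: v for k, v in diff.items() if v != 0}

    norm_sq = sum(v * v for v in diff.values())
    if norm_sq == 0:
        return 0.0  # identical feature vectors: nothing to update

    # Smallest step that makes the margin at least as large as the loss,
    # clipped to the interval [0, C].
    margin = dot(weights, diff)
    tau = min(C, max(0.0, (loss - margin) / norm_sq))

    for k, v in diff.items():
        weights[k] = weights.get(k, 0.0) + tau * v
    return tau


if __name__ == "__main__":
    w = {}
    gold = {"a": 1, "b": 1}  # hypothetical feature names
    pred = {"a": 1, "c": 1}
    print(mira_update(w, gold, pred, loss=1.0), w)
```

The update moves the weights just far enough that the reference analysis outscores the prediction by the given loss, which is the "relaxed margin" idea behind MIRA; features shared by both analyses are left untouched.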
© 2009 The Institute of Electronics, Information and Communication Engineers