Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Paper
Adaptation of Long-Unit-Word Analysis System to Different Part-Of-Speech Tagset
Shunsuke Kozawa, Kiyotaka Uchimoto, Yasuharu Den

2014 Volume 21 Issue 2 Pages 379-401

Abstract
Existing dictionaries, corpora, and analyzers are usually not directly applicable to linguistic research that uses a new part-of-speech tagset. Dictionaries and corpora are often newly constructed for such research. Existing analyzers, on the other hand, can be reused if they are adapted, but it is not clear how they should be adapted. This paper describes how an analyzer built for one corpus can be applied to another corpus with a different part-of-speech tagset. Specifically, we improved the features and labels used to train a long-unit-word analyzer based on the Corpus of Spontaneous Japanese (CSJ), focusing on the differences between CSJ and the Balanced Corpus of Contemporary Written Japanese (BCCWJ), and applied the analyzer to BCCWJ. Experimental results show the advantage of the proposed method.
© 2014 The Association for Natural Language Processing