Applying a Long-Unit-Word Analyzer to a Different Part-of-Speech Tagset

小澤 俊介; 内元 清貴; 伝 康晴

doi:10.5715/jnlp.21.379

Abstract

Existing dictionaries, corpora, and analyzers usually cannot be applied directly to linguistic research that adopts a new part-of-speech tagset. Dictionaries and corpora are therefore often constructed anew, whereas existing analyzers can be reused if they are adapted. However, it is not clear how such adaptation should be done. This paper describes how an analyzer built to analyze one corpus can be applied to another corpus that uses a different part-of-speech tagset. Specifically, focusing on the differences between the Corpus of Spontaneous Japanese (CSJ) and the Balanced Corpus of Contemporary Written Japanese (BCCWJ), we modified the features and labels used to train a long-unit-word analyzer built on CSJ and applied the resulting analyzer to BCCWJ. The experimental results show the advantage of the proposed method.
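A long-unit word is composed of one or more consecutive short-unit words, so the "labels" an analyzer is trained to predict can be pictured as chunk labels over a short-unit-word sequence. The sketch below illustrates only that decoding step, under stated assumptions: the two-label scheme ("B" begins a long-unit word, "I" continues it), the function name chunk_long_units, and the example labels are all hypothetical and are not the label set used in the paper, which this abstract does not specify.

```python
from typing import List, Tuple


def chunk_long_units(tokens: List[Tuple[str, str]]) -> List[str]:
    """Join short-unit words into long-unit words from chunk labels.

    Hypothetical label scheme for illustration only:
      "B" -> this short-unit word begins a new long-unit word
      "I" -> this short-unit word continues the current long-unit word
    """
    units: List[str] = []
    for surface, label in tokens:
        if label not in ("B", "I"):
            raise ValueError(f"unknown chunk label: {label!r}")
        # An "I" with no preceding unit is treated as the start of a new unit.
        if label == "B" or not units:
            units.append(surface)
        else:
            units[-1] += surface
    return units


if __name__ == "__main__":
    # Short-unit words with hypothetical labels.
    suw = [("国立", "B"), ("国語", "I"), ("研究", "I"), ("所", "I"),
           ("で", "B"), ("働く", "B")]
    print(chunk_long_units(suw))
```

In this framing, adapting an analyzer to another corpus amounts to changing which labels and which input features the chunker is trained on; the decoding step itself stays the same.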

Licensed under CC BY 4.0
https://creativecommons.org/licenses/by/4.0/