Abstract
Existing dictionaries, corpora, and analyzers are usually not directly applicable to linguistic research that adopts a new part-of-speech tagset. Dictionaries and corpora are therefore often constructed anew. Existing analyzers, on the other hand, can be reused if they are adapted, but it is not clear how they should be adapted. This paper describes how an analyzer built for one corpus can be applied to another corpus with a different part-of-speech tagset. Specifically, we modified the features and labels used to train a long-unit-word analyzer based on the Corpus of Spontaneous Japanese (CSJ), focusing on the differences between CSJ and the Balanced Corpus of Contemporary Written Japanese (BCCWJ), and applied the resulting analyzer to BCCWJ. The experimental results show the advantage of the proposed method.