Abstract
We report the performance of currently available, publicly released Chinese analyzers and resources. We use YamCha, a tool based on Support Vector Machines, with the Penn Chinese Treebank as the language resource. Combining the two, we measure performance on three Chinese analysis tasks: word segmentation, part-of-speech tagging, and base phrase chunking. For word segmentation and part-of-speech tagging, we also report the performance of MOZ, a publicly available statistical morphological analyzer. We found that the accuracy of morphological analysis using YamCha reaches around 88%, over 4% higher than that of MOZ, although YamCha is computationally very expensive. We also found that the accuracy of base phrase chunking is approximately 93%.