Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Paper
Design, Implementation, and Operation of Annotation Support System for Morphological Information of BCCWJ
Toshinobu Ogiso, Takenori Nakamura

2014 Volume 21 Issue 2 Pages 301-332

Abstract
The “Balanced Corpus of Contemporary Written Japanese” (BCCWJ) is a large-scale Japanese corpus of 100 million words. It comprises 170,000 XML files annotated with morphological information at two levels: short-unit words and long-unit words. We built an annotation support system to compile this corpus. The system lets many users modify corpus annotations and the dictionary entries they are linked to while keeping the two consistent. It consists of three components: a relational database server called the “Morphological Information Database,” a client tool called “Dynagon” that maintains the morphological information of the corpus, and a tool called “UniDic Explorer” that manages dictionary entries for morphological analysis. This paper describes the design, implementation, and operation of this “Morphological Information Database” for BCCWJ.
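The idea of keeping corpus annotations and dictionary entries consistent in a relational database can be illustrated with a minimal sketch. This is not the schema of the Morphological Information Database, which the abstract does not give. The table names (`lexicon_entry`, `corpus_token`), their columns, the helper functions, and the sample rows are all hypothetical, and SQLite stands in for whatever database server the system actually uses. The sketch only shows one common technique, a foreign-key constraint, that prevents an annotation from pointing at a missing dictionary entry.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        -- Hypothetical dictionary table (one row per lexical entry).
        CREATE TABLE lexicon_entry (
            entry_id INTEGER PRIMARY KEY,
            lemma    TEXT NOT NULL,
            pos      TEXT NOT NULL
        );
        -- Hypothetical corpus table (one row per annotated token).
        CREATE TABLE corpus_token (
            token_id  INTEGER PRIMARY KEY,
            file_name TEXT NOT NULL,
            position  INTEGER NOT NULL,
            surface   TEXT NOT NULL,
            entry_id  INTEGER NOT NULL REFERENCES lexicon_entry(entry_id)
        );
        """
    )
    return conn


def seed_demo_rows(conn: sqlite3.Connection) -> None:
    """Insert illustrative rows; the data is made up for this sketch."""
    with conn:
        conn.executemany(
            "INSERT INTO lexicon_entry (entry_id, lemma, pos) VALUES (?, ?, ?)",
            [(1, "食べる", "動詞"), (2, "本", "名詞")],
        )
        conn.execute(
            "INSERT INTO corpus_token (token_id, file_name, position, surface, entry_id) "
            "VALUES (?, ?, ?, ?, ?)",
            (1, "sample.xml", 0, "食べ", 1),
        )


def reassign_token(conn: sqlite3.Connection, token_id: int, entry_id: int) -> bool:
    """Point a token at another dictionary entry; False if the entry does not exist."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "UPDATE corpus_token SET entry_id = ? WHERE token_id = ?",
                (entry_id, token_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def delete_entry(conn: sqlite3.Connection, entry_id: int) -> bool:
    """Delete a dictionary entry; False if any token still refers to it."""
    try:
        with conn:
            conn.execute("DELETE FROM lexicon_entry WHERE entry_id = ?", (entry_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

In this sketch the database itself rejects an edit that would leave an annotation dangling, so the check does not depend on each client tool repeating it. Whether the actual system enforces consistency this way is not stated in the abstract.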
© 2014 The Association for Natural Language Processing