Abstract
“Balanced Corpus of Contemporary Written Japanese” is a large-scale Japanese corpus of 100 million words. It contains 170,000 XML files annotated with two levels of morphological information: short-unit word and long-unit word. We have constructed an annotation system to compile this corpus. The system allows many users to modify corpus annotations and dictionary entries, which are related to each other, while ensuring consistency. The system consists of a relational database server called the “Morphological Information Database,” a client tool that maintains the morphological information of the corpus called “Dynagon,” and a tool that manages dictionary entries for morphological analysis called “UniDic Explorer.” This paper describes the design, implementation, and operation of this “Morphological Information Database” for BCCWJ.