Abstract
Compilation of a hundred-million-word balanced corpus of contemporary written Japanese (BCCWJ) is under way at the National Institute for Japanese Language (NIJL). This is a five-year (2006-2010) joint program between the NIJL and the MEXT Grant-in-Aid for Scientific Research Priority Area Program "Japanese Corpus". The first half of the paper clarifies the notion of a "balanced" corpus and then gives an overview of the KOTONOHA project, of which the BCCWJ is a part. KOTONOHA is the NIJL's long-term corpus development initiative, which aims to construct a series of corpora covering both written and spoken language from the Meiji era to the present. The second half of the paper is devoted to an introduction to the BCCWJ. It addresses the following issues: the differences among the three sub-corpora of the BCCWJ, sample length, and the definition of 'word.' Answers to frequently asked questions are also presented. Lastly, the current status of the compilation work is reported, with special emphasis on the issue of copyright clearance.