Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
Universal Dependencies for Corpus of Everyday Japanese Conversation: UD_Japanese-CEJC
Mai Omura, Aya Wakasa, Hiroshi Matsuda, Masayuki Asahara

2025 Volume 32 Issue 1 Pages 55-90

Abstract

We report the construction of UD_Japanese-CEJC, a Universal Dependencies (UD) treebank of spoken Japanese created by converting the Corpus of Everyday Japanese Conversation (CEJC) into the UD format. The CEJC is a large-scale spoken language corpus of varied everyday Japanese conversations, annotated with word boundaries and morphological information. For UD_Japanese-CEJC, we additionally annotated the CEJC with long-unit morphological information and bunsetsu (phrase) dependency information. The treebank was then derived from the CEJC with manually refined conversion rules that draw on the morphological information and the bunsetsu-based syntactic dependencies. We examine issues that arise in UD constructions in the CEJC by comparing the corpus with a written Japanese corpus and by evaluating UD parsing accuracy.
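As a rough illustration of what "evaluating UD parsing accuracy" typically involves, the following sketch reads two CoNLL-U texts (the 10-column, tab-separated format in which UD treebanks are distributed) and computes unlabeled and labeled attachment scores (UAS and LAS). It is a simplified sketch, not the evaluation procedure used in the paper: it assumes gold and predicted files share the same tokenization, and the three-token sentence, the helper `row`, and the function names are made up for illustration and are not taken from the CEJC.

```python
from typing import List, Tuple

Token = Tuple[str, int, str]  # (FORM, HEAD, DEPREL)


def row(idx: int, form: str, upos: str, head: int, deprel: str) -> str:
    """Build one CoNLL-U line; unused columns are filled with '_' (hypothetical helper)."""
    return "\t".join([str(idx), form, "_", upos, "_", "_", str(head), deprel, "_", "_"])


def read_conllu(text: str) -> List[List[Token]]:
    """Parse CoNLL-U text into sentences of (form, head, deprel) tuples."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():  # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):  # sentence-level comment
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        current.append((cols[1], int(cols[6]), cols[7]))
    if current:
        sentences.append(current)
    return sentences


def attachment_scores(gold: List[List[Token]], pred: List[List[Token]]) -> Tuple[float, float]:
    """Return (UAS, LAS), assuming gold and pred are token-aligned."""
    total = uas_hits = las_hits = 0
    for g_sent, p_sent in zip(gold, pred):
        for (_, g_head, g_rel), (_, p_head, p_rel) in zip(g_sent, p_sent):
            total += 1
            if g_head == p_head:
                uas_hits += 1
                if g_rel == p_rel:
                    las_hits += 1
    return uas_hits / total, las_hits / total


# Toy data (invented): the prediction gets every head right but one label wrong.
gold_text = "\n".join([
    "# text = うん行くよ",
    row(1, "うん", "INTJ", 2, "discourse"),
    row(2, "行く", "VERB", 0, "root"),
    row(3, "よ", "PART", 2, "mark"),
    "",
])
pred_text = "\n".join([
    row(1, "うん", "INTJ", 2, "discourse"),
    row(2, "行く", "VERB", 0, "root"),
    row(3, "よ", "PART", 2, "aux"),
    "",
])

uas, las = attachment_scores(read_conllu(gold_text), read_conllu(pred_text))
print(f"UAS={uas:.3f} LAS={las:.3f}")
```

When the parser also performs tokenization, predicted and gold tokens must first be aligned before scores of this kind can be computed; the sketch above deliberately leaves that step out.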

© 2025 The Association for Natural Language Processing