Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper
Japanese Discourse Relation Analysis: Task Definition, Connective Detection, and Corpus Annotation
Yudai Kishimoto, Yugo Murawaki, Daisuke Kawahara, Sadao Kurohashi

2020 Volume 27 Issue 4 Pages 889-931

Abstract

Although discourse parsing is fundamental to natural language processing, limited research has been conducted on corpus-based discourse parsing in Japanese. Herein, we construct a Japanese corpus annotated with discourse units, discourse connectives, and discourse relations. We propose four strategies for developing a corpus easily and rapidly: (1) selecting the first three sentences of web documents as the target documents, (2) automatically annotating discourse units and connectives, (3) designing a discourse relation tagset consisting of seven classes organized into a two-level hierarchy, and (4) annotating discourse relations with two types of annotators, namely experts and crowd workers. We report that there is significant room for improvement in the annotation performed by crowd workers. Based on this corpus, we develop a Japanese discourse parser. Experimental results show that the proposed parser outperforms previously developed models. We also demonstrate that the automatic recognizer of discourse connectives can be used as a high-quality parser for explicit discourse relations. We implement a recognizer of discourse units and discourse connectives in KNP and make the corpus publicly available.

© 2020 The Association for Natural Language Processing