Japanese Discourse Relation Analysis: Task Design, Automatic Recognition of Discourse Connectives, and Corpus Annotation

岸本 裕大; 村脇 有吾; 河原 大輔; 黒橋 禎夫

doi:10.5715/jnlp.27.889

Abstract

Although discourse parsing is fundamental to natural language processing, little research has been conducted on corpus-based discourse parsing in Japanese. Herein, we construct a Japanese corpus annotated with discourse units, discourse connectives, and discourse relations. We propose four strategies for developing such a corpus easily and rapidly: (1) using the first three sentences of web documents as the target documents, (2) automatically annotating discourse units and connectives, (3) designing a discourse relation tagset consisting of seven classes organized into a two-level hierarchy, and (4) annotating discourse relations with two types of annotators, namely experts and crowd workers. We report that there is significant room for improvement in the annotation performed by crowd workers. Based on this corpus, we develop a Japanese discourse parser. Experimental results show that the proposed parser outperforms previously developed models. We also demonstrate that the automatic recognizer of discourse connectives can be used as a high-quality parser for explicit discourse relations. We implement the recognizer of discourse units and discourse connectives in KNP and make the corpus publicly available.

Licensed under CC BY 4.0
https://creativecommons.org/licenses/by/4.0/
