Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
Neural RST-Style Discourse Parsing Exploiting Agreement Sub-trees as Silver Data
Naoki Kobayashi, Tsutomu Hirao, Hidetaka Kamigaito, Manabu Okumura, Masaaki Nagata

2022 Volume 29 Issue 3 Pages 875-900

Abstract

Recent Rhetorical Structure Theory (RST)-style discourse parsing methods are trained with supervised learning, which requires an annotated corpus of sufficient size and quality. However, the RST Discourse Treebank, the largest such corpus, contains only 385 documents, which is insufficient for learning the long-tailed distribution of rhetorical-relation labels. To address this problem, we propose a novel approach that improves performance on low-frequency labels. Our approach exploits a silver dataset obtained from multiple teacher parsers: we extract agreement sub-trees from the RST trees built by the teacher parsers to obtain a more reliable silver dataset. As the student parser, we use a span-based top-down RST parser, a neural state-of-the-art model. In our training procedure, we first pre-train the student parser on the silver dataset and then fine-tune it on a gold, human-annotated dataset. Experimental results show that our parser achieves excellent scores for nuclearity and relation, 64.7 and 54.1, respectively, on the Original Parseval.
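The agreement sub-tree idea from the abstract can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: it represents a binary RST tree as nested tuples over EDU indices, flattens each tree into labeled spans, and keeps only the spans on which two teacher parsers agree. All names, the tuple encoding, and the example labels are assumptions for illustration.

```python
# Hypothetical sketch of agreement sub-tree extraction: keep only the labeled
# spans that two teacher parsers both produce for the same document.
# A tree node is either an int (a leaf EDU index) or a tuple
# (left_child, right_child, nuclearity, relation).

def tree_to_spans(tree):
    """Flatten a binary RST tree into a set of (start, end, nuclearity, relation) tuples."""
    spans = set()

    def walk(node):
        if isinstance(node, int):        # leaf EDU: a span of length one
            return node, node
        left, right, nuc, rel = node
        left_start, _ = walk(left)
        _, right_end = walk(right)
        spans.add((left_start, right_end, nuc, rel))
        return left_start, right_end

    walk(tree)
    return spans

def agreement_spans(tree_a, tree_b):
    """Intersect the labeled spans of two teacher parses of one document."""
    return tree_to_spans(tree_a) & tree_to_spans(tree_b)

# Two teachers parse the same 3-EDU document; they agree only on the
# (0, 1) sub-tree, so only that span would enter the silver dataset.
teacher_1 = ((0, 1, "NS", "Elaboration"), 2, "NS", "Attribution")
teacher_2 = ((0, 1, "NS", "Elaboration"), 2, "SN", "Background")
print(agreement_spans(teacher_1, teacher_2))
# {(0, 1, 'NS', 'Elaboration')}
```

In this sketch, agreement requires an exact match of span boundaries, nuclearity, and relation label; a looser criterion (e.g. matching spans but ignoring labels) would yield a larger but noisier silver set.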

© 2022 The Association for Natural Language Processing