Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
MATCHA: Parallel Corpus for Japanese Text Simplification Based on Professionally Simplified Articles
Rina Miyata, Hyuga Koretaka, Hiroki Yamauchi, Daiki Yanamoto, Tomoyuki Kajiwara, Takashi Ninomiya, Yasuhiro Nishiwaki

2024 Volume 31 Issue 2 Pages 590-609

Abstract

In this study, we construct and release a Japanese parallel corpus for text simplification. Existing Japanese corpora for this task were constructed by non-experts, and no high-quality, large-scale corpus constructed by experts exists. We constructed a large-scale, sentence-level parallel corpus by manually aligning sentences in articles that had been simplified by experts. Our human evaluation revealed that parallel corpora simplified by experts contain a more diverse set of simplification operations than those simplified by non-experts. We also found that the sentences in our parallel corpus are simplified fluently and adequately.

© 2024 The Association for Natural Language Processing