2020 Volume 27 Issue 4 Pages 801-824
This study introduces three language resources for Japanese lexical simplification: 1) an evaluation dataset, 2) lexica, and 3) a toolkit for developing and benchmarking Japanese lexical simplification systems. The word complexity lexicon was automatically expanded using a classifier trained on a small word complexity lexicon created by Japanese language teachers. Using this word complexity estimator, pairs of complex words and their simpler synonyms were extracted from a large-scale synonym lexicon to build a simplified synonym lexicon for lexical simplification. In addition, a Python library was developed that implements automatic evaluation and the key methods for each subtask, easing the construction of a lexical simplification pipeline. Experiments on the developed evaluation dataset showed that the proposed method, which is based on the developed lexicon, achieved the highest performance in Japanese lexical simplification.
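
The extraction of simplified word pairs from a synonym lexicon can be illustrated with a minimal sketch. It is not the toolkit's API: the function name `extract_simplification_pairs`, the three-level complexity scale, and the toy data are all assumptions made for illustration, and a plain lookup table stands in for the trained word complexity classifier.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical complexity scale: 1 = easy, 3 = hard (illustrative values only).
ComplexityLexicon = Dict[str, int]


def extract_simplification_pairs(
    synonym_pairs: Iterable[Tuple[str, str]],
    complexity: ComplexityLexicon,
) -> List[Tuple[str, str]]:
    """Return (complex_word, simple_word) pairs ordered by complexity level."""
    pairs = []
    for a, b in synonym_pairs:
        # Skip pairs where either word has no known complexity level.
        if a not in complexity or b not in complexity:
            continue
        if complexity[a] > complexity[b]:
            pairs.append((a, b))
        elif complexity[b] > complexity[a]:
            pairs.append((b, a))
        # Equal levels: neither word simplifies the other, so drop the pair.
    return pairs


# Toy data invented for this sketch; not taken from the published lexica.
toy_complexity = {"購入": 3, "買う": 1, "使用": 2, "使う": 1, "開始": 2, "着手": 2}
toy_synonyms = [("買う", "購入"), ("使用", "使う"), ("開始", "着手"), ("購入", "入手")]

simplified = extract_simplification_pairs(toy_synonyms, toy_complexity)
print(simplified)
```

In this sketch, a synonym pair is kept only when both words have a complexity level and the levels differ, so each retained pair maps a harder word to an easier substitute.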