Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
Construction and Analysis of Evaluation Dataset for Japanese Lexical Semantic Change Detection
Zhidong Ling, Taichi Aida, Teruaki Oka, Mamoru Komachi
JOURNAL FREE ACCESS

2024 Volume 31 Issue 4 Pages 1487-1522

Abstract

Research in natural language processing has shown growing interest in automatically detecting and analyzing, from corpora, words whose meanings change over time. While diachronic corpora and evaluation word lists have been established for languages such as English and German, such resources are lacking for Japanese. This study addresses this gap by introducing the Japanese Lexical Semantic Change Detection Dataset (JaSemChange), which provides a list of evaluation words for Japanese. Using three diachronic corpora spanning near-modern to contemporary Japanese, we sampled pairs of usages of the target words. A team of four experts annotated a total of 2,280 usage pairs with semantic similarity ratings, which were used to gauge the degree of semantic change of each target word. Furthermore, we used this dataset to assess how well word embedding-based methods detect semantic change. Using frequency-based methods as a baseline, we compared the effectiveness of typical type-based and token-based methods and examined their respective characteristics. The dataset, which comprises the list of target words with their assigned degrees of semantic change and the annotation scores for the usage pairs, is publicly available on GitHub.

© 2024 The Association for Natural Language Processing