Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
Constructing an Annotated Corpus and Inference Dataset for Japanese Numeral Expressions
Kana Koyano, Hitomi Yanaka, Koji Mineshima, Daisuke Bekki
JOURNAL FREE ACCESS

2023 Volume 30 Issue 2 Pages 432-455

Abstract

Natural language inference (NLI) is a core natural language understanding task that determines whether a hypothesis is true (entailment), false (contradiction), or neither (neutral) given that a set of premises is true. When an inference involves numeral expressions, its label can differ depending on whether it is judged by logical entailment or by implicature. Moreover, embedding numeral expressions in contexts such as negation and conditionals can reverse the entailment relation between a premise and a hypothesis relative to that in ordinary contexts. Furthermore, numeral expressions in Japanese are characterized by flexible quantifier positions, a wide variety of numeral suffixes, and diverse usages. However, few studies have developed annotated corpora focusing on these features or benchmark datasets for understanding Japanese numeral expressions. In this study, we developed a corpus that annotates each numeral expression in an existing phrase-structure-based Japanese treebank with its numeral suffix, position, and usage type. We then constructed an inference dataset for numeral expressions based on our annotated corpus. The dataset focuses on inferences whose entailment labels differ between logical entailment and implicature, and on contexts such as negation and conditionals in which entailment labels can be reversed. Using this dataset, we conducted experiments to investigate the extent to which current standard pre-trained language models can handle inferences that require an understanding of numeral expressions. The results confirmed that the Japanese BERT (Bidirectional Encoder Representations from Transformers) model cannot satisfactorily handle inferences involving various numeral expressions.
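
The two phenomena named in the abstract, labels that differ between logical entailment and implicature and labels that reverse under negation, can be illustrated with a small brute-force sketch. Everything in it is an assumption made for illustration and is not the authors' dataset, annotation scheme, or labeling procedure: the names `NumeralClaim`, `holds`, and `nli_label` are hypothetical, the logical reading is modeled as "at least n", and the implicature-strengthened reading is modeled as "exactly n".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumeralClaim:
    """A toy sentence such as "n students came" (hypothetical structure)."""
    count: int             # the numeral n in the sentence
    negated: bool = False  # True when the sentence is embedded under negation


def holds(claim: NumeralClaim, actual: int, reading: str) -> bool:
    """Truth of the claim in a situation where `actual` students came."""
    if reading == "at_least":   # assumed logical (lower-bound) reading
        base = actual >= claim.count
    elif reading == "exactly":  # assumed reading strengthened by implicature
        base = actual == claim.count
    else:
        raise ValueError(f"unknown reading: {reading}")
    return not base if claim.negated else base


def nli_label(premise: NumeralClaim, hypothesis: NumeralClaim,
              reading: str, max_count: int = 20) -> str:
    """Label the pair by checking every situation 0..max_count
    in which the premise is true."""
    worlds = [n for n in range(max_count + 1) if holds(premise, n, reading)]
    outcomes = {holds(hypothesis, n, reading) for n in worlds}
    if outcomes == {True}:
        return "entailment"
    if outcomes == {False}:
        return "contradiction"
    return "neutral"


three, two = NumeralClaim(3), NumeralClaim(2)
not_three, not_two = NumeralClaim(3, negated=True), NumeralClaim(2, negated=True)

# Same sentence pair, different label depending on the reading.
print(nli_label(three, two, "at_least"))
print(nli_label(three, two, "exactly"))

# Under negation the direction of entailment flips.
print(nli_label(two, three, "at_least"))
print(nli_label(not_two, not_three, "at_least"))
```

Under these assumptions, "three came" entails "two came" on the lower-bound reading but contradicts it on the exact reading, and negation turns the non-entailing direction (two to three) into an entailing one. Real Japanese numeral expressions additionally vary in suffix, position, and usage, which this toy model does not attempt to capture.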

© 2023 The Association for Natural Language Processing