Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
Leveraging a Bilingual Corpus to Resolve Date–Duration Ambiguity in Japanese Numeric Day Expressions
Kazutaka Kinugawa, Hideya Mino, Isao Goto, Ichiro Yamada

2022 Volume 29 Issue 2 Pages 638-668

Abstract

In Japanese, time expressions are often unaccompanied by explicit temporal markers, and thus their temporal types are not always obvious. One of the most representative cases is the date–duration ambiguity arising from the commonly used time expression “** 日 [** nichi].” To build a supervised classifier for this ambiguity while minimizing the annotation burden, we introduce an automatic label generation method using a bilingual corpus. Inspired by an annotation projection technique, we associate Japanese time expressions with their corresponding English words. Ambiguity in Japanese time expressions is comparatively easy to resolve using their associated English words. We prepared several simple rules to determine temporal type labels from sentence pairs and automatically created a training set for this task. Through a human evaluation, we verified that 98.7% of the sampled labels match manually assigned labels. We then trained a classification model on these training examples and compared our automatically created examples with existing manually annotated data. Experimental results show that the produced examples improve the accuracy of classification models by up to 14.0 points. Hence, our label generation method not only minimizes the annotation effort but is also sufficiently reliable for building temporal type classifiers.
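
The rule-based labeling step described above can be illustrated with a minimal sketch. The abstract does not list the rules themselves, so the cue sets below (month names and ordinals for dates, the word "day(s)" for durations) and the function name `label_from_english` are illustrative assumptions rather than the authors' rules. The sketch also assumes that word alignment has already been performed and that the English tokens aligned to the Japanese "** 日" expression are given as input.

```python
import re
from typing import List, Optional

# Hypothetical cue sets; the paper's actual rules are not given in the abstract.
MONTHS = {
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
}
ORDINAL = re.compile(r"\d{1,2}(st|nd|rd|th)")
DAY_UNITS = {"day", "days"}


def label_from_english(aligned_tokens: List[str]) -> Optional[str]:
    """Guess the temporal type of a Japanese '** 日' expression from the
    English tokens aligned to it. Returns 'DATE', 'DURATION', or None
    when the cues are missing or contradictory (the pair is then skipped).
    """
    # Lowercase and split on non-alphanumerics so "three-day" yields "day".
    words = re.findall(r"[a-z0-9]+", " ".join(aligned_tokens).lower())
    has_day_unit = any(w in DAY_UNITS for w in words)
    has_date_cue = any(w in MONTHS or ORDINAL.fullmatch(w) for w in words)

    if has_day_unit and not has_date_cue:
        return "DURATION"
    if has_date_cue and not has_day_unit:
        return "DATE"
    return None  # no rule fires; do not generate a label


# "10日" aligned to a calendar date versus a length of time
print(label_from_english(["on", "March", "10th"]))  # DATE
print(label_from_english(["for", "10", "days"]))    # DURATION
print(label_from_english(["10"]))                   # None
```

Returning `None` when no cue is present, or when cues conflict, trades coverage for label precision. That choice seems in keeping with the goal of producing reliable training labels without manual annotation, though it is a design assumption of this sketch.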

© 2022 The Association for Natural Language Processing