Temporal information is important for grounding event expressions on a timeline. Temporal expression extraction has been treated as numerical expression extraction, a subtask of named entity extraction. For English texts, evaluation workshops have been held on extracting and normalizing temporal expressions. An annotation schema, TimeML, was designed to annotate events and temporal expressions, and several annotated corpora of newswire texts were developed. However, no schema for annotating and normalizing temporal information in Japanese texts has been designed. This paper proposes an annotation schema, based on
TimeML, for Japanese temporal information. We annotate the temporal information in parts of the ‘Balanced Corpus of Contemporary Written Japanese’. We identify several problems in the annotation and discuss the steps to be taken to ground Japanese event expressions on a timeline.