Dataset of Cloze Questions for Japanese Narrative Text (日本語物語文を対象とする空所穴埋め問題データセット)

土屋 雅稔; 渡会 拓人

doi:10.1527/tjsai.37-4_A-LC3

Abstract

This paper describes a dataset of Japanese cloze questions designed for evaluating machine reading comprehension. The questions are generated automatically from books in Aozora Bunko, and each question is defined as a 4-tuple: a context passage, a query containing a slot, an answer character, and a set of candidate answer characters. The query is generated from the original sentence that appears immediately after the context passage in the target book, by replacing the answer character with the slot. The candidate set consists of the answer character and the other characters who appear in the context passage. Because the context passage and the query share the same context, a machine that precisely understands that context should be able to select the correct answer from the candidate set. The distinctive feature of our approach is that slots are restricted to characters of the target books, because characters play important roles in narrative texts and a precise understanding of their relationships is necessary for reading comprehension. To extract characters from the target books, we use manually created character dictionaries, because some characters are referred to by common nouns rather than by named entities.
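The 4-tuple and the generation step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `ClozeQuestion`, `make_questions`, and `SLOT`, the slot token itself, and the plain substring matching against the character dictionary are all assumptions made for illustration. The abstract does not say how dictionary entries are matched against the text or whether one sentence yields several questions, so the sketch simply emits one question per dictionary character found in the sentence.

```python
from dataclasses import dataclass

SLOT = "[SLOT]"  # hypothetical placeholder token; the abstract does not name one


@dataclass(frozen=True)
class ClozeQuestion:
    """One question as a 4-tuple: context, query, answer, candidates."""
    context: str           # passage preceding the original sentence
    query: str             # original sentence with the answer replaced by the slot
    answer: str            # the character that was removed
    candidates: frozenset  # the answer plus the characters found in the context


def make_questions(context, sentence, character_dictionary):
    """Build cloze questions from a context passage and the sentence after it.

    Assumption: characters are located by plain substring matching, which is
    a simplification that can over-match short names inside longer words.
    """
    # Characters from the dictionary that appear in the context passage.
    in_context = {name for name in character_dictionary if name in context}

    questions = []
    for answer in sorted(character_dictionary):
        if answer not in sentence:
            continue
        # Replace the first mention of the answer character with the slot.
        query = sentence.replace(answer, SLOT, 1)
        candidates = frozenset(in_context | {answer})
        questions.append(ClozeQuestion(context, query, answer, candidates))
    return questions


if __name__ == "__main__":
    # Toy text written for this example, not taken from Aozora Bunko.
    dictionary = {"太郎", "花子", "老人", "次郎"}
    context = "太郎は花子に会った。老人は黙っていた。"
    sentence = "花子は太郎に笑いかけた。"
    for q in make_questions(context, sentence, dictionary):
        print(q.query, q.answer, sorted(q.candidates))
```

In this toy case "老人" (old man) is a common noun rather than a named entity, which is the situation the manually created dictionaries are meant to cover: it still enters the candidate set because it is listed in the dictionary and appears in the context passage.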
