JSAI Technical Report, Type 2 SIG
Online ISSN : 2436-5556
The 26th SIG-AGI
DanSto - Japanese Dataset of Short Stories for Evaluating Context Understanding
Rzepka RAFAL, Dudzic KACPER, Abe ARISA, Araki KENJI
RESEARCH REPORT / TECHNICAL REPORT

2024 Volume 2023 Issue AGI-026 Pages 32-40

Abstract

This paper introduces a dataset of more than 8,000 manually written short stories in Japanese. Its primary aim is to address the scarcity of comparable data in Japanese. In addition, the dataset provides alternative endings for the stories, collected through crowdsourcing and designed to remain plausible while being slightly less probable than the original ones. This approach yields a testing benchmark that is more challenging for contemporary large language models than analogous English benchmarks, in which endings are typically split into correct and incorrect ones. The dataset is further expanded through automated manipulation of subjects and objects, and the study evaluates the performance of popular models on three tasks: a) predicting story endings, b) substituting antonyms, and c) swapping nouns. Preliminary experiments show that zero-shot GPT-4 performs relatively well, especially in recognizing sentences with swapped nouns (94% accuracy), while open-source Japanese LLMs struggle to process the proposed stories.
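To make the idea of "automated manipulation of subjects and objects" more concrete, here is a minimal Python sketch. It is not the authors' pipeline: it assumes a simple sentence of the form "X が Y を ...", uses a regular expression instead of a morphological analyzer or dependency parser, and the names `swap_subject_object` and `make_pair` are hypothetical. The resulting pair (original, swapped) illustrates the kind of input a noun-swapping task could present to a model, which would then have to recognize the manipulated sentence.

```python
import re
from typing import Optional

# Hypothetical, simplified pattern: a が-marked subject, a を-marked object,
# then the rest of the predicate. Noun spans may not contain が, を, 、 or 。,
# so words that contain these kana (e.g. ながら) will not be handled correctly.
_SUBJ_OBJ = re.compile(r"(?P<subj>[^がを、。]+)が(?P<obj>[^がを、。]+)を(?P<rest>.+)")


def swap_subject_object(sentence: str) -> Optional[str]:
    """Return the sentence with its が-subject and を-object swapped,
    or None if the sentence does not fit the simple pattern."""
    m = _SUBJ_OBJ.fullmatch(sentence)
    if m is None:
        return None
    return f"{m['obj']}が{m['subj']}を{m['rest']}"


def make_pair(sentence: str) -> Optional[tuple[str, str]]:
    """Hypothetical helper: build an (original, swapped) test pair."""
    swapped = swap_subject_object(sentence)
    if swapped is None or swapped == sentence:
        return None  # skip sentences that cannot be manipulated meaningfully
    return (sentence, swapped)


if __name__ == "__main__":
    print(swap_subject_object("猫が犬を追いかけた。"))  # 犬が猫を追いかけた。
    print(make_pair("雨が降った。"))  # None (no を-marked object)
```

A real implementation would more likely rely on a tokenizer such as MeCab and on dependency information to locate the subject and object reliably; the regular expression here only keeps the example self-contained.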

© 2024 Authors