Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
A Benchmark Suite of Japanese Natural Questions
Takuya Uematsu, Hao Wang, So Fukuda, Daisuke Kawahara, Tomohide Shibata

2025 Volume 32 Issue 2 Pages 497-519

Abstract

Developing high-performance, robust natural language processing (NLP) models requires diverse question answering (QA) datasets for training, evaluation, and analysis. Many QA datasets exist for English, but only a few exist for other languages. We focus on Japanese, for which only a few basic QA datasets are available, and build JNQ, a Japanese version of Natural Questions (NQ) consisting of questions that arise naturally from human information needs. We collect natural questions from the query logs of a Japanese search engine and build the dataset using crowdsourcing. We also construct JBoolQ, a Japanese version of BoolQ, a dataset that is derived from NQ and consists of yes/no questions. In constructing JNQ and JBoolQ, we redefine the dataset specifications of the original NQ and BoolQ. JNQ consists of 16,641 questions, and JBoolQ consists of 6,467 questions. We define three tasks from JNQ and one from JBoolQ and establish baselines using competitive methods drawn from related literature. We hope that these datasets will facilitate research on QA and NLP models in Japanese. We will make JNQ and JBoolQ publicly available.

© 2025 The Association for Natural Language Processing