2025 Volume 32 Issue 2 Pages 497-519
Developing high-performance, robust natural language processing (NLP) models requires a variety of question answering (QA) datasets for training, evaluation, and analysis. Although a variety of QA datasets are available in English, only a few exist in other languages. We focus on Japanese, for which only a few basic QA datasets exist, and build JNQ, a Japanese version of Natural Questions (NQ) consisting of questions that arise naturally from human information needs. We collect natural questions from the query logs of a Japanese search engine and build the dataset using crowdsourcing. We also construct JBoolQ, a Japanese version of BoolQ, a dataset that is derived from NQ and consists of yes/no questions. In constructing JNQ and JBoolQ, we redefine the dataset specifications of the original NQ and BoolQ. JNQ consists of 16,641 questions, and JBoolQ consists of 6,467 questions. We define three tasks on JNQ and one on JBoolQ and establish baselines using competitive methods drawn from the related literature. We hope that these datasets will facilitate research on QA and NLP models in Japanese. We will make JNQ and JBoolQ publicly available.