2021 Volume 28 Issue 1 Pages 3-25
Recent progress in language modeling is driving research on question answering (QA) without reading comprehension, a task known as closed-book QA. While previous studies have focused on enlarging and refining models to address this task, we take a data-oriented approach that teaches a model diverse factual knowledge efficiently. We use Wikipedia as an additional source of knowledge to create an augmented dataset. We empirically show that our model trained with data augmentation correctly answers questions unseen in the original training data, suggesting that the model learns new knowledge from the augmented data. Accordingly, our model outperforms the previously reported best performance on Quizbowl, and it performs on par with a strong baseline on TriviaQA despite having about 20 times fewer parameters.