2019 Volume 156 Pages 67-96
We report on a new Japanese corpus of eye-tracking data. The corpus design is partly modeled on the Dundee Eye-Tracking Corpus for English and French texts, but it addresses language-specific issues such as the lack of spaces between segments in Japanese text. Twenty-four native Japanese speakers read excerpts from the Balanced Corpus of Contemporary Written Japanese, presented either with or without a space between segments. Segments were based on bunsetsu units (a content word plus its functional material). Two methods were used for data collection: eye-tracking and self-paced reading. We report two analyses to illustrate the advantages of having such a large reading-time data set for texts annotated with information such as syntactic-dependency relations. First, contrary to previous eye-tracking reports based on relatively small sets of sentences, texts segmented with spaces were read more quickly than the same texts presented without spaces. Second, across the various types of sentences in the corpus, a bunsetsu was read faster the more dependent phrases preceded it. This evidence for anti-locality effects is more general than what has previously been available in the literature.