Abstract
This study aims to identify the linguistic features that affect the readability of Japanese texts for second-language learners and to construct a readability assessment model that is both accurate and explainable enough to meet the needs of language education. Specifically, 86 linguistic features across 5 categories were extracted from Japanese textbooks graded by difficulty level, and readability models were constructed and evaluated. Of the 4 classification models compared for automatic difficulty assessment, the support vector machine (SVM) performed best, with an accuracy (ACC) of 0.898 in judging the readability of Japanese texts. Furthermore, stepwise feature selection identified 35 highly relevant features, yielding a simpler and more explainable model that maintained an accuracy of 0.880. Additionally, readability scores were quantified from three perspectives to visualize the prediction results. Thus, the readability model developed in this study not only demonstrates high predictive accuracy but also provides the explainability desired in the field of language education.
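To illustrate the idea behind stepwise feature selection, the following is a minimal sketch under stated assumptions, not the procedure used in this study. It assumes a forward variant (the abstract does not say whether selection is forward or backward), substitutes a simple nearest-centroid classifier for the SVM so that the example needs only the Python standard library, and uses a tiny invented data set with three hypothetical feature columns. The function names `centroid_accuracy` and `forward_stepwise` are illustrative only.

```python
from statistics import mean


def centroid_accuracy(X_train, y_train, X_test, y_test, cols):
    """Held-out accuracy of a nearest-centroid classifier using only `cols`."""
    labels = sorted(set(y_train))
    # One centroid per difficulty level, restricted to the chosen columns.
    centroids = {
        lab: [mean(x[c] for x, y in zip(X_train, y_train) if y == lab) for c in cols]
        for lab in labels
    }

    def predict(x):
        # Squared Euclidean distance; ties go to the lowest label.
        return min(
            labels,
            key=lambda lab: sum((x[c] - m) ** 2 for c, m in zip(cols, centroids[lab])),
        )

    hits = sum(predict(x) == y for x, y in zip(X_test, y_test))
    return hits / len(y_test)


def forward_stepwise(X_train, y_train, X_test, y_test, min_gain=1e-9):
    """Greedily add the feature that most improves accuracy; stop when none helps."""
    selected, best = [], 0.0
    remaining = list(range(len(X_train[0])))
    while remaining:
        scored = [
            (centroid_accuracy(X_train, y_train, X_test, y_test, selected + [c]), c)
            for c in remaining
        ]
        # Highest accuracy wins; on ties prefer the lower column index.
        score, col = max(scored, key=lambda t: (t[0], -t[1]))
        if score - best < min_gain:
            break
        selected.append(col)
        remaining.remove(col)
        best = score
    return selected, best


# Invented toy data (NOT from the study). Hypothetical columns:
# 0 = mean sentence length, 1 = an uninformative feature, 2 = kanji ratio (scaled).
# Labels 0, 1, 2 stand for three difficulty levels.
X_train = [[1, 4, 0], [0, 6, 10], [10, 5, 1], [11, 5, 0], [10, 4, 10], [11, 6, 9]]
y_train = [0, 0, 1, 1, 2, 2]
X_test = [[0, 5, 1], [1, 5, 9], [10, 6, 0], [11, 4, 1], [11, 5, 10], [10, 5, 9]]
y_test = [0, 0, 1, 1, 2, 2]

selected, best = forward_stepwise(X_train, y_train, X_test, y_test)
print(selected, best)  # -> [0, 2] 1.0
```

In this toy case the uninformative column is never selected because adding it does not raise held-out accuracy, which mirrors the trade-off described above: a smaller feature set that retains most of the predictive accuracy while being easier to interpret.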