2020 Volume 8 Issue 1 Pages 10-16
In this study, to promote the translation and digitization of historical documents, we attempted to recognize classical Japanese ‘kuzushiji’ characters using the dataset released by the Center for Open Data in the Humanities (CODH). ‘Kuzushiji’ are heavily deformed characters written in a cursive style, and even experts have difficulty recognizing them. Using deep learning, which has developed remarkably in the field of image classification, we examined through experiments how successfully it could classify ‘kuzushiji’ characters spanning more than 1,000 classes. The analysis identified two causes of poor performance on specific characters: (1) ‘hiragana’ and ‘katakana’ have root ‘kanji’ called ‘jibo’, which leads to various shapes for a single character, and (2) the shapes of handwritten characters also differ depending on the writer or the work. Based on these findings, we conclude that, in addition to improving recognition technologies such as deep learning, it is necessary to incorporate specialized knowledge of ‘kuzushiji’.
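The kind of per-character analysis described above can be illustrated with a minimal sketch that ranks classes by accuracy so that poorly recognized characters stand out. This is not the authors' code: the function names `per_class_accuracy` and `worst_classes` and the toy labels are assumptions made for illustration, and the abstract does not state which metric the study used.

```python
from collections import defaultdict


def per_class_accuracy(true_labels, predicted_labels):
    """Return {label: fraction of that label's samples predicted correctly}."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}


def worst_classes(true_labels, predicted_labels, k=3):
    """Return the k (label, accuracy) pairs with the lowest accuracy."""
    accuracy = per_class_accuracy(true_labels, predicted_labels)
    # Sort by accuracy first, then by label so that ties are deterministic.
    return sorted(accuracy.items(), key=lambda item: (item[1], item[0]))[:k]


# Toy labels (hypothetical, not taken from the CODH dataset).
true_labels = ["あ", "あ", "い", "い", "う"]
predicted_labels = ["あ", "い", "い", "い", "あ"]
lowest = worst_classes(true_labels, predicted_labels, k=2)
```

Classes surfaced this way could then be inspected by hand, for example to check whether their samples derive from different ‘jibo’ or come from different writers.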