Host: The Japanese Society for Artificial Intelligence
Name: 34th Annual Conference, 2020
Number: 34
Location: Online
Date: June 09, 2020 - June 12, 2020
The National Diet Library is researching layout analysis and character recognition for digitized materials, with the aim of producing high-quality text from materials that existing OCR software reads poorly, such as aged printed materials. The layout dataset constructed during our study is publicly available under a free license (https://github.com/ndl-lab/layout-dataset). In this paper, we introduce the published datasets and annotation tools, and quantitatively evaluate the machine learning method used to semi-automate dataset creation. Finally, we discuss potential topics for future study using this dataset.