Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
34th (2020)
Session ID : 3Rin4-72

Initiatives for the Development of Technology and Construction of Datasets to Enhance Searchability and Retrieval of Digitized Materials at the National Diet Library
*Toru AOIKE, Takahumi KINOSHITA, Wataru SATOMI, Takanori KAWASHIMA
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

The National Diet Library is conducting research on layout analysis and character recognition of digitized materials for the purpose of producing high-quality text from materials that are difficult to read with existing OCR software, such as printed materials that have aged. The layout dataset constructed during our study has been made available to the public under a free license (https://github.com/ndl-lab/layout-dataset). In this paper, we introduce the published datasets and annotation tools and quantitatively evaluate the machine learning method used to semi-automate the creation of datasets. Finally, we discuss potential topics for future study using this dataset.

© 2020 National Diet Library