JSAI Technical Report, Type 2 SIG
Online ISSN : 2436-5556
Automatic Role Labeling of OCR Processed Scholarly Papers
Kenichi IWATSUKI, Tsuneaki KATO, Kazunori YAMAGUCHI
RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

2016 Volume 2016 Issue AM-12 Pages 10-

Abstract

Components of scholarly papers play roles such as title, body, itemization title, or figure. Role labels enable advanced searches, such as finding papers in which a specified keyword is used in a specified role. In this paper, we propose a fully automatic role labeling method for OCR-processed scholarly papers. The method first identifies components in the OCR-processed images, reconstructing them from regions that the OCR software recognized incorrectly. It then assigns role labels to the components. In our experiment, classification accuracy reached 94% in the best case.
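
To make the role-constrained search mentioned in the abstract concrete, the following is a minimal sketch in Python. It is not the authors' method or data format: the `Component` record, the `search_by_role` function, the paper IDs, and the sample texts are all hypothetical, and the role labels are assumed to have been assigned already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One labeled component of a paper (hypothetical structure, not from the report)."""
    paper_id: str
    role: str   # e.g. "title", "body", "itemization title", "figure"
    text: str


def search_by_role(components, keyword, role):
    """Return sorted IDs of papers in which `keyword` occurs in a component labeled `role`."""
    keyword = keyword.lower()
    hits = {
        c.paper_id
        for c in components
        if c.role == role and keyword in c.text.lower()
    }
    return sorted(hits)


# Made-up toy data, for illustration only.
components = [
    Component("paper-A", "title", "Role Labeling of Scanned Documents"),
    Component("paper-A", "body", "We evaluate the method on OCR output."),
    Component("paper-B", "title", "A Survey of OCR Post-Processing"),
    Component("paper-B", "body", "Role labeling is discussed briefly."),
]

print(search_by_role(components, "ocr", "title"))  # ['paper-B']
print(search_by_role(components, "ocr", "body"))   # ['paper-A']
```

The same keyword returns different papers depending on the requested role, which is the kind of query that plain full-text search cannot express.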

© 2016 Authors