2016 Volume 2016 Issue AM-12 Pages 10-
Components of scholarly papers bear roles such as title, body, itemization title, or figure. Role labels enable advanced search, such as finding papers in which a specified keyword appears in a specified role. In this paper, we propose a fully automatic role labeling method for OCR-processed scholarly papers. The proposed method first identifies components in the OCR-processed images, reconstructing them from regions that the OCR software recognized incorrectly. It then assigns a role label to each component. In our experiment, the classification accuracy reached 94% in the best case.
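
The role-restricted search mentioned above can be illustrated with a minimal sketch. It assumes that role labels have already been assigned to components. The `Component` record, the `search_by_role` function, and the sample data are hypothetical names introduced only for illustration; they are not the data model or the labeling method of the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """A labeled component of a paper (hypothetical structure for illustration)."""
    paper_id: str
    role: str  # e.g. "title", "body", "itemization title", "figure"
    text: str


def search_by_role(components, keyword, role):
    """Return the sorted IDs of papers in which `keyword` occurs
    in a component labeled with `role` (case-insensitive match)."""
    keyword = keyword.lower()
    return sorted(
        {c.paper_id for c in components
         if c.role == role and keyword in c.text.lower()}
    )


# Hypothetical sample data, not taken from the paper's experiment.
components = [
    Component("paper-A", "title", "Role Labeling for Scanned Papers"),
    Component("paper-A", "body", "We evaluate OCR output on scanned pages."),
    Component("paper-B", "title", "A Survey of OCR Errors"),
    Component("paper-B", "body", "Role labeling is discussed briefly."),
]

# Find papers whose title (not body) mentions "role labeling".
print(search_by_role(components, "role labeling", "title"))  # ['paper-A']
```

In this sketch, a plain keyword search would return both papers, whereas restricting the match to the "title" role returns only the paper whose title contains the keyword.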