IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Regular Section
Localization of Pointed-At Word in Printed Documents via a Single Neural Network
Rubin ZHAOXiaolong ZHENGZhihua YINGLingyan FAN
Author information

2022 Volume E105.D Issue 5 Pages 1075-1084


Most existing object detection methods and text detection methods are mainly designed to detect either text or objects. In some scenarios where the task is to find the target word pointed-at by an object, results of existing methods are far from satisfying. However, such scenarios happen often in human-computer interaction, when the computer needs to figure out which word the user is pointing at. Comparing with object detection, pointed-at word localization (PAWL) requires higher accuracy, especially in dense text scenarios. Moreover, in printed document, characters are much smaller than those in scene text detection datasets such as ICDAR-2013, ICDAR-2015 and ICPR-2018 etc. To address these problems, the authors propose a novel target word localization network (TWLN) to detect the pointed-at word in printed documents. In this work, a single deep neural network is trained to extract the features of markers and text sequentially. For each image, the location of the marker is predicted firstly, according to the predicted location, a smaller image is cropped from the original image and put into the same network, then the location of pointed-at word is predicted. To train and test the networks, an efficient approach is proposed to generate the dataset from PDF format documents by inserting markers pointing at the words in the documents, which avoids laborious labeling work. Experiments on the proposed dataset demonstrate that TWLN outperforms the compared object detection method and optical character recognition method on every category of targets, especially when the target is a single character that only occupies several pixels in the image. TWLN is also tested with real photographs, and the accuracy shows no significant differences, which proves the validity of the generating method to construct the dataset.

Related papers from these authors
© 2022 The Institute of Electronics, Information and Communication Engineers
Previous article Next article