ITE Technical Report
Online ISSN : 2424-1970
Print ISSN : 1342-6893
ISSN-L : 1342-6893
34.10
Session ID : ME2010-64
Difference detection for similar documents based on image matching
Yumiko Susuki, Yutaka Nakano, Toshiyuki Yoshida
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

Documents that have a fixed format and are updated periodically are often modified only slightly, so the versions before and after a modification are very similar. This paper aims to automatically compare a pair of similar printed documents and detect such modifications. Although the simplest way to identify a modification is to apply an OCR system, the recognition rate of many current OCR systems is around 97%, which is too low to give sufficient precision in our comparison application. This paper therefore treats the pair of target documents as images and proposes an image-based comparison technique that uses image matching and detection of the longest common subsequences. The experimental results show that the proposed technique requires several tens of seconds to compare a pair of A4-size documents with 1500 Japanese characters, and achieves a precision of 94% with a recall of 100%.
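
The abstract does not give implementation details, so the following is only a minimal sketch of how image matching and a longest-common-sequence search could be combined, not the authors' method. It assumes each document has already been segmented into a sequence of equally sized binary character images; the names `glyphs_match` and `lcs_diff`, the pixel-agreement measure, and the 0.9 threshold are all hypothetical. The longest common sequence is interpreted here as the standard longest common subsequence (LCS), with exact character equality replaced by an image-similarity test.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Glyph = Tuple[Tuple[int, ...], ...]  # assumed: non-empty binary image, rows of 0/1


def glyphs_match(g1: Glyph, g2: Glyph, threshold: float = 0.9) -> bool:
    """Hypothetical image matcher: fraction of agreeing pixels >= threshold."""
    if len(g1) != len(g2) or len(g1[0]) != len(g2[0]):
        return False
    total = len(g1) * len(g1[0])
    agree = sum(p == q for r1, r2 in zip(g1, g2) for p, q in zip(r1, r2))
    return agree / total >= threshold


def lcs_diff(
    a: Sequence[T], b: Sequence[T], same: Callable[[T, T], bool]
) -> Tuple[List[int], List[int]]:
    """Return indices of items in a and in b that fall outside one LCS."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if same(a[i], b[j]):
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the start; unmatched items are the differences.
    only_a: List[int] = []
    only_b: List[int] = []
    i = j = 0
    while i < n and j < m:
        if same(a[i], b[j]):
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            only_a.append(i)
            i += 1
        else:
            only_b.append(j)
            j += 1
    only_a.extend(range(i, n))
    only_b.extend(range(j, m))
    return only_a, only_b


# Toy 3x3 "character images" standing in for segmented glyphs.
A: Glyph = ((1, 0, 1), (0, 1, 0), (1, 0, 1))
B: Glyph = ((1, 1, 1), (0, 0, 0), (1, 1, 1))
C: Glyph = ((0, 0, 0), (1, 1, 1), (0, 0, 0))
X: Glyph = ((1, 1, 0), (1, 1, 0), (0, 0, 0))

old_page = [A, B, C]
new_page = [A, X, C]
removed, added = lcs_diff(old_page, new_page, glyphs_match)
```

In this toy example the second glyph is the only one replaced, so `removed` and `added` each contain the single index 1. The table costs time and memory proportional to the product of the two sequence lengths, which is manageable for a page of about 1500 characters; a real system would also need a matcher that tolerates scanning noise and misalignment.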

© 2010 The Institute of Image Information and Television Engineers