IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Regular Section
Detecting Partial and Near Duplication in the Blogosphere
Yeo-Chan YOON, Myung-Gil JANG, Hyun-Ki KIM, So-Young PARK
JOURNAL FREE ACCESS

2012 Volume E95.D Issue 2 Pages 681-685

Abstract
In this paper, we propose a duplicate document detection model that recognizes both partial duplicates and near duplicates. The proposed model detects partial duplicates as well as exact duplicates by splitting a large document into many small sentence-level fingerprints. Furthermore, it detects near duplicates, which result from trivial revisions, by filtering out common words and reordering the word sequence before fingerprinting.
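
As an illustration of the idea described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that "reordering the word sequence" can be approximated by sorting the remaining words, that MD5 is an acceptable fingerprint hash, and that overlap is scored as the fraction of a candidate's sentence fingerprints found in a source document. The stopword list `COMMON_WORDS` and the function names `sentence_fingerprint`, `document_fingerprints`, and `containment` are hypothetical; the abstract specifies none of these details.

```python
import hashlib
import re

# Hypothetical stopword list: the abstract does not say which common words are filtered.
COMMON_WORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "is", "are", "was", "were"}


def sentence_fingerprint(sentence):
    """Hash one sentence after dropping common words and sorting the rest.

    Sorting is an assumed stand-in for the paper's word reordering step; it makes
    the fingerprint insensitive to word order and to inserted common words.
    """
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    kept = sorted(w for w in words if w not in COMMON_WORDS)
    if not kept:
        return None  # nothing informative left to fingerprint
    return hashlib.md5(" ".join(kept).encode("utf-8")).hexdigest()


def document_fingerprints(text):
    """Split a document into sentences and return the set of sentence fingerprints."""
    fingerprints = set()
    for sentence in re.split(r"[.!?]+", text):
        fp = sentence_fingerprint(sentence)
        if fp is not None:
            fingerprints.add(fp)
    return fingerprints


def containment(candidate, source):
    """Fraction of the candidate's sentence fingerprints that also occur in the source."""
    cand = document_fingerprints(candidate)
    if not cand:
        return 0.0
    return len(cand & document_fingerprints(source)) / len(cand)
```

Because each sentence is fingerprinted separately, a post that copies only some sentences from another document still shares those fingerprints with it (a partial duplicate), and a sentence whose words were merely shuffled or padded with common words maps to the same fingerprint (a near duplicate).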
© 2012 The Institute of Electronics, Information and Communication Engineers