Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Paper
On Document Similarity Measures
Masayuki Asahara, Sachi Kato

2016 Volume 23 Issue 5 Pages 463-499

Abstract

Document similarity measures are used to evaluate both content and writing style. In text summarization and machine translation, evaluation measures have been proposed that compare system-generated summaries or translations of a source text with human-generated ones. To evaluate differences in writing style or register, distance metrics are computed over morphemes or morpheme sequences. In this study, we discuss the relations among mathematical metrics, similarities, kernels, ordinal scales, and correlations in terms of their equivalence properties. In addition, we investigate the behavior of content and style similarity measures on several corpora with similar content. The results of the analysis using different document similarity measures indicate the instability of the evaluation system.
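The abstract names several families of measures without giving formulas. As a rough illustration only, and not the authors' implementation, the Python sketch below treats a document as a bag of morpheme n-grams, uses the linear kernel (inner product) and its length-normalised form, cosine similarity, derives a true metric from that similarity, and compares how two measures order the same candidates with Spearman's rank correlation, which looks only at the ordinal scale. It assumes morphological analysis has already produced token lists; every function name and the toy data are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(morphemes, n=1):
    """Bag of morpheme n-grams; n=1 gives unigram counts."""
    return Counter(tuple(morphemes[i:i + n]) for i in range(len(morphemes) - n + 1))


def linear_kernel(a, b):
    """Inner product of two sparse count vectors (a positive-definite kernel)."""
    return sum(count * b[key] for key, count in a.items())


def cosine_similarity(a, b):
    """Kernel normalised by vector lengths; lies in [0, 1] for count vectors."""
    denom = math.sqrt(linear_kernel(a, a) * linear_kernel(b, b))
    return linear_kernel(a, b) / denom if denom else 0.0


def cosine_distance(a, b):
    """Euclidean distance between length-normalised vectors, sqrt(2 - 2*cos).
    Unlike 1 - cos, it satisfies the triangle inequality (assumes non-empty documents)."""
    return math.sqrt(max(2.0 - 2.0 * cosine_similarity(a, b), 0.0))


def average_ranks(values):
    """1-based ranks; tied values share their mean rank (an ordinal-scale view)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    denom = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return cov / denom if denom else float("nan")


# Hypothetical toy data: whitespace-split tokens stand in for morphological-analyser output.
reference = "the cat sat on the mat".split()
candidates = [
    "the cat sat on a mat".split(),
    "a cat was sitting on the mat".split(),
    "the dog ran in the park".split(),
]
unigram_scores = [cosine_similarity(ngram_counts(reference, 1), ngram_counts(c, 1)) for c in candidates]
bigram_scores = [cosine_similarity(ngram_counts(reference, 2), ngram_counts(c, 2)) for c in candidates]
print(unigram_scores, bigram_scores, spearman_rho(unigram_scores, bigram_scores))
```

On this toy data the unigram and bigram measures give different scores but the same ordering of the three candidates, so their rank correlation is 1. Comparing orderings rather than raw values is one way to ask whether two measures agree, which is the kind of question the abstract raises about the stability of evaluation.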

© 2016 The Association for Natural Language Processing