Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
Paper
Study on Constants of Natural Language Texts
Daisuke Kimura, Kumiko Tanaka-Ishii
JOURNALS FREE ACCESS

2014 Volume 21 Issue 4 Pages 877-895

Abstract

This paper examines measures that may remain constant regardless of the length of a given natural language text. Such measures offer a potential means of studying the complexity of natural language, but they have previously been studied only on relatively small English texts. In this study, we examine them on texts in languages other than English and on large-scale texts. The candidate measures include Yule's K, Orlov's Z, and Golcher's VM, the convergence of each of which has previously been argued empirically. In addition, we introduce entropy H and a measure r related to the scale-free property of language. Our experiments show that K and VM converge for texts in various languages, whereas the other measures do not.
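
As an illustration of what a length-independent measure looks like in practice, the following minimal sketch computes Yule's K using its commonly cited definition, K = C (S2 - N) / N^2, where N is the number of tokens, S2 is the sum of squared word frequencies, and C = 10^4 is a conventional scaling factor. The function names `yules_k` and `k_by_prefix`, the whitespace tokenization, and the choice of prefix lengths are assumptions made for this example; they are not taken from the paper, and the paper's own preprocessing may differ.

```python
from collections import Counter


def yules_k(tokens, scale=1e4):
    """Yule's K for a token sequence: scale * (S2 - N) / N^2.

    N is the token count and S2 is the sum of squared word frequencies.
    The default scale of 10^4 is the conventional choice (an assumption here).
    """
    freqs = Counter(tokens)
    n = sum(freqs.values())
    if n == 0:
        raise ValueError("cannot compute Yule's K for an empty text")
    s2 = sum(f * f for f in freqs.values())
    return scale * (s2 - n) / (n * n)


def k_by_prefix(tokens, lengths):
    """Yule's K on growing prefixes, to inspect whether it levels off.

    `lengths` is a hypothetical list of prefix sizes chosen by the caller.
    """
    return [(n, yules_k(tokens[:n])) for n in lengths if 0 < n <= len(tokens)]


if __name__ == "__main__":
    # Toy input with naive whitespace tokenization (illustrative only).
    sample = "a b a c".split()
    print(yules_k(sample))  # 1250.0
    print(k_by_prefix(sample, [2, 4]))
```

On a real corpus, calling `k_by_prefix` with increasing prefix lengths gives a rough way to check whether the value stabilizes as the text grows, which is the kind of behavior the abstract describes for K.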

© 2014 The Association for Natural Language Processing