Information and Media Technologies
Online ISSN : 1881-0896
ISSN-L : 1881-0896
Media (processing) and Interaction
Study on Constants of Natural Language Texts
Daisuke Kimura, Kumiko Tanaka-Ishii
JOURNAL FREE ACCESS

2014 Volume 9 Issue 4 Pages 771-789

Abstract

This paper considers measures whose values might remain constant regardless of the length of a given natural language text. Such measures offer a potential means of studying the complexity of natural language, but they have previously been studied only on relatively small English texts. In this study, we examine them on large-scale texts and on texts in languages other than English. Among the candidate measures, we consider Yule's K, Orlov's Z, and Golcher's VM, the convergence of each of which has previously been argued empirically. Furthermore, we introduce entropy H and a measure, r, related to the scale-free property of language. Our experiments show that both K and VM are convergent for texts in various languages, whereas the other measures are not.
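
As an illustration of what a text constant looks like in practice, the following is a minimal sketch of Yule's K in its commonly cited form, K = C (S2 - N) / N^2, where N is the number of tokens, S2 is the sum of squared word-type frequencies, and C = 10^4 is the conventional scaling factor. The abstract does not give the paper's exact definition, tokenization, or corpora, so the function names `yules_k` and `k_by_prefix` and the whitespace tokenization in the usage note are assumptions made for this example only.

```python
from collections import Counter


def yules_k(tokens, scale=1e4):
    """Yule's K in its commonly cited form: scale * (S2 - N) / N^2.

    N is the total token count and S2 is the sum of squared frequencies
    over word types. The scale of 1e4 is the conventional choice, assumed
    here rather than taken from the paper.
    """
    freqs = Counter(tokens)
    n = sum(freqs.values())
    if n == 0:
        raise ValueError("tokens must be non-empty")
    s2 = sum(f * f for f in freqs.values())
    return scale * (s2 - n) / (n * n)


def k_by_prefix(tokens, lengths):
    """Hypothetical helper: evaluate K on growing prefixes of one text,
    so that its behaviour as the text length increases can be inspected."""
    return [(n, yules_k(tokens[:n])) for n in lengths]
```

A possible usage, assuming simple whitespace tokenization: `k_by_prefix(text.split(), [1000, 10000, 100000])` returns one (length, K) pair per prefix, which can then be plotted against length to check whether the values level off.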

© 2014 The Association for Natural Language Processing