Mathematical Linguistics
Online ISSN : 2433-0302
Print ISSN : 0453-4611
Report
Characteristic Words in Japanese Classical Works
Tatuo Miyazima, Asuko Kondo
JOURNAL OPEN ACCESS

2011 Volume 28 Issue 3 Pages 94-105

Abstract

The Frequency Dictionary of Old Japanese records the frequency of every word in 14 Japanese classical texts. Using the data in this dictionary, we try to find, for each text, the words that are characteristic of that text as compared with the other 13 texts; that is, we identify words that are used frequently in the given text and infrequently in all the others. To find characteristic words statistically, we used the log-likelihood ratio to calculate the degree of characteristicness of each word. For each text, this report presents the set of 20 words with the highest degree (the characteristic set) and the set of 20 words with the lowest degree (which we provisionally name "anti-characteristic"). The 14 texts are characterized on the basis of their respective characteristic and anti-characteristic word sets. We further confirm that text length influences the calculated degrees of characteristicness, and carry out an experiment to decrease this influence.
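
As an illustration of the kind of calculation the abstract describes, the following is a minimal sketch and not the authors' implementation. It assumes the two-term log-likelihood (G²) statistic commonly used for keyword comparison in corpus linguistics, and it assumes that a sign is attached so that over-used words score high and under-used words score low; the abstract does not state the exact formula. The function names, the data layout, and the toy counts are all hypothetical.

```python
import math


def signed_log_likelihood(freq_in_text, text_size, freq_in_others, others_size):
    """Signed G2 score of one word: target text vs. all other texts pooled.

    Assumption: two-term log-likelihood with expected counts proportional
    to corpus size. Positive means over-used in the target text, negative
    means under-used.
    """
    total_freq = freq_in_text + freq_in_others
    total_size = text_size + others_size
    expected_text = text_size * total_freq / total_size
    expected_others = others_size * total_freq / total_size

    def term(observed, expected):
        # A zero observed count contributes nothing (limit of x * log x).
        return observed * math.log(observed / expected) if observed > 0 else 0.0

    g2 = 2.0 * (term(freq_in_text, expected_text)
                + term(freq_in_others, expected_others))
    sign = 1.0 if freq_in_text >= expected_text else -1.0
    return sign * g2


def rank_words(counts, target):
    """Return all words sorted from most to least characteristic of `target`.

    `counts` maps a text name to a {word: frequency} dict (hypothetical layout).
    """
    target_size = sum(counts[target].values())
    others = [c for name, c in counts.items() if name != target]
    others_size = sum(sum(c.values()) for c in others)
    vocabulary = set().union(*(c.keys() for c in counts.values()))
    scores = {}
    for word in vocabulary:
        in_text = counts[target].get(word, 0)
        in_others = sum(c.get(word, 0) for c in others)
        scores[word] = signed_log_likelihood(
            in_text, target_size, in_others, others_size)
    return sorted(scores, key=scores.get, reverse=True)


# Toy counts, invented for illustration only.
toy_counts = {
    "A": {"x": 50, "y": 50},
    "B": {"x": 5, "y": 95},
    "C": {"x": 5, "y": 95},
}
ranking = rank_words(toy_counts, "A")
characteristic = ranking[:1]        # the paper takes the top 20
anti_characteristic = ranking[-1:]  # and the bottom 20
```

Under this formulation, multiplying every count by the same factor multiplies the score by that factor, which is one way to see why text length can affect the degree of characteristicness.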

© The Mathematical Linguistic Society of Japan

This article is provided under a Creative Commons [Attribution - NonCommercial - NoDerivatives 4.0 International] license.
https://creativecommons.org/licenses/by-nc-nd/4.0/deed.ja