Mathematical Linguistics


The Frequency Dictionary of Old Japanese records the frequency of every word in 14 classical Japanese texts. In this paper, using the data in this dictionary, we try to find, for each text, the words that are characteristic of that text as compared to the other 13 texts; that is, we identify words that are used frequently in a given text and infrequently in all the others. To find characteristic words statistically, we used the log-likelihood ratio to calculate the exact degree of characteristicness of each word. For each text, this report presents the set of 20 words with the highest degree (the characteristic set) and the set of 20 words with the lowest degree (which we provisionally name "anti-characteristic"). The 14 texts are characterized on the basis of their respective characteristic and anti-characteristic word sets. We further confirm that text length influences the calculated degrees of characteristicness, and carry out an experiment to reduce that influence.
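As an illustration of the kind of score the abstract describes, the sketch below computes a signed log-likelihood ratio for one word in one text against all the other texts pooled together. It is only a sketch under stated assumptions: the abstract does not give the exact formula, so this uses a common two-term form of the statistic, and the sign convention (positive when the word is over-represented in the target text, negative when under-represented) is an assumption made here so that "highest" and "lowest" degrees can both be ranked. The function name, parameter names, and counts are hypothetical.

```python
import math


def signed_log_likelihood(word_in_text, word_in_others, text_size, others_size):
    """Signed log-likelihood ratio for one word (illustrative sketch only).

    word_in_text   -- occurrences of the word in the target text
    word_in_others -- occurrences of the word in the other texts combined
    text_size      -- total word tokens in the target text
    others_size    -- total word tokens in the other texts combined
    """
    total_word = word_in_text + word_in_others
    total_size = text_size + others_size

    # Expected counts if the word were spread evenly across both parts.
    expected_text = text_size * total_word / total_size
    expected_others = others_size * total_word / total_size

    # A zero observed count contributes nothing (0 * log 0 is taken as 0).
    score = 0.0
    if word_in_text > 0:
        score += word_in_text * math.log(word_in_text / expected_text)
    if word_in_others > 0:
        score += word_in_others * math.log(word_in_others / expected_others)
    score *= 2

    # Assumed sign convention: negative when the word is under-represented.
    return score if word_in_text >= expected_text else -score


# Hypothetical counts, not taken from the dictionary.
over = signed_log_likelihood(20, 5, 100, 100)
under = signed_log_likelihood(5, 20, 100, 100)
print(over > 0, under < 0)
```

Under this convention, sorting a text's words by score in descending order and taking the first 20 would give a characteristic set, and the last 20 an anti-characteristic set. Because the expected counts depend on `text_size`, the score is sensitive to text length, which is the effect the abstract says the authors examine.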


