Expressions of the forms “prefix O + main verb + auxiliary verb” and “prefix GO + main verb + auxiliary verb” are important verbal honorific expressions in Japanese. Previous linguistic research has pointed out that the two types differ in that the main verb after “O” is a native Japanese word (wago), whereas the main verb after “GO” is a word of Chinese origin (kango). However, hardly any quantitative research on the differences between the two expressions has been carried out so far. In this study, quantitative analyses using Scheffé's paired comparison method and statistical tests were performed to reveal differences in the impressions of politeness between these two types of expressions. The results suggest that the difference in politeness from the plain form is smaller for “prefix GO + Chinese-origin verb + auxiliary verb” than for “prefix O + native Japanese verb + auxiliary verb.” We suggest that these results reflect a difference in how the two types of expressions are recognized as honorific expressions.
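Scheffé's paired comparison method places each stimulus on a common scale by fitting an additive model to pairwise judgments. The main effect of each expression estimates its position on the politeness scale, so "difference in politeness from the plain form" corresponds to a difference of main effects. The sketch below assumes Scheffé's original design, in which every ordered pair (i, j) is judged by the same number n of subjects on a bipolar scale. The abstract does not say which variant of the method was used, and the stimulus labels and scores in the usage lines are hypothetical. Under the sum-to-zero constraint, the least-squares main effect is α̂_i = Σ_{j≠i} (X_ij − X_ji) / (2tn), where X_ij is the total score for the ordered pair (i, j).

```python
def scheffe_main_effects(judgments, t):
    """Estimate main effects (average preference) under Scheffe's original
    paired comparison model.

    judgments maps each ordered pair (i, j), with stimulus i presented first,
    to the list of scores given by the n subjects who judged that order.
    A positive score means i was rated more polite than j.
    All t * (t - 1) ordered pairs must be present with the same n.
    """
    sizes = {len(scores) for scores in judgments.values()}
    if len(sizes) != 1 or len(judgments) != t * (t - 1):
        raise ValueError("need every ordered pair judged by the same number of subjects")
    n = sizes.pop()

    effects = []
    for i in range(t):
        # Net preference for i: totals where i came first minus totals where i came second.
        net = sum(sum(judgments[(i, j)]) - sum(judgments[(j, i)])
                  for j in range(t) if j != i)
        # Least-squares estimate under the constraint sum(effects) == 0.
        effects.append(net / (2 * t * n))
    return effects


# Hypothetical stimuli: 0 = plain form, 1 = O + verb + aux, 2 = GO + verb + aux.
judgments = {
    (0, 1): [-2, -3], (1, 0): [3, 2],
    (0, 2): [-1, -2], (2, 0): [2, 1],
    (1, 2): [1, 0],   (2, 1): [-1, 0],
}
alpha = scheffe_main_effects(judgments, t=3)
# Distance of each honorific form from the plain form on the fitted scale.
print(alpha[1] - alpha[0], alpha[2] - alpha[0])
```

The estimates only place the stimuli on a scale. Deciding whether two main effects differ significantly requires a separate test, such as a yardstick built from the residual variance of the fitted model.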
A simpler distribution that fits empirical word distributions about as well as the negative binomial is the Katz K mixture. The basic assumption of the K mixture model is that the conditional probability of a repeat occurrence of a given word is determined by a constant decay factor, independent of the number of occurrences that have already taken place. However, for content-bearing words with few prior occurrences, the probabilities of repeat occurrences are generally lower than this constant decay factor. To address this deficiency of the K mixture model, we conducted an in-depth exploration of the conditional probabilities of repetition, the decay factors, and their influence on modeling term distributions. The results suggest that both ends of the distribution can be used to fit models: document frequencies can be used when a word has few instances, and tail probabilities (cumulative document frequencies) can be used when it has many. Both quantities are often relatively easy to estimate empirically. We therefore propose an effective approach for improving the K mixture model, in which the decay factor is a combination of two possible decay factors interpolated by a function of the number of instances of a word in a document. Results show that the proposed model produces statistically significantly better frequency estimates, especially for a word with two instances in a document. In addition, the advantages of this approach become more evident in two cases: modeling the term distribution of frequently used content-bearing words, and modeling the term distribution of a corpus with a wide range of document lengths.
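To make the constant-decay assumption concrete, the sketch below fits the standard Katz K mixture using the usual moment estimates. In these formulas, λ = cf/N, β = (cf − df)/df and α = λ/β, where cf is collection frequency, df is document frequency and N is the number of documents. The model is P(k) = (1 − α)·[k = 0] + α/(β + 1)·(β/(β + 1))^k. For k ≥ 1 its conditional repeat probability P(X ≥ k+1 | X ≥ k) always equals the decay factor β/(β + 1). The sketch compares that constant with the empirical ratio of tail counts. It covers only the baseline model. The interpolation function of the proposed method is not specified in the abstract, so it is not sketched here. The function names and the word counts in the usage lines are hypothetical.

```python
def fit_k_mixture(doc_counts):
    """Fit Katz's K mixture to one word's per-document counts (zeros included)."""
    n_docs = len(doc_counts)
    cf = sum(doc_counts)                       # collection frequency
    df = sum(1 for c in doc_counts if c > 0)   # document frequency
    if df == 0 or cf == df:
        raise ValueError("need at least one occurrence and at least one repeat")
    lam = cf / n_docs                          # mean occurrences per document
    beta = (cf - df) / df                      # extra occurrences per document containing the word
    alpha = lam / beta
    return alpha, beta


def k_mixture_pmf(k, alpha, beta):
    """P(X = k) under the K mixture; beta / (beta + 1) is the constant decay factor."""
    decay = beta / (beta + 1)
    p = alpha / (beta + 1) * decay ** k
    if k == 0:
        p += 1 - alpha                         # extra mass for documents without the word
    return p


def conditional_repeat_probs(doc_counts, max_k):
    """Empirical P(X >= k+1 | X >= k) for k = 1..max_k, from tail counts."""
    def tail(k):
        return sum(1 for c in doc_counts if c >= k)
    return {k: tail(k + 1) / tail(k) for k in range(1, max_k + 1) if tail(k) > 0}


# Hypothetical counts of one content word in 10 documents.
counts = [0] * 5 + [1, 1, 1, 4, 5]
a_hat, b_hat = fit_k_mixture(counts)
print("constant decay factor:", b_hat / (b_hat + 1))
print("empirical P(X>=k+1 | X>=k):", conditional_repeat_probs(counts, 3))
```

In this toy example, the empirical P(X ≥ 2 | X ≥ 1) is 0.4, which is below the constant decay factor of about 0.58. This is the kind of gap at small k that the abstract describes for content-bearing words.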