Several studies have classified Japanese dialects into categories based on attributes such as vocabulary, grammar, and phonology. Although analyses based on the systematic criteria of structural linguistics have been conducted, it is also worthwhile to perform a classification that incorporates how frequently forms are actually used in everyday language. In this study, therefore, the frequencies of mora bigrams occurring in natural discourse were used to construct a phylogenetic tree with the neighbor-joining method, and each dialect was classified accordingly. The dialects were classified as eastern or western, with the Northern Alps forming the border between the eastern and western regions. Both the Gifu and Aichi dialects were classified as western. Furthermore, to examine the characteristics of the mora bigrams occurring in eastern and western dialects, comparison tests were conducted. The results show that "Nda, NeR, Daka, Qte, DaQ, Dayo, Gara" in eastern dialects and "NaR, MoR, Rte, Hoi, N-ya, N-ja, Soya" in western dialects can be extracted as characteristic mora bigrams.
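As an illustration of the kind of input a neighbor-joining tree requires, the following minimal sketch computes relative mora bigram frequencies and a pairwise distance matrix. The abstract does not state which distance measure was used, so the Euclidean distance here is an assumption, and the dialect names and mora sequences are invented toy data, not material from the study.

```python
from collections import Counter
from itertools import combinations
import math


def bigram_freqs(moras):
    """Relative frequency of each adjacent mora pair in one sequence."""
    counts = Counter(zip(moras, moras[1:]))
    total = sum(counts.values())
    return {bg: n / total for bg, n in counts.items()} if total else {}


def distance(p, q):
    """Euclidean distance between two frequency profiles (an assumed measure)."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))


def distance_matrix(samples):
    """Pairwise distances keyed by alphabetically ordered name pairs."""
    profiles = {name: bigram_freqs(m) for name, m in samples.items()}
    return {
        (a, b): distance(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


# Hypothetical toy data: each dialect is a list of already segmented morae.
samples = {
    "dialect_A": ["so", "N", "da", "yo"],
    "dialect_B": ["so", "N", "da", "ne"],
    "dialect_C": ["so", "ya", "de"],
}
dm = distance_matrix(samples)
```

A matrix of this form could then be passed to any neighbor-joining implementation; real discourse data would first have to be segmented into morae, a step this sketch takes as given.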
This paper examines the stylometric similarities and differences between Natsume Soseki's unfinished last work (Meian) and its sequel (Zoku Meian), which was written by Mizumura Minae. To investigate the degree of similarity between the writing styles of the two novels, and to determine what Mizumura Minae focused on in particular to achieve a successful stylistic imitation, we conducted a quantitative analysis using hierarchical cluster analysis and chi-squared statistics. The results indicate that, compared with the other texts in the corpus, Zoku Meian and Meian tend to be similar in sentence length, vocabulary, parts of speech, sentence structure, and related features. On the other hand, traces of Mizumura's own stylistic features remain in Zoku Meian: although it is reminiscent of Soseki's novel, the quantitative analysis revealed differences between the two novels.
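One common way to apply a chi-squared statistic in stylometry is to compare how often a single feature (a word or a part-of-speech tag, for example) occurs in two texts. The sketch below shows that calculation for a 2x2 contingency table. The abstract does not describe how the statistic was actually applied, so this setup is an assumption, and the function names and counts are hypothetical.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column is empty, so the statistic is undefined; report 0.0.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def feature_chi2(count_x, total_x, count_y, total_y):
    """Chi-squared for one feature: occurrences vs. all other tokens in texts X and Y."""
    return chi_squared_2x2(count_x, total_x - count_x, count_y, total_y - count_y)


# Hypothetical counts: a feature occurs 30 times in 100 tokens of text X
# and 10 times in 100 tokens of text Y.
score = feature_chi2(30, 100, 10, 100)
```

Features with larger scores differ more strongly between the two texts, so ranking features by this value is one way to see where an imitation departs from its model.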
This paper reports statistical information on the corpora developed by the Center for Corpus Development and made available in Chunagon. The corpora introduced in this article are the Balanced Corpus of Contemporary Written Japanese (BCCWJ), the Corpus of Spontaneous Japanese (CSJ), and the Corpus of Historical Japanese (CHJ).