Variation Theory, a subfield of sociolinguistics, underwent a scientific revolution in 2009, when logistic regression analysis was replaced as its analytical tool by the generalized linear mixed model (GLMM). We first give a detailed theoretical and historical background of this revolution, then demonstrate GLMM analysis by reanalyzing the variation and change of the innovative potential suffix in Tokyo Japanese originally analyzed by Matsuda (1993). Our GLMM reanalysis, with two random intercepts (speaker and verbal stem), showed that while the language-internal (grammatical or phonological) factors mostly remained statistically significant, only a single explanatory variable, the (normalized) age of the speaker, emerged as a significant external (social) factor. It also revealed that two verbal stems are exceptions, in the sense that they behave quite differently from what the combination of relevant factors predicts. We conclude that the GLMM is better suited than fixed-effects-only logistic regression to studies of language variation and change, where overdispersion and large individual differences are more prevalent than generally assumed.
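For concreteness, a GLMM of the kind described above, with crossed random intercepts for speaker and verbal stem, could be written as follows. The notation is an assumption introduced here for illustration and is not taken from the paper: y_{ij} is a binary indicator that speaker i uses the innovative form with stem j, x_{ij} collects the fixed-effect predictors (for example, normalized age), and u_i and v_j are the two random intercepts.

```latex
\begin{aligned}
y_{ij} \mid u_i, v_j &\sim \mathrm{Bernoulli}(p_{ij}),\\
\operatorname{logit}(p_{ij}) &= \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_i + v_j,\\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad v_j \sim \mathcal{N}(0, \sigma_v^2).
\end{aligned}
```

Setting \sigma_u^2 = \sigma_v^2 = 0 recovers the fixed-effects-only logistic regression. Under this formulation, stems whose estimated v_j lies far from zero would be natural candidates for the kind of exceptional stems the reanalysis reports.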
This study examined the effectiveness of author identification based on the rate of usage of words before periods, using texts by 100 bloggers. We analyzed one suspected text, one control text, and irrelevant texts by four bloggers using four multivariate analyses: (1) principal component analysis, (2) correspondence analysis, (3) multidimensional scaling, and (4) hierarchical cluster analysis, and assigned scores based on their results. Two conditions were set: “same author,” in which the suspected and control texts were written by the same author, and “different author,” in which they were written by different authors. A comparison of the score distributions between the two conditions indicated that the rate of usage of words before periods was effective for author identification, ranking after the rate of usage of non-independent words and part-of-speech bigrams.
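As a rough illustration of the feature involved, the sketch below computes the relative frequency of each token that immediately precedes a period in a pre-tokenized text. The function name `pre_period_rates`, the normalization (dividing by the number of periods with a preceding word), and the sample tokens are all assumptions made for illustration; the abstract does not specify how the rates were computed, and tokenization is outside the scope of the sketch.

```python
from collections import Counter


def pre_period_rates(tokens, period="。"):
    """Relative frequency of each token that immediately precedes a period.

    Assumption: rates are normalized by the number of sentence-final
    positions found; the study's own normalization may differ.
    """
    counts = Counter(
        prev
        for prev, cur in zip(tokens, tokens[1:])
        if cur == period and prev != period  # skip runs of consecutive periods
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


# Hypothetical pre-tokenized text with three sentences.
tokens = [
    "今日", "は", "晴れ", "です", "。",
    "散歩", "し", "ます", "。",
    "楽しい", "です", "。",
]
rates = pre_period_rates(tokens)
for word, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(word, round(rate, 3))
```

Vectors of such rates, one per text, are the kind of input that the four multivariate analyses could then compare.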
Commas in Japanese sentences are used arbitrarily because orthographic rules have not taken hold in Japanese society. In this paper, I attempt to identify the factors that determine why and where commas are used in Japanese sentences. The prediction model is built from the core data of the Balanced Corpus of Contemporary Written Japanese (BCCWJ) using generalized linear models with the Elastic Net, which blends lasso and ridge regression. I ran 10-fold cross-validation to assess model quality and to guard against overfitting. The model can also handle variables that carry a large amount of information. Reclassifying the original data with the constructed model yielded a recall of 78.99%. In conclusion, I claim that, according to the coefficient plot, the strongest indicators of comma placement after a conjunction are the following: 1. After the lexeme de. 2. When a conjunction is at the beginning of the sentence. 3. After the lexeme ga. 4. After the lexeme shikashinagara. 5. Also, commas are used more frequently in the register "white papers".
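The three ingredients named above (the Elastic Net penalty, 10-fold cross-validation, and recall) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the penalty uses one common parameterization, in which a mixing weight `alpha` interpolates between ridge (`alpha = 0`) and lasso (`alpha = 1`), and the function names and the contiguous fold layout are assumptions introduced here.

```python
def elastic_net_penalty(coefs, lam, alpha):
    """lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2).

    alpha = 1 gives the lasso penalty, alpha = 0 the ridge penalty.
    """
    l1 = sum(abs(b) for b in coefs)
    l2_squared = sum(b * b for b in coefs)
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_squared)


def k_fold_indices(n, k=10):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def recall(y_true, y_pred):
    """True positives divided by all actual positives (1 = comma present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical labels: 1 means a comma follows the conjunction.
y_true = [1, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(recall(y_true, y_pred))
print([len(fold) for fold in k_fold_indices(25, k=10)])
```

In each cross-validation round, one fold is held out for evaluation while the model is fitted on the remaining nine, so the penalty strength is chosen on data the model has not seen.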
Japanese has been described as a naru (‘become’) language (e.g., Ikegami 1981, 2006), a characterization that predicts that intransitive verbs are used more frequently than transitive verbs. To test this prediction, the present study selected 36 intransitive-transitive verb pairs and compared their frequencies in 18 years (1998-2015) of news articles from the Mainichi Newspaper. A t-test analysis revealed no overall difference in frequency between intransitive and transitive verbs, and no differences within any of the five forms of these verbs (i.e., infinitive, adverbial, conditional, imperative and predicative). The same t-tests conducted on frequencies transformed by log_e(x + 0.5) also indicated no differences except in the imperative form, where transitive verbs were used more frequently than intransitive verbs, the reverse of what the naru-language characterization predicts. The correlation between the frequencies of intransitive and transitive verbs was also high (r = 0.70, p < .001), showing great similarity between the two types of verbs. The present study thus demonstrated that intransitive and transitive verbs are similar in overall frequency, possibly because these paired verbs have a great variety of usages.
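The analysis steps above can be illustrated with a short sketch: a log_e(x + 0.5) transformation, a t statistic, and a Pearson correlation. Two points are assumptions made here rather than facts from the abstract: the t-test is treated as a paired test over verb pairs (the abstract does not say which variant was used), and the frequencies are invented for illustration. The 0.5 offset keeps the logarithm defined when a form has zero occurrences.

```python
import math
from statistics import mean, stdev


def log_transform(freqs):
    """Apply log_e(x + 0.5) so that zero counts remain defined."""
    return [math.log(x + 0.5) for x in freqs]


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def pearson_r(a, b):
    """Pearson product-moment correlation between a and b."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical frequencies for five intransitive-transitive verb pairs.
intransitive = [120, 45, 300, 8, 0]
transitive = [100, 60, 280, 15, 2]

t_stat, df = paired_t(log_transform(intransitive), log_transform(transitive))
r = pearson_r(log_transform(intransitive), log_transform(transitive))
print(f"t({df}) = {t_stat:.3f}, r = {r:.3f}")
```

Converting the t statistic into a p-value requires the t distribution's cumulative distribution function, which is left to a statistics package.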
Sample selection, that is, the choice of speakers or respondents for linguistic surveys, is a central issue in sociolinguistic research. This paper discusses the problems and the significance of purposive (non-probability) sampling, the method most often used in Japanese language surveys, mainly in comparison with random sampling. Data collected through purposive sampling do not statistically represent the whole target population, but they are meaningful and valuable as case studies, especially in dialectology, where random sampling is not applicable. In addition to sampling procedures, the methodology of linguistic surveys also needs to take into account personal information protection policy, the rapid increase in foreign residents, and the ways ethnicity and gender are reflected in language use.