Mathematical Linguistics

Special Issue 2018 on the "Linguistic Studies Using Multivariate Analysis"

Foreword

[title in Japanese]

Yukari Tanaka

2018 Volume 31 Issue 6 Pages 401
Published: September 20, 2018
Released on J-STAGE: September 20, 2019

DOI: https://doi.org/10.24701/mathling.31.6_401

JOURNAL OPEN ACCESS

Invited Paper (A) to the Special Issue

The Generalized Linear Mixed Model (GLMM) Meets Variation Theory

Kenjiro Matsuda

2018 Volume 31 Issue 6 Pages 402-416
Published: September 20, 2018
Released on J-STAGE: September 20, 2019

DOI: https://doi.org/10.24701/mathling.31.6_402

JOURNAL OPEN ACCESS

Variation Theory, a subfield of sociolinguistics, underwent a scientific revolution in 2009, when logistic regression analysis was replaced as its analytical tool by the generalized linear mixed model (GLMM). We first give a detailed theoretical and historical background of the revolution, followed by a demonstration of GLMM analysis: a reanalysis of the variation and change of the innovative potential suffix in Tokyo Japanese proposed by Matsuda (1993). Our GLMM reanalysis with two random intercepts (the speaker and the verbal stem) showed that, while the language-internal (grammatical or phonological) factors mostly remained statistically significant, only a single explanatory variable, the (normalized) age of the speaker, emerged as a significant external (social) factor. It also revealed that two verbal stems are exceptions, in that they behave quite differently from what the combinations of relevant factors predict. We conclude that the GLMM is better suited than fixed-effects-only logistic regression analysis to studies of language variation and change, where overdispersion and large individual differences are more prevalent than generally assumed.
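
As a rough illustration of the kind of model the abstract describes, a logistic GLMM with two crossed random intercepts can be written as follows. This is a sketch under assumptions, not the paper's own specification: the binary response y (innovative versus conventional potential form), the indices i (token), j (speaker), and k (verbal stem), and the logit link are all notation introduced here.

```latex
\[
\operatorname{logit}\,\Pr(y_{ijk} = 1)
  = \mathbf{x}_{ijk}^{\top}\boldsymbol{\beta} + u_j + v_k,
\qquad
u_j \sim \mathcal{N}(0, \sigma_u^2),
\quad
v_k \sim \mathcal{N}(0, \sigma_v^2)
\]
```

Here the fixed effects (the internal and external factors) enter through the coefficient vector, while u_j and v_k absorb speaker-level and stem-level deviations. A fixed-effects-only logistic regression corresponds to setting both variance terms to zero.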

Invited Paper (B) to the Special Issue

Author Identification Based on Rate of Usage of Words before Period: Multivariate Analyses and Scoring

Wataru Zaitsu, Mingzhe Jin

2018 Volume 31 Issue 6 Pages 417-425
Published: September 20, 2018
Released on J-STAGE: September 20, 2019

DOI: https://doi.org/10.24701/mathling.31.6_417

JOURNAL OPEN ACCESS

This study examined the effectiveness of author identification based on the rate of usage of words before periods, using texts by 100 bloggers. We analyzed one suspected text, one control text, and irrelevant texts by four bloggers using four multivariate analyses: (1) principal components analysis, (2) correspondence analysis, (3) multi-dimensional scaling, and (4) hierarchical cluster analysis, and we assigned scores based on the results of these analyses. The study set two conditions: “same author”, in which the suspected and control texts were written by the same author, and “different author”, in which they were written by different authors. A comparison of the score distributions between the two conditions indicated that the rate of usage of words before periods was effective for author identification, next to the rate of usage of non-independent words and part-of-speech bigrams.
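
The feature named in the abstract, the rate of usage of words before periods, might be computed along the following lines. This is a minimal sketch only: it assumes sentences have already been tokenized into lists ending in the Japanese period "。", and the function names `final_word_rates` and `euclidean` are hypothetical. It covers feature extraction and a simple distance between two texts, not the four multivariate analyses or the scoring scheme used in the study.

```python
from collections import Counter


def final_word_rates(sentences):
    """Relative frequency of the token immediately before a sentence-final period.

    `sentences` is a list of token lists; pre-tokenized input is an assumption.
    """
    finals = [s[-2] for s in sentences if len(s) >= 2 and s[-1] == "。"]
    counts = Counter(finals)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def euclidean(p, q):
    """Euclidean distance between two rate profiles (missing words count as 0)."""
    keys = set(p) | set(q)
    return sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys) ** 0.5


# Toy, made-up sentences used purely for illustration.
suspected = [["行き", "ます", "。"], ["食べ", "た", "。"], ["見", "ます", "。"]]
control = [["読み", "ます", "。"], ["書い", "た", "。"]]

distance = euclidean(final_word_rates(suspected), final_word_rates(control))
```

A smaller distance between the suspected and control profiles would be weak evidence for the "same author" condition; how the study turned its multivariate results into scores is not described in the abstract.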

Paper (B) to the Special Issue

Factors Involved in Placing Comma Immediately Following a Conjunction: Modeling and Evaluation by Using Elastic Net

Takuya Iwasaki

2018 Volume 31 Issue 6 Pages 426-442
Published: September 20, 2018
Released on J-STAGE: September 20, 2019

DOI: https://doi.org/10.24701/mathling.31.6_426

JOURNAL OPEN ACCESS

Commas in Japanese sentences are used arbitrarily because orthographic rules have not permeated Japanese society. In this paper, I attempt to identify the factors that determine why and where commas are used in Japanese sentences. The prediction model is constructed from the core data of "The Balanced Corpus of Contemporary Written Japanese (BCCWJ)" using generalized linear models with the Elastic Net, which is a dynamic blending of lasso and ridge regression. I ran 10-fold cross-validation to assess model quality and to guard against overfitting. The model can also deal with variables that carry a large amount of information. Reclassifying the original data with the constructed model yielded a recall of 78.99%. In conclusion, I claim that, according to the coefficient plot, the strongest indicators of comma placement after a conjunction are the following: 1. After the lexeme de. 2. When a conjunction is at the beginning of the sentence. 3. After the lexeme ga. 4. After the lexeme shikashinagara. 5. Also, commas are used more frequently in the register "white papers".
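
The three ingredients the abstract mentions (a blended lasso/ridge penalty, 10-fold cross-validation, and recall) can be sketched in plain Python as below. The penalty uses the common glmnet-style parametrization with a mixing weight `alpha`, which is an assumption; the abstract does not state which parametrization or software was used, and all function names here are hypothetical.

```python
def elastic_net_penalty(coefs, lam, alpha):
    """Penalty lam * (alpha * L1 + (1 - alpha) / 2 * L2^2).

    alpha = 1 gives the lasso penalty, alpha = 0 the ridge penalty
    (glmnet-style parametrization, assumed here).
    """
    l1 = sum(abs(b) for b in coefs)
    l2_squared = sum(b * b for b in coefs)
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_squared)


def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous, near-equal folds (no shuffling)."""
    folds = []
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def recall(y_true, y_pred):
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0
```

In a full pipeline each fold would be held out in turn while the penalized model is fit on the remaining nine, with `lam` chosen to minimize the held-out error. The reported figure of 78.99% is the share of actual comma placements that the model recovered when reclassifying the original data.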

Special Issue 2017 on the "Grammar and Quantitative Study"

Comparing Frequencies of Japanese Intransitive/Transitive 36 Paired Verbs

Katsuo Tamaoka, Jingyi Zhang, Shogo Makioka

2018 Volume 31 Issue 6 Pages 443-460
Published: September 20, 2018
Released on J-STAGE: September 20, 2019

DOI: https://doi.org/10.24701/mathling.31.6_443

JOURNAL OPEN ACCESS

Japanese is described as a naru (‘become’) language (e.g., Ikegami 1981, 2006), which predicts that intransitive verbs are used more frequently than transitive verbs. The present study therefore selected 36 intransitive-transitive verb pairs and compared their frequencies using 18 years (1998-2015) of news articles from the Mainichi Newspaper. A t-test analysis revealed no overall difference in frequency between intransitive and transitive verbs, and no differences within the five forms of these verbs (i.e., infinitive, adverbial, conditional, imperative and predicative). The same t-tests conducted on frequencies transformed by log_e(x + 0.5) also indicated no differences except for the imperative form, where the result ran counter to the prediction from the naru-language characterization: transitive verbs were used more frequently in the imperative form than intransitive verbs. The correlation of frequencies between intransitive and transitive verbs was also very high (r = 0.70, p < .001), showing great similarity between the two types of verbs. The present study thus demonstrated that intransitive and transitive verbs display overall similarity in frequencies, possibly due to the great variety of usages of these paired verbs.
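
The analysis steps named in the abstract (a natural-log transform with a 0.5 offset, a t-test, and a correlation) can be sketched with the standard library as follows. The abstract does not say whether the t-test was paired; a paired test over verb pairs is assumed here, the function names are hypothetical, and the counts are made up for illustration rather than taken from the Mainichi data.

```python
import math
import statistics


def log_transform(freq):
    """Natural log with a 0.5 offset, so that zero counts remain defined."""
    return math.log(freq + 0.5)


def paired_t(xs, ys):
    """Paired t statistic and degrees of freedom for matched samples."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_d / (sd_d / math.sqrt(n)), n - 1


def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative, made-up counts for four verb pairs (not from the study).
intransitive = [120, 0, 45, 300]
transitive = [95, 3, 60, 280]

log_intr = [log_transform(f) for f in intransitive]
log_tran = [log_transform(f) for f in transitive]
t_stat, df = paired_t(log_intr, log_tran)
r = pearson_r(log_intr, log_tran)
```

The offset matters because some verb forms, the imperative in particular, may have zero occurrences, and log(0) is undefined. Converting `t_stat` to a p-value requires the t distribution with `df` degrees of freedom, which the standard library does not provide.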

Tutorial

Recent Methods of the Questionnaire Research with the Large Sample Size (4): Issue of Purposive Sampling Method

Yasue Nakato

2018 Volume 31 Issue 6 Pages 461-476
Published: September 20, 2018
Released on J-STAGE: September 20, 2019

DOI: https://doi.org/10.24701/mathling.31.6_461

JOURNAL OPEN ACCESS

The issue of selecting samples, that is, selecting the speakers or respondents of linguistic surveys, is very important in sociolinguistic research. This paper discusses the problems and significance of the purposive (non-probability) sampling method, which is the one most often used in Japanese language surveys, mainly in comparison with random sampling. Data collected by purposive sampling do not statistically represent the whole target population, but they are meaningful and valuable as case studies, especially in dialectology, where random sampling does not apply. In addition to sampling procedures, personal information protection policy, the rapid increase in foreign residents, and ethnicity and gender as reflected in language use also need to be considered in the methodology of linguistic surveys.
