Mathematical Linguistics

Special Issue on "Web Corpus"

Foreword

Foreword to the Special Issue of "Web Corpora"

Yukari Tanaka

Article type: Foreword
2016 Volume 30 Issue 6 Pages 325
Published: September 20, 2016
Released on J-STAGE: May 01, 2024

DOI: https://doi.org/10.24701/mathling.30.6_325

JOURNAL OPEN ACCESS

Invited Paper (A) to the Special Issue

The Concept, Types and Utility of Web Corpora

Web Corpora as a Source of Information for Etymological Studies

Tadaharu Tanomura

Article type: Invited Paper (A)
2016 Volume 30 Issue 6 Pages 326-343
Published: September 30, 2016
Released on J-STAGE: May 01, 2024

DOI: https://doi.org/10.24701/mathling.30.6_326

JOURNAL OPEN ACCESS

A Web corpus is defined here as a very large body of text data collected from the Internet. Although websites such as Google Books, the National Diet Library Digital Collections and newspaper archives do not satisfy this definition, they cannot be clearly distinguished from typical Web corpora, so it is not unreasonable to regard them as a type of Web corpus. Drawing on two case studies, this article demonstrates that information obtainable from those websites makes it easy to improve descriptions of the history of Japanese and Chinese terms of the modern era.

Paper (B) to the Special Issue

A Corpus-Based Study of Deviation Frequencies in Japanese Deep Cases

Makiko Matsuda

Article type: Paper (B)
2016 Volume 30 Issue 6 Pages 344-356
Published: September 30, 2016
Released on J-STAGE: May 01, 2024

DOI: https://doi.org/10.24701/mathling.30.6_344

JOURNAL OPEN ACCESS

Japanese linguists have proposed a variety of Japanese deep cases from a semantic perspective. However, few have investigated the adequacy of these cases using corpus-based analyses. This study investigates how frequently each deep case occurs with four particles (Ga, Wo, De, No) by annotating the deep cases of these particles in the Japanese web n-gram corpus (7-gram). Results indicate that the most frequently appearing deep cases of Ga are “unaccusative-intransitive” (type = 32%) and “objective” (type = 27%). “Agent” appeared rarely (type = 8%), although it is considered one of the prototypical deep cases of Ga. “Object (act-on)” cases appear most frequently with Wo (type = 80%), and the share exceeds 90% when “object (act-cause)” cases are included. “Starting point” and “route” cases did not appear at all (type = 0%), despite their semantic and grammatical uniqueness. The most frequently appearing deep cases of De were “others” (type = 51%) and “means and material” (type = 29%). “Place” cases accounted for only 11%, though they are considered a prototypical deep case. “Limitation and Modification” is the deep case that appeared most frequently with No (type = 47%), while the prototypical deep case “Possessive” is infrequent (type = 3%). Although the use of Web data influences these results, they may nonetheless provide useful insights for the study of Japanese deep cases.

General Topics

Resource

Ridit Analysis

Its Application to Linguistic Count Data

Masahiko Ishii

Article type: Resource
2016 Volume 30 Issue 6 Pages 357-377
Published: September 30, 2016
Released on J-STAGE: May 01, 2024

DOI: https://doi.org/10.24701/mathling.30.6_357

JOURNAL OPEN ACCESS

In this essay, I introduce "ridit analysis," which is commonly used in medical statistics, and show that it is also useful for analyzing linguistic count data. Ridit analysis was originally a statistical method for comparing groups in a frequency table by means of the average ridit value. The average ridit is extremely easy to calculate and can be used in statistical significance tests. Applying ridit analysis to count data from previous studies in Japanese linguistics reveals that it is applicable not only to comparing groups of qualitative or quantitative data but also to quantifying each ordinal variable in a two-dimensional contingency table. Ridit analysis, which can also be regarded as a method of exploratory data analysis (EDA), is effective to a certain degree in quantitative linguistic research.
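
As an illustration of how an average ridit can be calculated, the following is a minimal Python sketch of the standard definition: each ordered category receives the proportion of the reference group that falls below it plus half the proportion that falls within it. It is not taken from the article itself. The function names `ridits` and `mean_ridit` and the counts in `reference` and `comparison` are hypothetical and chosen only for illustration.

```python
def ridits(reference_counts):
    """Return the ridit score of each ordered category.

    Each score is the share of the reference group below the category
    plus half the share inside the category.
    """
    total = sum(reference_counts)
    scores = []
    below = 0  # running count of reference items in lower categories
    for count in reference_counts:
        scores.append((below + 0.5 * count) / total)
        below += count
    return scores


def mean_ridit(group_counts, scores):
    """Return the average ridit of a group, weighted by its category counts."""
    n = sum(group_counts)
    return sum(c * r for c, r in zip(group_counts, scores)) / n


# Hypothetical frequency table: four ordered categories, two groups.
reference = [10, 20, 40, 30]   # assumed reference group
comparison = [5, 10, 15, 20]   # assumed comparison group

scores = ridits(reference)
reference_mean = mean_ridit(reference, scores)
comparison_mean = mean_ridit(comparison, scores)
print(scores, reference_mean, comparison_mean)
```

By construction the reference group's own average ridit is 0.5, so an average above 0.5 for another group suggests that its items tend to fall in higher categories than those of the reference group.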

Tutorial

Data Visualization (6): Making Dendrogram in R statics

Naoki Hayashi

Article type: Tutorial
2016 Volume 30 Issue 6 Pages 378-390
Published: September 30, 2016
Released on J-STAGE: May 01, 2024

DOI: https://doi.org/10.24701/mathling.30.6_378

JOURNAL OPEN ACCESS

In this paper, I describe how to create a graph using the statistical software R. Rather than simply pasting the initial output, I add commands one after another to modify the graph until it reaches its final form. To explain this process, I use a dendrogram as the example and show how various elements of the initial output can be changed from the command line. I also explain how to place the resulting graph in a Word document. Finally, I list points to note when creating a chart in R.
