Mathematical Linguistics
Online ISSN : 2433-0302
Print ISSN : 0453-4611
Volume 30, Issue 6
Special Issue on "Web Corpus"
Foreword
Invited Paper (A) to the Special Issue
  • Web Corpora as a Source of Information for Etymological Studies
    Tadaharu Tanomura
    Article type: Invited Paper (A)
    2016 Volume 30 Issue 6 Pages 326-343
    Published: September 30, 2016
    Released on J-STAGE: May 01, 2024
    JOURNAL OPEN ACCESS
    A Web corpus is defined as a very large body of text data collected from the Internet. Websites such as Google Books, the National Diet Library Digital Collections, and newspaper archives do not satisfy this condition, yet they cannot be clearly distinguished from typical Web corpora, so it is not unreasonable to regard them as a type of Web corpus. Drawing on two case studies, this article demonstrates that information obtainable from those websites can readily improve descriptions of the history of modern-era Japanese and Chinese terms.
Paper (B) to the Special Issue
  • Makiko Matsuda
    Article type: Paper (B)
    2016 Volume 30 Issue 6 Pages 344-356
    Published: September 30, 2016
    Released on J-STAGE: May 01, 2024
    JOURNAL OPEN ACCESS
    Japanese linguists have proposed a variety of deep cases for Japanese from a semantic perspective. However, few studies have examined the adequacy of these cases using corpus-based analysis. This study investigates the deep-case frequencies of four particles (Ga, Wo, De, No) by annotating the deep cases of these particles in the Japanese web n-gram corpus (7-gram). Results indicate that the most frequent deep cases of Ga are “unaccusative-intransitive” (type = 32%) and “objective” (type = 27%). “Agent” appeared infrequently (type = 8%), although it is considered one of the prototypical deep cases of Ga. “Object (act-on)” cases are the most frequent for Wo (type = 80%), and together with “object (act-cause)” cases they exceed 90%. “Starting point” and “route” cases did not appear at all (type = 0%), despite their semantic and grammatical uniqueness. The most frequent deep cases of De were “others” (type = 51%) and “means and material” (type = 29%). “Place” cases accounted for only 11%, though they are considered a prototypical deep case. “Limitation and Modification” is the most frequent deep case of No (type = 47%), while the prototypical deep case “Possessive” is infrequent (type = 3%). Although these results are influenced by the Web as a source, they may nonetheless provide useful insights for the study of Japanese deep cases.
General Topics
Resource
  • Its Application to Linguistic Count Data
    Masahiko Ishii
    Article type: Resource
    2016 Volume 30 Issue 6 Pages 357-377
    Published: September 30, 2016
    Released on J-STAGE: May 01, 2024
    JOURNAL OPEN ACCESS
    In this essay, I introduce "ridit analysis", which is commonly used in medical statistics, and show that it is also useful for analyzing linguistic count data. Ridit analysis is originally a statistical method for comparing groups in a frequency table by means of the average ridit value. The average ridit is extremely easy to calculate and can be used in statistical significance tests. Applying ridit analysis to count data from previous studies in Japanese linguistics reveals that it is applicable not only to comparisons among groups of qualitative or quantitative data but also to the quantification of each ordinal variable in a two-dimensional contingency table. Ridit analysis, which can also be regarded as a method of exploratory data analysis (EDA), is thus effective to a certain degree in quantitative linguistic research.
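As a rough illustration of why the average ridit is easy to calculate, the sketch below uses the standard textbook definition of a ridit: for ordered categories, the proportion of a reference group falling below a category plus half the proportion falling in it. The function names `ridits` and `mean_ridit` and the counts are hypothetical and chosen only for illustration; they are not taken from the article, and the significance test mentioned in the abstract is not covered.

```python
from fractions import Fraction


def ridits(reference_counts):
    """Ridit score per ordered category, computed from reference-group counts.

    ridit_j = (count below category j + half the count in category j) / total
    """
    total = sum(reference_counts)
    scores = []
    below = 0  # cumulative count in categories ranked below the current one
    for count in reference_counts:
        scores.append(Fraction(2 * below + count, 2 * total))
        below += count
    return scores


def mean_ridit(group_counts, scores):
    """Average ridit of a group: its counts weighted by the reference ridits."""
    n = sum(group_counts)
    return sum(Fraction(c) * r for c, r in zip(group_counts, scores)) / n


# Hypothetical counts over four ordered categories (low to high).
reference = [10, 20, 30, 40]
comparison = [40, 30, 20, 10]

scores = ridits(reference)
print([str(s) for s in scores])              # ['1/20', '1/5', '9/20', '4/5']
print(mean_ridit(reference, scores))         # 1/2 (always, by construction)
print(mean_ridit(comparison, scores))        # 1/4
```

The reference group always has an average ridit of 0.5. A comparison group's average ridit below 0.5, as here, indicates that it tends toward the lower categories relative to the reference group.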
Tutorial
  • Naoki Hayashi
    Article type: Tutorial
    2016 Volume 30 Issue 6 Pages 378-390
    Published: September 30, 2016
    Released on J-STAGE: May 01, 2024
    JOURNAL OPEN ACCESS
    In this paper, I describe how to create a graph using the statistical software R. Rather than simply pasting the initial output, we add commands one after another, each making a further change, until the final graph is plotted. To explain this process, I use a dendrogram as an example. The example shows how various elements of the initial output can be changed from the command line. I also explain how to place the resulting graph in a Word document. Finally, I provide a list of points to note when creating a chart in R.