GENGO KENKYU (Journal of the Linguistic Society of Japan)
Online ISSN : 2185-6710
Print ISSN : 0024-3914
Featured Theme: Corpus-based Linguistic Analysis (1)
Japanese Corpora and Their Lexicographic Applications, with Special Emphasis on Collocation
Tadaharu Tanomura

2010 Volume 138 Pages 1-23

Abstract

Although Japanese has lagged behind the world's other major languages in the use of electronic corpora for linguistic research, the situation is changing rapidly owing to several factors, most notably the ongoing construction of a balanced corpus of Japanese at the National Institute for Japanese Language and Linguistics.

This paper focuses on collocation, a linguistic phenomenon that can be analyzed reliably only with large corpora, and explores the roles that corpora can play in compiling a dictionary of Japanese, whether a general-purpose dictionary or a collocational one. Three collocational aspects of Japanese are examined through corpus analysis: 1) the concept of ‘circumcollocate’, 2) the degree of markedness of verbs and adjectives, and 3) the semantic differences between synonymous idiomatic grammatical phrases. For each of these domains, the paper demonstrates how corpora can be of lexicographic significance.

Retrieving collocational information requires a large corpus. The paper uses a Web corpus that the author constructed in 2008, consisting of approximately 75 billion characters. This amounts to 150 gigabytes of files, or the equivalent of 300,000 to 400,000 Japanese novels of average length.
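The size figures in the preceding paragraph can be cross-checked with simple arithmetic. The sketch below is a back-of-envelope check only, assuming (this is not stated in the source) that each character occupies a fixed 2 bytes, as in UTF-16 or Shift_JIS, and that "gigabyte" means 10^9 bytes. Under those assumptions the 150 GB figure matches 75 billion characters exactly, and the novel comparison implies roughly 190,000 to 250,000 characters per novel.

```python
# Back-of-envelope check of the corpus-size figures quoted in the abstract.
# Assumptions (not stated in the source): 2 bytes per character
# (e.g. UTF-16 or Shift_JIS) and 1 GB = 10**9 bytes.

TOTAL_CHARS = 75_000_000_000          # approx. 75 billion characters (from the abstract)
FILE_SIZE_GB = 150                    # approx. 150 GB (from the abstract)
NOVELS_LOW, NOVELS_HIGH = 300_000, 400_000  # novel-equivalents (from the abstract)

# Implied storage cost per character.
bytes_per_char = FILE_SIZE_GB * 10**9 / TOTAL_CHARS

# Implied length of an "average" novel: fewer novels means longer novels.
chars_per_novel_low = TOTAL_CHARS / NOVELS_HIGH
chars_per_novel_high = TOTAL_CHARS / NOVELS_LOW

print(f"bytes per character: {bytes_per_char:.1f}")
print(f"characters per novel: {chars_per_novel_low:,.0f} to {chars_per_novel_high:,.0f}")
```

Running it prints a ratio of 2.0 bytes per character and a range of 187,500 to 250,000 characters per novel; if "gigabyte" were instead read as 2^30 bytes, the ratio would come out slightly above 2.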

© 2010 The Linguistic Society of Japan, Authors