Journal of Natural Language Processing

Preface

[title in Japanese]

[in Japanese]

2010 Volume 17 Issue 5 Pages 5_1
Published: 2010
Released on J-STAGE: April 15, 2011

DOI: https://doi.org/10.5715/jnlp.17.5_1

JOURNAL FREE ACCESS

Paper

Creation of Sentiment Corpus by Multiple Annotators with an Annotation Tool that has a Function of Referring Example Annotations

Rintaro Miyazaki, Tatsunori Mori

2010 Volume 17 Issue 5 Pages 5_3-5_50
Published: 2010
Released on J-STAGE: April 15, 2011

DOI: https://doi.org/10.5715/jnlp.17.5_3

JOURNAL FREE ACCESS

In this paper, we investigated an effective way to create a sentiment corpus through manual annotation. Sentiment corpora are regarded as an indispensable resource for tasks related to sentiment information processing. First, we proposed a two-layered model of the structure of sentiment information, in which sentiment information is represented as tuples of four kinds of elements: ‘item’, ‘attribute’, ‘value’, and ‘evaluation’. Second, we investigated corpus annotation by multiple annotators. In this setting, the corpus to be annotated is divided into multiple parts, which are assigned to different annotators. The results of a preliminary experiment showed that agreement among annotators is insufficient when they annotate individually. We therefore proposed using example annotations during manual annotation, together with a tool that supports manual annotation based on such examples. The proposed tool lets annotators share annotation criteria to a moderate degree by referring to annotations that have already been made. Our experimental results showed that referring to example annotations is effective in reducing discrepancies in annotation. Third, we reported an experiment in which a sentiment corpus was created using the tool. After confirming the tool's effectiveness, ten annotators annotated a corpus of review text containing 10,000 sentences in order to create a corpus of sentiment information. A statistical investigation of each element in the annotated corpus indicates that the proposed two-layered model is effective for analyzing sentiment information in text more precisely.

JDMWE: A Japanese Dictionary of Multi-Word Expressions

Kosho Shudo, Toshifumi Tanabe

2010 Volume 17 Issue 5 Pages 5_51-5_74
Published: 2010
Released on J-STAGE: April 15, 2011

DOI: https://doi.org/10.5715/jnlp.17.5_51

JOURNAL FREE ACCESS

Since the publication of Sag et al. (2002), the NLP community has been aware that one of the most crucial problems in NLP is how to cope with idiosyncratic multiword expressions, which occur in authentic sentences with unexpectedly high frequency. The idiosyncrasy of an expression is in principle twofold: one aspect is idiomaticity, i.e., non-compositionality of meaning, and the other is the strong probabilistic boundness of the word combination. Accordingly, many attempts have been made in the NLP field to extract such expressions from corpora, mostly by statistical methods. However, presumably because it is difficult to extract them correctly without human insight, no reliable, extensive resource has yet become available. The authors recognized the crucial importance of such irregular expressions around 1970 and started to develop a machine dictionary containing Japanese idioms, idiom-like expressions, and other multiword expressions consisting of frequently co-occurring words. In this paper, we give an overview of the first version of the dictionary, JDMWE (Japanese Dictionary of Multi-Word Expressions). It has about 104,000 head entries and is characterized by:
1. the wide notational, syntactic, and semantic variety of the expressions it contains,
2. the syntactic function and structure given for each entry expression, and
3. the possibility of internal modification indicated for each component word of the entry expression.

Report

Corpus of Texts Composed by Japanese Elementary School Children and its Application in Linguistics and Sociology

Maki Sakamoto

2010 Volume 17 Issue 5 Pages 5_75-5_98
Published: 2010
Released on J-STAGE: April 15, 2011

DOI: https://doi.org/10.5715/jnlp.17.5_75

JOURNAL FREE ACCESS

One of the problems in corpus-based Japanese linguistics is the shortage of shared corpora of texts written by Japanese children. A written-language corpus of Japanese children, shared as a language resource, would enable us to analyze how the use of Japanese changes with age and to examine the words and grammatical styles characteristically used by children. Such a corpus would be expected to contribute to Japanese language studies and Japanese language education, as well as to related fields such as cognitive development and sociology. Therefore, in this study we collected essays written by Japanese elementary school children and published on the websites of 265 elementary schools. As a result, we collected 10,006 texts, totaling about 1.23 million words. Using the data collected through this process, we investigated how children in each school year used mimetic words. The results showed that the number of mimetic expressions rose until the third grade, the variety of onomatopoeic expressions increased until the second grade, and both declined thereafter. Furthermore, as a sociological application, we investigated what children wrote about their parents and how they reacted to exchanges with their parents. The results showed that children's reactions were rich and strong in the case of mothers.
