Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 17, Issue 5
Preface
Paper
  • Rintaro Miyazaki, Tatsunori Mori
    2010 Volume 17 Issue 5 Pages 5_3-5_50
    Published: 2010
    Released on J-STAGE: April 15, 2011
    JOURNAL FREE ACCESS
    In this paper, we investigated an effective way to create a sentiment corpus through manual annotation. Sentiment corpora are regarded as an indispensable resource for tasks related to sentiment information processing. First, we proposed a two-layered model of the structure of sentiment information, in which sentiment information is represented as tuples of four kinds of elements, i.e., ‘item’, ‘attribute’, ‘value’, and ‘evaluation’. Second, we investigated corpus annotation by multiple annotators. In this setting, the corpus to be annotated is divided into multiple parts, which are assigned to different annotators. The results of a preliminary experiment show that agreement among annotators is insufficient when they annotate independently. We proposed using example annotations during manual annotation of the corpus, together with a tool that supports manual annotation based on such examples. The proposed tool offers annotators a way to moderately share annotation criteria by referring to annotations that have already been made. Our experimental results show that referring to example annotations is effective in controlling discrepancies in annotation. Third, we reported an experiment in creating a sentiment corpus using the tool. After its effectiveness was confirmed, ten annotators annotated a corpus of review text containing 10,000 sentences to create a corpus of sentiment information. According to a statistical investigation of each element in the annotated corpus, the proposed two-layered model is effective for analyzing sentiment information in text more precisely.
  • Kosho Shudo, Toshifumi Tanabe
    2010 Volume 17 Issue 5 Pages 5_51-5_74
    Published: 2010
    Released on J-STAGE: April 15, 2011
    JOURNAL FREE ACCESS
    Since the publication of Sag et al. (2002), the NLP community has been aware that one of the most crucial problems in NLP is how to cope with idiosyncratic multiword expressions, which occur in authentic sentences with unexpectedly high frequency. Here, the idiosyncrasy of an expression is twofold in principle: one aspect is idiomaticity, i.e., non-compositionality of meaning, and the other is the strong probabilistic boundness of the word combination. Thus, many attempts have been made in the NLP field to extract such expressions from corpora, mostly by statistical methods. However, presumably because of the difficulty of extracting them correctly without human insight, no reliable, extensive resource has yet become available. The authors recognized the crucial importance of such irregular expressions around 1970 and started to develop a machine dictionary containing Japanese idioms, idiom-like expressions, and other multiword expressions that consist of frequently co-occurring words. In this paper, we give an overview of the first version of the dictionary, namely JDMWE (Japanese Dictionary of Multi-Word Expressions). It has about 104,000 head entries and is characterized by:
    1. the wide notational, syntactic and semantic variety of contained expressions,
    2. the syntactic function and structure given for each entry expression, and
    3. the possibility of internal modification indicated for each component word of the entry expression.
Report
  • Maki Sakamoto
    2010 Volume 17 Issue 5 Pages 5_75-5_98
    Published: 2010
    Released on J-STAGE: April 15, 2011
    JOURNAL FREE ACCESS
    One of the problems in corpus-based Japanese linguistics is the shortage of shared corpora of texts written by Japanese children. A corpus of children's written Japanese, shared as a language resource, would enable us to analyze how the use of Japanese changes with age and to examine the words and grammatical styles characteristically used by children. Such a corpus would be expected to contribute to the study of Japanese and to Japanese language education, as well as to related fields such as cognitive development and sociology. Therefore, in this study we collected essays written by Japanese elementary school children and published on the websites of 265 elementary schools. As a result, we collected 10,006 texts, totaling about 1.23 million words. Using the data collected through this process, we investigated how children in each school year used mimetic words. The results showed that the amount of mimetic expressions rose up to the third grade and the variety of onomatopoeic expressions increased up to the second grade, after which they declined. Furthermore, as a sociologically applied study, we investigated what children wrote about their parents and how they reacted to exchanges with their parents. The results showed that children's reactions were rich and strong in the case of the mother.