Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 17, Issue 2
Preface
Paper
  • HaiXiang Huang, Atsushi Fujii
    2010 Volume 17 Issue 2 Pages 2_3-2_24
    Published: 2010
    Released on J-STAGE: June 23, 2011
    JOURNAL FREE ACCESS
    Japanese and Korean transliterate foreign words with phonograms such as Katakana and Hangul. In Chinese, the pronunciation of a source word is spelled out with Kanji characters. However, because Kanji characters are ideograms, different characters that share the same pronunciation can convey different meanings and impressions. To select appropriate Kanji characters, an existing method asks the user to provide one or more related terms, which is costly. In this paper, we propose a method for selecting characters in transliteration into Chinese using related terms automatically extracted from the World Wide Web. We show the effectiveness of our method experimentally.
  • Ryu Iida, Mamoru Komachi, Naoya Inoue, Kentaro Inui, Yuji Matsumoto
    2010 Volume 17 Issue 2 Pages 2_25-2_50
    Published: 2010
    Released on J-STAGE: June 23, 2011
    JOURNAL FREE ACCESS
    This paper addresses how to annotate predicate-argument and anaphoric relations in Japanese written text. Predicate-argument structure analysis and anaphora resolution are important problems because they bridge the gap between basic NLP techniques, such as morpho-syntactic analysis, and NLP applications. To solve these problems, researchers have generally used annotated corpora for machine learning-based approaches. Although examining the occurrence of predicate-argument and anaphoric relations in Japanese text requires large corpora annotated with them, no such resources have existed so far. In addition, existing specifications for annotating predicate-argument and anaphoric relations are not directly applicable because of differences between languages and problem settings. For these reasons, we explore how to annotate these two kinds of relations and then define an adequate specification for each annotation task. In this article, we report the results of annotation, taking the Kyoto Corpus 3.0 as a starting point. Furthermore, we refine our annotation specification to accommodate phenomena actually observed in our corpus and then report the results of the annotation work carried out under the new specification.
  • Ryo Nagata, Ayako Kawai, Koji Suda, Junichi Kakegawa, Koichiro Morihir ...
    2010 Volume 17 Issue 2 Pages 2_51-2_65
    Published: 2010
    Released on J-STAGE: June 23, 2011
    JOURNAL FREE ACCESS
    Corpora have played a crucial role in natural language processing and linguistics. However, very few corpora consist of children's writing because of difficulties peculiar to child corpus creation. In this paper, we propose a method for avoiding these difficulties and creating a child corpus efficiently. To show its effectiveness, we used the proposed method to create a child corpus. As a result, we obtained a child corpus called the Kodomo Corpus, containing 39,269 morphemes, which is the largest written child corpus. A feature of the Kodomo Corpus is that editing histories, such as additions and deletions, are traceable through its data tags.