Journal of Natural Language Processing

Preface

[title in Japanese]

[in Japanese]

2010 Volume 17 Issue 2 Pages 2_1-2_2
Published: 2010
Released on J-STAGE: June 23, 2011

DOI: https://doi.org/10.5715/jnlp.17.2_1

JOURNAL FREE ACCESS

Paper

An Application of Related Term Extraction to Transliteration into Chinese

HaiXiang Huang, Atsushi Fujii

2010 Volume 17 Issue 2 Pages 2_3-2_24
Published: 2010
Released on J-STAGE: June 23, 2011

DOI: https://doi.org/10.5715/jnlp.17.2_3

JOURNAL FREE ACCESS

Japanese and Korean transliterate foreign words with phonograms such as Katakana and Hangul. In Chinese, the pronunciation of a source word is instead spelled out with Kanji characters. Because Kanji characters are ideograms, however, different characters can share the same pronunciation while conveying different meanings and impressions. To select appropriate Kanji characters, an existing method asks the user to provide one or more related terms, but this is costly. In this paper, we propose a method that selects characters for transliteration into Chinese using related terms extracted automatically from the World Wide Web. We show the effectiveness of our method experimentally.

Annotating Predicate-Argument Relations and Anaphoric Relations: Findings from the Building of the NAIST Text Corpus

Ryu Iida, Mamoru Komachi, Naoya Inoue, Kentaro Inui, Yuji Matsumoto

2010 Volume 17 Issue 2 Pages 2_25-2_50
Published: 2010
Released on J-STAGE: June 23, 2011

DOI: https://doi.org/10.5715/jnlp.17.2_25

JOURNAL FREE ACCESS

This paper addresses how to annotate predicate-argument and anaphoric relations in Japanese written text. Predicate-argument structure analysis and anaphora resolution are important problems because they bridge the gap between basic NLP techniques, such as morpho-syntactic analysis, and NLP applications. To solve these problems, researchers have generally relied on annotated corpora for machine learning-based approaches. Examining how these relations occur in Japanese text requires large corpora annotated with predicate-argument and anaphoric relations, but no such resources have existed so far. In addition, existing specifications for annotating predicate-argument and anaphoric relations are not directly applicable because of differences in language and in problem settings. For these reasons, we explore how to annotate these two kinds of relations and then define an adequate specification for each annotation task. In this article, we report the results of annotation, taking the Kyoto Corpus 3.0 as a starting point. Furthermore, we refine our annotation specification to accommodate phenomena actually observed in our corpus and then report the results of the annotation work carried out under the new specification.

A Written Child Corpus with Editing History Tags

Ryo Nagata, Ayako Kawai, Koji Suda, Junichi Kakegawa, Koichiro Morihir ...

2010 Volume 17 Issue 2 Pages 2_51-2_65
Published: 2010
Released on J-STAGE: June 23, 2011

DOI: https://doi.org/10.5715/jnlp.17.2_51

JOURNAL FREE ACCESS

Corpora have played a crucial role in natural language processing and linguistics. However, very few corpora consist of children's writing because of difficulties peculiar to child corpus creation. In this paper, we propose a method for avoiding these difficulties and creating a child corpus efficiently. To show its effectiveness, we have used the proposed method to create a child corpus. As a result, we have obtained a child corpus called the Kodomo Corpus, containing 39,269 morphemes, which is the largest written child corpus. A distinguishing feature of the Kodomo Corpus is that editing histories, such as additions and deletions, are traceable through its data tags.
