Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 12, Issue 1
Displaying 1-7 of 7 articles from this issue
  • [in Japanese]
    2005 Volume 12 Issue 1 Pages 1-2
    Published: January 10, 2005
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Download PDF (276K)
  • TOMOYA NORO, TAIICHI HASHIMOTO, TAKENOBU TOKUNAGA, HOZUMI TANAKA
    2005 Volume 12 Issue 1 Pages 3-32
    Published: January 10, 2005
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Although large-scale grammars are a prerequisite for parsing a wide variety of sentences, it is difficult to build such grammars by hand. However, a context-free grammar (CFG) can be derived automatically from an existing large-scale, syntactically annotated corpus. Although this seems a simple task, CFGs derived in this fashion have seldom been applied in existing systems, probably because they produce a great number of possible parse results (i.e., high ambiguity). In this paper, we analyze some causes of this high ambiguity and propose a policy for building a large-scale Japanese CFG for syntactic parsing that decreases it. We also provide an experimental evaluation of the obtained CFG, showing a reduction in the number of parse results (reduced ambiguity) and improved parsing accuracy.
    Download PDF (4348K)
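The abstract's core step, reading CFG rules off a syntactically annotated corpus, can be sketched as follows. The tree encoding and the toy sentence are illustrative assumptions, not the paper's actual treebank format.

```python
from collections import Counter

def extract_rules(tree, rules):
    """Collect CFG productions (LHS, RHS-label tuple) with counts from a
    parse tree given as (label, children); leaves are plain word strings."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rules[(label, rhs)] += 1
    for child in children:
        if not isinstance(child, str):
            extract_rules(child, rules)

# Toy annotated sentence (hypothetical): (S (NP the dog) (VP barks))
tree = ("S", [("NP", ["the", "dog"]), ("VP", ["barks"])])
rules = Counter()
extract_rules(tree, rules)
```

Aggregating such counts over a whole treebank yields the grammar; the ambiguity problem the paper addresses arises because many distinct right-hand sides end up sharing the same left-hand side.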
  • Mohammad Teduh Uliniansyaht, Shun Ishizaki
    2005 Volume 12 Issue 1 Pages 33-50
    Published: January 10, 2005
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    It is common for a word in any natural language to have more than one meaning or sense. A word sense disambiguation (WSD) system is designed to determine which sense of a polysemous word is invoked in a particular context around the word. We propose methods to disambiguate the senses of polysemous words using the Naive Bayes classifier. Several sets of experimental data were taken from the Kompas daily newspaper homepage and used to construct the system. We modified the original Naive Bayes algorithm to apply it to the analysis of Indonesian. The experiments showed that our system achieved good accuracies (73-99%).
    Download PDF (1791K)
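A minimal sketch of the Naive Bayes WSD setup the abstract describes, with add-one smoothing. The toy senses and context words below are invented for illustration and are unrelated to the paper's Kompas data or its modified algorithm.

```python
from collections import Counter, defaultdict
import math

class NaiveBayesWSD:
    """Pick the sense of a polysemous word from its context words with
    add-one (Laplace) smoothed Naive Bayes."""

    def fit(self, examples):
        # examples: list of (context_words, sense) pairs
        self.sense_counts = Counter(sense for _, sense in examples)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for words, sense in examples:
            for w in words:
                self.word_counts[sense][w] += 1
                self.vocab.add(w)
        self.total = sum(self.sense_counts.values())
        return self

    def predict(self, words):
        best_sense, best_logp = None, float("-inf")
        for sense, count in self.sense_counts.items():
            logp = math.log(count / self.total)  # prior P(sense)
            denom = sum(self.word_counts[sense].values()) + len(self.vocab)
            for w in words:  # smoothed likelihood P(word | sense)
                logp += math.log((self.word_counts[sense][w] + 1) / denom)
            if logp > best_logp:
                best_sense, best_logp = sense, logp
        return best_sense

# Toy training data (invented): two senses of an ambiguous word.
clf = NaiveBayesWSD().fit([
    (["river", "water"], "shore"),
    (["money", "deposit"], "finance"),
])
```

In practice the context window size and the smoothing scheme are the main tuning knobs; the paper's own adaptation for Indonesian is not reproduced here.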
  • KAZUHIDE YAMAMOTO, YASUAKI ADACHI
    2005 Volume 12 Issue 1 Pages 51-78
    Published: January 10, 2005
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    We present a method of summarizing the minutes of the national Diet. The minutes have some peculiar traits: for example, honorifics appear frequently, and the text exhibits characteristics of both speech and written documents. In this paper, we focus on those traits and paraphrase or delete specific expressions. We paraphrased honorifics that appear frequently in the minutes. Similarly, we identified redundant parts using frequently appearing expressions and several clue words, and deleted those parts. Applying these processes to minutes that include spontaneous speech, we attained a summarization rate of about 80%. We also applied our system to the CSJ spoken language corpus and obtained a summarization rate of about 84%. These results indicate that the proposed approach works well not only for the minutes but also for other spoken language material.
    Download PDF (3003K)
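The paraphrase-then-delete pipeline can be caricatured as below. The substitution table and redundancy patterns are English stand-ins (the paper's actual Japanese honorifics and clue words are not reproduced), and "summarization rate" is taken here simply as the length ratio of summary to original.

```python
import re

def summarize(text, paraphrases, redundant_patterns):
    """Apply paraphrase substitutions, delete spans matching redundancy
    patterns, and return (summary, summary_len / original_len)."""
    original_len = len(text)
    for src, dst in paraphrases.items():
        text = text.replace(src, dst)
    for pattern in redundant_patterns:
        text = re.sub(pattern, "", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text, len(text) / original_len

# Illustrative stand-ins, not the paper's Japanese expressions.
summary, rate = summarize(
    "um, the meeting, um, will start",
    paraphrases={},
    redundant_patterns=[r"um, "],
)
```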
  • YASUHIRO TOKUNAGA, KENTARO INUI, YUJI MATSUMOTO
    2005 Volume 12 Issue 1 Pages 79-105
    Published: January 10, 2005
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper proposes a computational model for analyzing the communicative structure of computer-mediated chat dialogues and reports the present results of our empirical evaluation. We first formalize the communicative structure underlying chat dialogues by decomposing it into continuation relations and response relations. A continuation relation holds between utterances of the same speaker that together constitute a complete chunk functioning as a question, response, etc. (e.g., the relation between the separate utterances "Are" and "you a student?", which together constitute a question). A response relation, on the other hand, holds between utterances made by different speakers, e.g., a question and its response. Our model analyzes communicative structure by grouping utterances together according to these types of relations in a bottom-up fashion, using corpus-based supervised machine learning. We manually annotated a chat dialogue corpus with communicative structure (two-person and three-person dialogues: 69 dialogues in total, containing 11,905 utterance tokens). The automatic analyses matched the manual analyses for 87.4% of two-person dialogues and 84.6% of three-person dialogues.
    Download PDF (2987K)
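The bottom-up grouping of same-speaker utterances into chunks can be sketched as follows. The paper learns the continuation decision from an annotated corpus; the stand-in predicate below is a crude heuristic used only to make the sketch runnable.

```python
def group_continuations(utterances, is_continuation):
    """Merge adjacent utterances by the same speaker into one chunk
    whenever is_continuation(previous_text, next_text) holds.
    utterances: list of (speaker, text) pairs."""
    chunks = []
    for speaker, text in utterances:
        if chunks and chunks[-1][0] == speaker and is_continuation(chunks[-1][1], text):
            chunks[-1] = (speaker, chunks[-1][1] + " " + text)
        else:
            chunks.append((speaker, text))
    return chunks

# The abstract's own example: "Are" + "you a student?" form one question.
# Stand-in heuristic: continue unless the previous utterance already ends a sentence.
def heuristic(prev, nxt):
    return not prev.endswith(("?", ".", "!"))

chunks = group_continuations([("A", "Are"), ("A", "you a student?")], heuristic)
```

Response relations between the resulting chunks of different speakers would be identified in a second, analogous pass.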
  • TAKESHI ABEKAWA, MANABU OKUMURA
    2005 Volume 12 Issue 1 Pages 107-123
    Published: January 10, 2005
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In this paper, we propose a new method of analyzing Japanese relative clauses. Japanese relative clause modification should be classified into at least two major semantic categories: case-slot gapping and head restrictive. Previous methods take into account only the information for judging a clause to be case-slot gapping, together with co-occurrence information between nouns and verbs. Our proposed method also takes into account the information for judging a clause to be head restrictive. Experimental results show that it yields higher accuracy than previous methods.
    Download PDF (1864K)
  • KEIJI SHINZATO, KENTARO TORISAWA
    2005 Volume 12 Issue 1 Pages 125-150
    Published: January 10, 2005
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper describes an automatic acquisition method for hyponymy relations. Hyponymy relations play a crucial role in various natural language processing systems, and there have been many attempts to acquire these relations automatically from large-scale corpora. Most existing acquisition methods rely on particular linguistic patterns, such as juxtapositions, that signal hyponymy relations. Our method, however, does not use such linguistic patterns; instead, we acquire hyponymy relations from four different types of clues. The first is repetition of HTML tags found in ordinary HTML documents on the WWW. The second is statistical measures such as df and idf, which are popular in the IR literature. The third is verb-noun co-occurrences found in normal corpora. The fourth is heuristic rules obtained through our experiments on a development set.
    Download PDF (2995K)
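The second type of clue, the df and idf measures from IR, is standard and can be sketched over documents represented as word sets. The toy document collection is invented for illustration.

```python
import math

def df(term, docs):
    """Document frequency: number of documents containing the term."""
    return sum(1 for doc in docs if term in doc)

def idf(term, docs):
    """Inverse document frequency: log(N / df), a common IR weighting."""
    return math.log(len(docs) / df(term, docs))

# Toy document collection as word sets (invented for illustration).
docs = [{"fruit", "apple"}, {"fruit", "pear"}, {"car", "engine"}]
```

Terms with low idf are too common to be informative hypernym candidates, which is presumably why such weighting is useful as an acquisition clue.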