Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 9, Issue 3
Displaying 1-8 of 8 articles from this issue
  • [in Japanese]
    2002 Volume 9 Issue 3 Pages 1-2
    Published: July 10, 2002
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Download PDF (252K)
  • YOSHIMASA TSURUOKA, TAKASHI CHIKAYAMA
    2002 Volume 9 Issue 3 Pages 3-19
    Published: July 10, 2002
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    The decision list algorithm is one of the most successful algorithms for classification problems in natural language processing. The most important part of the decision list algorithm is the calculation of the reliability of each rule, that is, the estimation of the probability of each piece of contextual evidence. However, most research using decision lists has paid little attention to this estimation method. We propose an estimation method based on Bayesian learning which provides well-founded smoothing and makes better use of prior information about each type of contextual evidence. Experimental results obtained using the Senseval-1 data set and Japanese pseudowords show that our method makes probability estimation more precise, leading to improved classification performance of the decision list algorithm. A schematic sketch of the rule-reliability calculation follows this entry.
    Download PDF (1576K)
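    The following is a minimal, illustrative Python sketch of the decision list idea described above: each rule is scored by a smoothed log-likelihood ratio, and rules are applied in order of decreasing reliability. Additive smoothing with a pseudo-count alpha stands in for the paper's Bayesian estimate; all names and parameters are assumptions, not the authors' implementation.

    from collections import Counter, defaultdict
    from math import log

    def build_decision_list(training_pairs, alpha=1.0):
        """Build a decision list from (evidence, sense) training pairs.

        Each rule is scored by the log-likelihood ratio of the smoothed
        probability of its sense given the evidence versus all other senses.
        Assumes at least two senses so the smoothed probability stays below 1.
        """
        counts = defaultdict(Counter)          # evidence -> Counter over senses
        senses = set()
        for evidence, sense in training_pairs:
            counts[evidence][sense] += 1
            senses.add(sense)

        rules = []
        for evidence, sense_counts in counts.items():
            total = sum(sense_counts.values())
            for sense in senses:
                # Smoothed conditional probability P(sense | evidence)
                p = (sense_counts[sense] + alpha) / (total + alpha * len(senses))
                reliability = log(p / (1.0 - p))
                rules.append((reliability, evidence, sense))
        # Apply rules in order of decreasing reliability
        rules.sort(reverse=True)
        return rules

    def classify(rules, context_evidences, default_sense):
        """Return the sense of the most reliable rule whose evidence occurs in the context."""
        evidences = set(context_evidences)
        for _, evidence, sense in rules:
            if evidence in evidences:
                return sense
        return default_sense

    Tuning alpha (or, as in the paper, replacing it with an evidence-type-specific prior) is exactly the estimation step the abstract argues should not be neglected.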
  • SHIHO NOBESAWA, KENGO SATO, HIROAKI SAITO
    2002 Volume 9 Issue 3 Pages 21-40
    Published: July 10, 2002
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    We propose two methods for the recognition of unknown strings in dictionary-based natural language processing systems. One method uses statistical information dynamically during processing, and the other obtains meaningful strings that should be added to the dictionary. Both methods are based on statistical information drawn from a training corpus, and neither requires part-of-speech tagging or other preprocessing of the training corpus. We applied our methods to a Japanese morphological analysis system and obtained good results in reducing both unknown words and over-segmentation. A schematic sketch of candidate extraction follows this entry.
    Download PDF (5321K)
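    As a rough illustration only (the paper's actual statistics are not reproduced here), one way to obtain dictionary candidates from an untagged corpus is to collect frequent character substrings; the Python sketch below assumes a simple frequency threshold and a maximum candidate length.

    from collections import Counter

    def dictionary_candidates(raw_text, max_len=6, min_count=5):
        """Collect frequent character substrings from an untagged corpus as
        candidate dictionary entries (hypothetical scoring, for illustration).
        """
        counts = Counter()
        n = len(raw_text)
        for i in range(n):
            for length in range(2, max_len + 1):
                if i + length <= n:
                    counts[raw_text[i:i + length]] += 1
        # Keep only strings frequent enough to be worth adding to the dictionary
        return [(s, c) for s, c in counts.most_common() if c >= min_count]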
  • HIROYUKI SAKAI, NAOTSUGU SHINOHARA, SHIGERU MASUYAMA, KAZUHIDE YAMAMOT ...
    2002 Volume 9 Issue 3 Pages 41-62
    Published: July 10, 2002
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper proposes a method of acquiring knowledge about whether verb phrases can be abbreviated. For a clause that contains a verb and its accompanying verb phrases, the proposed method extracts from a large corpus other clauses that contain the same verb but different case postpositional particles. Our method then identifies abbreviatable verb phrases by comparing them with the verb phrases contained in the extracted clauses. Under this method, a verb phrase carrying important information is unlikely to be judged abbreviatable, while a verb phrase carrying information that already appears in previous sentences is likely to be judged abbreviatable. An experimental evaluation of our method shows a precision of 78.0% and a recall of 67.9%. We compare our method with a baseline that judges a verb phrase abbreviatable when it corresponds to an optional case element described in a case frame dictionary, and conclude from the evaluation results that our method outperforms the case frame dictionary baseline. A schematic sketch of the corpus-based criterion follows this entry.
    Download PDF (2439K)
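    A minimal Python sketch of the corpus-comparison idea, under the assumption that clauses are represented as (verb, set-of-case-particles) pairs; the threshold and the decision criterion are illustrative, not the paper's exact procedure.

    def abbreviatable_cases(verb, clause_cases, corpus_clauses, threshold=0.5):
        """For each case particle attached to `verb` in the current clause,
        estimate how often the same verb occurs in the corpus without that
        case filled; a case element frequently absent elsewhere is judged
        abbreviatable. (Hypothetical criterion and parameters.)
        """
        same_verb = [cases for v, cases in corpus_clauses if v == verb]
        if not same_verb:
            return set()
        result = set()
        for case in clause_cases:
            absent = sum(1 for cases in same_verb if case not in cases)
            if absent / len(same_verb) >= threshold:
                result.add(case)
        return result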
  • KAZUHIRO SEKI, ATSUSHI FUJII, TETSUYA ISHIKAWA
    2002 Volume 9 Issue 3 Pages 63-85
    Published: July 10, 2002
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In Japanese, entities that can easily be predicted are often omitted. Identifying the antecedents associated with such ellipses, a task termed “anaphora resolution”, is crucial in natural language processing, and specifically in discourse analysis. This paper proposes a probabilistic model to resolve zero pronouns, which are among the major ellipses in Japanese. Our proposed model is decomposed into two component models associated with syntactic and semantic properties, so as to optimize parameter estimation. The syntactic model is trained on corpora annotated with anaphoric relations, whereas the semantic model is trained on large-scale unannotated corpora to counter the data sparseness problem. We also propose a notion of certainty to improve the accuracy of zero pronoun resolution. We show the effectiveness of our method through experiments. A schematic sketch of the combined model follows this entry.
    Download PDF (2307K)
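    A simplified Python sketch of the decomposition and the certainty idea: each antecedent candidate is scored by the product of a syntactic and a semantic probability, and the resolver abstains when the gap between the top two scores is small. The function names, the score combination, and the definition of certainty are illustrative assumptions, not the paper's formulation.

    def resolve_zero_pronoun(candidates, syn_score, sem_score, certainty_threshold=0.2):
        """Return the best antecedent candidate, or None when not certain enough.

        syn_score(c) and sem_score(c) are assumed to return probabilities from
        the syntactic and semantic component models, respectively.
        """
        if not candidates:
            return None
        # Combine the two component models by a simple product
        ranked = sorted(candidates, key=lambda c: syn_score(c) * sem_score(c), reverse=True)
        best = syn_score(ranked[0]) * sem_score(ranked[0])
        second = syn_score(ranked[1]) * sem_score(ranked[1]) if len(ranked) > 1 else 0.0
        # "Certainty" here is the margin between the top two candidates
        if best - second < certainty_threshold:
            return None
        return ranked[0]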
  • KAZUHIRO TAKEUCHI, YUJI MATSUMOTO
    2002 Volume 9 Issue 3 Pages 87-108
    Published: July 10, 2002
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In this paper, we investigate the operations involved in summary generation. In order to align a summary expression with the corresponding original expression in the source text, we introduce an automated alignment algorithm based on the dependency structure of sentences. Our algorithm detects not only one-to-one sentence alignments but also one-to-many sentence alignments. We apply the algorithm to human-made natural summaries and analyze the alignment results. The analysis shows that most summary expressions preserve the dependency structure of the original sentences, and confirms that an operation called “sentence combination”, in which two or more source sentences are used to generate a single summary sentence, plays an important role in summary generation. Furthermore, we characterize the operations and paraphrases that cover most of summary generation. A schematic sketch of dependency-based alignment follows this entry.
    Download PDF (3468K)
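    The Python sketch below illustrates one way a dependency-based aligner could detect one-to-one and one-to-many (sentence combination) alignments: each summary sentence is matched to the source sentence, or pair of source sentences, whose dependency edges overlap it most. Representing sentences as sets of (modifier, head) pairs and scoring by edge overlap are assumptions made for illustration.

    from itertools import combinations

    def align_summary_sentence(summary_deps, source_sentences_deps):
        """Return (source sentence indices, overlap score) for one summary sentence.

        summary_deps is a set of (modifier, head) pairs; source_sentences_deps
        is a list of such sets, one per source sentence.
        """
        if not source_sentences_deps:
            return ([], 0)

        def overlap(deps):
            return len(summary_deps & deps)

        # Best one-to-one alignment
        best_single = max(range(len(source_sentences_deps)),
                          key=lambda i: overlap(source_sentences_deps[i]))
        best = ([best_single], overlap(source_sentences_deps[best_single]))

        # Allow one-to-many alignment by also trying pairs of source sentences
        for i, j in combinations(range(len(source_sentences_deps)), 2):
            score = overlap(source_sentences_deps[i] | source_sentences_deps[j])
            if score > best[1]:
                best = ([i, j], score)
        return best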
  • KEIICHI SAKAI
    2002 Volume 9 Issue 3 Pages 109-128
    Published: July 10, 2002
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In this paper, we describe a natural language dialogue model for information retrieval with multiple dialogue agents. In a complex dialogue for information retrieval, it is difficult to realize an effective dialogue with a single all-purpose dialogue agent. We therefore propose a dialogue model that lets users carry on the dialogue smoothly in the following three situations by switching dialogue agents:
    the domain agents make the user aware of the boundary between the domains.
    the strategy agents make the user aware of the difference between the strategies.
    the context agents help the user to deal with multiple goals.
    We expect that the complex behaviours of the system become easier for the user to recognize in these different situations. The experimental results show that users can retrieve the expected goals effectively and obtain them easily by using these multiple agents.
    Download PDF (3184K)
  • Based on Text Summarization Challenge (TSC), a Subtask of NTCIR Workshop 2
    HIDETSUGU NANBA, MANABU OKUMURA
    2002 Volume 9 Issue 3 Pages 129-146
    Published: July 10, 2002
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Evaluation methods whose targets are the system outputs (summaries) themselves are often called “intrinsic methods”. Computer-produced summaries have traditionally been evaluated by comparing them with human-written summaries using the F-measure. However, the F-measure has the following problem: it is not appropriate when alternative sentences are acceptable in a human-produced extract. For example, given two interchangeable sentences 1 and 2, with sentence 1 in the human-produced extract, a system that chooses sentence 2 obtains a lower score even though the two sentences are interchangeable. In this paper, we examine evaluation methods devised to overcome this problem. Several such methods have been proposed; the utility-based measure is one of them, but it requires a great deal of human effort to prepare the evaluation data. We first propose a pseudo-utility-based measure that uses human-produced extracts at different compression ratios. To evaluate the effectiveness of the pseudo-utility-based measure, we compare it with the F-measure using the data of the Text Summarization Challenge (TSC), a subtask of the NTCIR workshop 2, and show that the pseudo-utility-based measure can resolve the problem. Next, we focus on content-based evaluation. Although the content-based measure is reported to be effective in resolving the problem, it has not been examined from the viewpoint of comparing two extracts produced by different systems. We evaluated computer-produced summaries of the TSC with the content-based measure and compared the results with a subjective evaluation. We found that the evaluation by the content-based measure matched the human judgments in 93% of the cases when the gap in content-based scores between two summaries was more than 0.2. A schematic sketch of the sentence-level F-measure and a pseudo-utility-style score follows this entry.
    Download PDF (1947K)
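    The two small Python functions below illustrate the contrast discussed in the abstract: a standard sentence-level F-measure between extracts, and a pseudo-utility-style score that credits a system sentence if a human selected it at any of several compression ratios. The weighting scheme is an illustrative assumption, not the exact measure from the paper.

    def extract_f_measure(system_ids, human_ids):
        """Sentence-level F-measure between a system extract and one human extract,
        both given as collections of sentence ids."""
        system_ids, human_ids = set(system_ids), set(human_ids)
        if not system_ids or not human_ids:
            return 0.0
        overlap = len(system_ids & human_ids)
        if overlap == 0:
            return 0.0
        precision = overlap / len(system_ids)
        recall = overlap / len(human_ids)
        return 2 * precision * recall / (precision + recall)

    def pseudo_utility(system_ids, human_extracts_by_ratio, weights):
        """Pseudo-utility-style score: each system sentence earns the weight of the
        tightest compression ratio at which a human also selected it, so an
        interchangeable sentence kept only at a looser ratio still gets credit.
        `human_extracts_by_ratio` maps ratio -> set of sentence ids; `weights`
        maps ratio -> weight (illustrative assumption)."""
        score = 0.0
        for sid in system_ids:
            for ratio in sorted(human_extracts_by_ratio):
                if sid in human_extracts_by_ratio[ratio]:
                    score += weights[ratio]
                    break
        return score

    Under the F-measure, a system that picks sentence 2 instead of an interchangeable sentence 1 is penalized; under the pseudo-utility-style score it can still receive credit if a human chose sentence 2 at some compression ratio.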