Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 9, Issue 4
  • [in Japanese]
    2002 Volume 9 Issue 4 Pages 1
    Published: July 10, 2002
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
  • TATSUNORI MORI
    2002 Volume 9 Issue 4 Pages 3-32
    Published: July 10, 2002
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper proposes a new term weighting method for summarizing documents retrieved by IR systems. Unlike query-biased summarization methods, our method uses not the query itself but the similarity structure among the original documents, obtained by hierarchical clustering. To map the similarity structure of the clusters onto the weight of each word, we adopt the information gain ratio (IGR) of each word's probability distribution as a term weight. If the amount of information carried by a word in a cluster increases after the cluster is partitioned into sub-clusters, the word can be considered to contribute to determining the structure of the sub-clusters. The IGR measures the degree of this contribution. We show the effectiveness of our IGR-based method by comparison with other systems in the Text Summarization Challenge of NTCIR-2.
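    The idea of weighting a word by how much a cluster split tells us about it can be sketched as follows. This is a minimal illustration that assumes the C4.5-style definition of information gain ratio (gain divided by split information) applied to the binary event "a token drawn from the cluster is this word"; the paper's exact formulation may differ, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def binary_entropy(p):
    """Entropy in bits of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain_ratio(word, subclusters):
    """IGR of `word` for one split of a cluster (assumed C4.5-style definition).

    `subclusters` is a list of sub-clusters; each sub-cluster is a list of
    documents, and each document is a list of tokens.
    """
    sizes, hits = [], []
    for sub in subclusters:
        counts = Counter(tok for doc in sub for tok in doc)
        sizes.append(sum(counts.values()))  # tokens in this sub-cluster
        hits.append(counts[word])           # occurrences of `word` in it
    total = sum(sizes)
    if total == 0:
        return 0.0
    # Entropy of the word's occurrence before the split.
    parent = binary_entropy(sum(hits) / total)
    # Size-weighted entropy after the split.
    children = sum((n / total) * binary_entropy(h / n)
                   for n, h in zip(sizes, hits) if n > 0)
    gain = parent - children
    # Split information penalizes splits into many small sub-clusters.
    split_info = -sum((n / total) * math.log2(n / total)
                      for n in sizes if n > 0)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy cluster split into two sub-clusters.
split = [
    [["the", "apple", "pie"], ["apple", "tart", "the"]],
    [["the", "car", "road"], ["car", "engine", "the"]],
]
for w in ("apple", "the"):
    print(w, information_gain_ratio(w, split))
```

    In this toy split, "apple" occurs in only one sub-cluster and receives a positive weight, while "the" is spread evenly across both and receives a weight of zero.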
  • KAI ISHIKAWA, SHINICHI ANDO, AKITOSHI OKUMURA
    2002 Volume 9 Issue 4 Pages 33-53
    Published: July 10, 2002
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    We propose an evaluation method based on multiple correct answer summaries. Conventional evaluation methods suffer from a reliability problem because they adopt a single model answer, even though multiple correct summaries may exist from various points of view. We aimed to increase the reliability of automatic evaluation and focused on an evaluation method that uses multiple answer summaries. In our method, the answer summaries are represented as vectors, and we take linear combinations of them and compute the maximum value of the scalar product between such a combination and the target summary. To verify the reliability of our method, 7 people created summaries for 4 newspaper articles in the NTCIR-2 summarization test collection. However, the low agreement among these answer summaries showed that the data were inadequate for use as answers in the evaluation method. The summaries tended to preserve the structure of the text, owing to anaphoric relations and sentence cohesion. These findings will be valuable in creating model summaries. To verify the feasibility of the evaluation method, several automatic summarization methods were evaluated using the multiple correct summaries. The method judged best varied with the correct summary used. This result confirms our hypothesis that multiple correct answers are necessary to evaluate target summaries adequately.
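    One possible reading of the scoring step is sketched below. It assumes that each summary is a vector over the sentences of the source article, that the coefficients of the linear combination are unconstrained, and that the combination is normalized to unit length, so the maximum scalar product equals the length of the target's projection onto the span of the answers. The paper may constrain the coefficients differently; the function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, eps=1e-12):
    """Gram-Schmidt: an orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > eps:  # skip vectors already contained in the span
            basis.append([wi / norm for wi in w])
    return basis


def multi_reference_score(target, answers):
    """Maximum cosine between `target` and any linear combination of `answers`."""
    t_norm = math.sqrt(dot(target, target))
    if t_norm == 0:
        return 0.0
    basis = orthonormal_basis(answers)
    # Length of the projection of `target` onto the span of the answers.
    projected = math.sqrt(sum(dot(target, b) ** 2 for b in basis))
    return projected / t_norm


# Hypothetical data: a 5-sentence article; 1 marks a selected sentence.
answer_a = [1, 1, 0, 0, 0]
answer_b = [0, 0, 1, 1, 0]
system = [1, 1, 1, 1, 0]
print(multi_reference_score(system, [answer_a]))
print(multi_reference_score(system, [answer_a, answer_b]))
```

    Under these assumptions, a system summary that mixes content from two different answer summaries scores higher against both answers together than against either one alone, which is the motivation for using multiple answers.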
  • MAMIKO HATAYAMA, YOSHIHIRO MATSUO, SATOSHI SHIRAI
    2002 Volume 9 Issue 4 Pages 55-73
    Published: July 10, 2002
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    We propose a new method of summarizing newspaper articles that uses a case-frame dictionary to extract important words and phrases from the original articles and generates a summary by reconstructing the extracted words and phrases. Users can control the length of the generated summary, from one to a few sentences. We have also developed a prototype summarization system, ALTLINE, and evaluated it by comparing its generated summaries with human-produced summaries. The evaluation shows that ALTLINE ranked near the middle among all the human subjects, showing that the system's summaries are comparable to human summaries.
  • YOSHIHIRO UEDA, MAMIKO OKA, TAKAHIRO KOYAMA, TADANOBU MIYAUCHI
    2002 Volume 9 Issue 4 Pages 75-96
    Published: July 10, 2002
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    We have developed a summarization method that creates summaries suited to sifting through information retrieval results. Conventional methods extract important sentences, and the resulting summaries tend to be long and complex. Our phrase-representation summarization method instead constructs short phrases to reduce the burden of reading such long sentences. Each phrase is constructed by (1) dependency analysis to extract the relations between words, (2) selection of the core relation from the dependency relations, (3) addition of the relations necessary for the unity of the phrase's meaning, and (4) generation of the surface phrase from the constructed graph. To evaluate the effectiveness of this method, we have developed an improved task-based evaluation method for summarization, whose accuracy is increased by specifying the details of the task, including background stories, and by assigning ten subjects per summary sample. The evaluation method also provides precision/recall pairs for a variety of situations by introducing multiple levels of relevance assessment. Applying it, we show that phrase-represented summaries are the most effective for selecting relevant documents from information retrieval results. This result arises because the constructed phrases are rather short yet cover more important keywords.
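    The way graded relevance yields several precision/recall pairs can be sketched as follows. This is a minimal illustration, assuming each document has an integer relevance grade and that a document counts as relevant at a given level when its grade meets or exceeds that level; the grades, document IDs, and function name are hypothetical and not taken from the paper.

```python
def precision_recall_by_level(selected, graded_relevance, levels):
    """Return {level: (precision, recall)} for one subject's selection.

    selected:          set of document IDs the subject judged relevant
                       after reading only the summaries
    graded_relevance:  dict mapping document ID to its relevance grade
    levels:            iterable of grade thresholds to evaluate at
    """
    results = {}
    for level in levels:
        # Documents that count as relevant under this threshold.
        relevant = {doc for doc, grade in graded_relevance.items()
                    if grade >= level}
        hits = len(selected & relevant)
        precision = hits / len(selected) if selected else 0.0
        recall = hits / len(relevant) if relevant else 0.0
        results[level] = (precision, recall)
    return results


# Hypothetical grades: 0 = not relevant, 1 = partly relevant, 2 = relevant.
grades = {"d1": 2, "d2": 1, "d3": 0, "d4": 2, "d5": 1}
chosen = {"d1", "d2", "d3"}
for level, (p, r) in precision_recall_by_level(chosen, grades, [1, 2]).items():
    print(level, p, r)
```

    A lenient threshold models a user who wants anything related to the topic, while a strict threshold models a user who wants only clearly relevant documents, so a single set of judgments gives one precision/recall pair per situation.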
  • MANABU OKUMURA, HIDETSUGU NANBA
    2002 Volume 9 Issue 4 Pages 97-116
    Published: July 10, 2002
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In this article, we survey current trends in the field of automated text summarization, concentrating on three topics: research on producing more natural summaries in single-document summarization, the further growth of research on multi-document summarization, and the increasing variety of summarization inputs.