Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 6, Issue 6
Displaying 1-7 of 7 articles from this issue
  • MANABU OKUMURA, HIDETSUGU NANBA
    1999 Volume 6 Issue 6 Pages 1-26
    Published: July 10, 1999
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In this article, we survey the state of the art of (mainly domain-independent) automated text summarization techniques. We also outline the limitations of current technology and some new trends in the research field.
    Download PDF (3013K)
  • AKITOSHI OKUMURA, TAKAHIRO IKEDA, KAZUNORI MURAKI
    1999 Volume 6 Issue 6 Pages 27-44
    Published: July 10, 1999
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In office work, understanding the temporal transition and the overall situation of an event from diverse information requires extracting and abstracting a large number of documents. This paper proposes two robust methods for generating an extract and an abstract from documents: an episodic extraction method, which generates an extract tracing the temporal transition of an event, and an overall abstraction method, which generates an abstract of a whole document set for survey purposes. The episodic extraction method retrieves documents containing the 5W1H information (who, when, where, what, why, how, and predicates) that specifies an event and generates an extract tracing the event's temporal transition. The overall abstraction method abstracts documents by replacing 5W1H elements in each document with their upper categories in a thesaurus. These methods proved effective for office work in an application to 10,000 news articles and 2,500 sales reports.
    Download PDF (18919K)
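The overall abstraction step described above replaces concrete 5W1H elements with their upper categories in a thesaurus. A minimal sketch of that idea, using a toy hypothetical thesaurus (the paper's actual thesaurus and 5W1H extractor are not shown here):

```python
# Toy thesaurus mapping a term to its upper category. The entries are
# illustrative assumptions, not the paper's actual resource.
HYPERNYMS = {
    "Tokyo": "Japan",
    "Osaka": "Japan",
    "NEC": "company",
    "Toyota": "company",
}

def abstract_elements(elements):
    """Replace each 5W1H element with its upper category when one is
    known; leave other elements (e.g. predicates) unchanged."""
    return [HYPERNYMS.get(e, e) for e in elements]

print(abstract_elements(["NEC", "Tokyo", "announced"]))
# → ['company', 'Japan', 'announced']
```

Abstracting documents this way lets reports about different companies in different cities collapse into one generalized summary line.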
  • KIYONORI OHTAKE, TAKAHIRO FUNASAKA, SHIGERU MASUYAMA, KAZUHIDE YAMAMOT ...
    1999 Volume 6 Issue 6 Pages 45-64
    Published: July 10, 1999
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In this paper, we attempt to summarize multiple Japanese articles into one document. Our summarization method deletes verbose and overlapping parts of the input texts. We define an introduction part in order to detect overlap: if the nouns and verbs in an introduction part are also included in some other article, the introduction part is considered to be overlapping. This paper focuses on the following five points: guess sentences, noun modifiers, parenthesized expressions, detailed expressions of address, and the introduction part. We implemented a prototype summarization system and tested it on 27 groups of articles, achieving an average compression ratio of 82.1%. In addition, we evaluated the method on 6 groups of articles through questionnaires given to 11 examinees. The evaluation showed that the summaries were almost always natural and that the deleted parts were appropriate.
    Download PDF (5266K)
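The overlap criterion above — an introduction part is redundant when its nouns and verbs also appear in another article — can be sketched as a simple set-inclusion test (a simplified reading of the criterion; the real system would first run Japanese morphological analysis to obtain the nouns and verbs):

```python
def is_overlapped(intro_content_words, other_article_words):
    """Return True when every noun/verb of an introduction part also
    occurs in the other article, marking the introduction as an
    overlapping (deletable) part. Word lists are assumed to come from
    a morphological analyzer, which is not modeled here."""
    return set(intro_content_words) <= set(other_article_words)

print(is_overlapped(["minister", "announce"],
                    ["minister", "announce", "plan", "today"]))  # → True
```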
  • MAKOTO MIKAMI, SHIGERU MASUYAMA, SEIICHI NAKAGAWA
    1999 Volume 6 Issue 6 Pages 65-81
    Published: July 10, 1999
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    We propose and evaluate a method for summarizing each sentence in TV news texts written in Japanese. Selecting important sentences is not appropriate for abstracting news texts, because a news text consists of only a few long sentences. Instead, we reduce the redundant parts of each sentence, such as modifiers. We use a simple parsing method specialized for news texts so that the syntactic structure is not destroyed. Since the audience cannot read the text repeatedly, a summary must be shortened moderately; it must also be easy to read, retain the important information, and have its redundancy reduced. We therefore evaluate the summarization method through questionnaires given to 32 examinees.
    Download PDF (1849K)
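The within-sentence compression described above deletes redundant modifiers while keeping the syntactic core intact. A toy sketch of the deletion step, assuming a hypothetical analysis has already labeled each segment as a deletable modifier or not (the paper's actual parser for Japanese news texts is not modeled):

```python
def compress(segments):
    """Given (text, is_modifier) pairs from an assumed dependency
    analysis, keep only the segments that are not deletable modifiers,
    so the main predicate-argument structure survives."""
    return "".join(text for text, is_modifier in segments if not is_modifier)

sentence = [("Yesterday in Tokyo, ", True),   # redundant modifier
            ("the ministry ", False),          # core subject
            ("announced the plan.", False)]    # core predicate
print(compress(sentence))  # → "the ministry announced the plan."
```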
  • YOSHIO NAKAO
    1999 Volume 6 Issue 6 Pages 83-112
    Published: July 10, 1999
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper presents an algorithm for detecting the thematic hierarchy of a text using lexical cohesion measured by term repetitions. It detects topic boundaries separating thematic textual units of different sizes, from those just smaller than the entire text down to those about the size of a paragraph, and produces a thematic hierarchy by correlating the topic boundaries of these larger and smaller textual units. The algorithm is intended for summarizing a long text, especially a collective one aggregated from several parts on different topics, such as a long report or a series of newspaper columns. Summarizing such a collective document requires extracting topics of appropriate granularity according to the size of the summary to be output. Because the algorithm can extract a thematic textual unit of arbitrary size, a well-balanced summary that includes topics of appropriate granularity can be generated by summarizing every thematic textual unit of appropriate size. This paper describes the algorithm in detail and reports its features and accuracy in experiments on test data consisting of a long technical survey report, eight series of newspaper columns, and twelve economic reports.
    Download PDF (3339K)
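Measuring lexical cohesion by term repetition, as above, is commonly done by comparing term-frequency vectors of adjacent text windows; a low similarity suggests a topic boundary. A minimal TextTiling-style sketch of that score (the general idea only, not the paper's exact algorithm):

```python
from collections import Counter
from math import sqrt

def cohesion(left_words, right_words):
    """Cosine similarity of term-frequency vectors for two adjacent
    windows of words. Values near 1 indicate strong term repetition
    across the boundary; local minima are candidate topic boundaries."""
    a, b = Counter(left_words), Counter(right_words)
    dot = sum(a[w] * b[w] for w in a)
    norm = (sqrt(sum(v * v for v in a.values()))
            * sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

print(cohesion(["tax", "reform", "tax"], ["tax", "budget"]))
```

Running this score at every candidate boundary with windows of several sizes is one way to obtain the nested, differently sized thematic units the hierarchy is built from.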
  • MASAKI HIRUMA, TAKUMI YAMASHITA, MASAO NARA, NAOYOSHI TAMURA
    1999 Volume 6 Issue 6 Pages 113-129
    Published: July 10, 1999
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper presents a method of automatic extract generation from text using the rhetorical structure of the text, which is built by discriminants trained by multiple regression analysis; we also show an extension to summary generation. Our text structuring method divides or combines text segments one after another according to a discriminant whose parameters cover surface features of sentences. Another discriminant extracts important sentences, with parameters selected from the point of view of text extraction together with several features of the text (rhetorical) structure. In the experiment, five examinees selected important sentences from newspaper editorials, 350 articles in total. The results are used both to calculate the parameter weights by multiple regression analysis and to evaluate the accuracy of our method. In the extension to summary generation, sentences or phrases are restored so that the coherence of the text is not decreased by the deletion of referenced sentences, and redundant expressions independent of the context are deleted.
    Download PDF (1513K)
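A discriminant trained by multiple regression, as above, amounts to a linear score over sentence features with weights fit by least squares against human importance judgments. A minimal sketch under assumed, illustrative features (the paper's actual feature set and data are not reproduced):

```python
import numpy as np

# Rows: sentences; columns: hypothetical surface features
# [position score, length score, cue-word score].
X = np.array([
    [1.0, 0.4, 1.0],
    [0.5, 0.9, 0.0],
    [0.2, 0.6, 0.0],
])
# Human importance judgments for the same sentences (the regression target).
y = np.array([1.0, 0.0, 0.0])

# Fit weights by ordinary least squares (multiple regression).
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# The discriminant: a weighted sum of features per sentence.
scores = X @ w
print(scores.argmax())  # → 0 (the sentence judged most important)
```

In the paper the weights are estimated from the five examinees' selections over the 350 editorials, then applied to rank unseen sentences.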
  • TAKAHIRO FUKUSHIMA, TERUMASA EHARA, KATSUHIKO SHIRAI
    1999 Volume 6 Issue 6 Pages 131-147
    Published: July 10, 1999
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    It is known that a TV news text has fewer, longer sentences than a newspaper article. If we summarize such texts by selecting important sentences, we lose a good amount of information whenever a whole sentence is omitted, since each sentence is rather long. We therefore adopt a method that partitions long sentences into shorter sentences before summarization. To evaluate how the partitioning affects text summarization, we select two basic measures for text summarization and examine how they vary before and after the partitioning of long sentences. The first measure is the ranking of the sentences in the text by importance; the second is the number of characters removed from the text by applying the same set of rules for shortening and deleting text. All sentences in the text are ranked by importance both by hand and by a sentence extraction system. First, we examine how the ranks of the sentences judged important by humans vary before and after the partitioning. We found more partitioned important sentences whose ranking changed by three or more than sentences whose ranking changed by one, which suggests that the partitioning is good for sentence extraction. Next, we compare the rankings by the humans and by the system over all sentences using Spearman's rank correlation coefficient. The coefficient increases by between 0.0318 and 0.065, meaning that the human and system rankings for the partitioned texts are more similar than those for the original texts. Lastly, we investigate how the partitioning affects the shortening of the text. The number of deleted characters increases for the partitioned texts, and the compaction ratio (the number of characters in the shortened text divided by the number of characters in the original text) decreases by 2.71 to 2.78 percent. This shows that partitioning long sentences makes the shortening method work better.
    Download PDF (4135K)
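The two evaluation measures used above are standard and easy to state concretely: Spearman's rank correlation between the human and system rankings, and the compaction ratio of a shortened text. A sketch with illustrative rankings (not the paper's data):

```python
def spearman(rank_a, rank_b):
    """Spearman's rank correlation coefficient for two rankings
    without ties: rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))

def compaction_ratio(shortened, original):
    """Characters in the shortened text over characters in the
    original text; lower means stronger shortening."""
    return len(shortened) / len(original)

print(spearman([1, 2, 3, 4], [1, 2, 4, 3]))  # → 0.8
```

A rise in the Spearman coefficient after partitioning, as reported above, means the system's ranking tracks the human ranking more closely on the partitioned texts.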