Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 12, Issue 6
Displaying 1-7 of 7 articles from this issue
  • [in Japanese]
    2005 Volume 12 Issue 6 Pages 1
    Published: November 10, 2005
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Download PDF (122K)
  • SATOSHI KOBAYASHI, MASARU YAMAGUCHI, SEIICHI NAKAGAWA
    2005 Volume 12 Issue 6 Pages 3-24
    Published: November 10, 2005
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Although it is easy to record speech, it is not easy to refer back to audio recordings. If recordings could be indexed or summarized, referring to them would become easier. In this paper, we aim to automatically extract summaries of spoken lectures. For this purpose, we first compared the summaries extracted by human subjects and found large differences among the subjects. We then investigated the relation between surface linguistic information and the human results, and obtained useful surface linguistic information. Next, we summarized spoken lectures based on this information and compared the output with the human results. Additionally, we focused on two prosodic features, F0 and power, and conducted the same experiments on them. Lastly, we combined surface linguistic information and prosodic information. As a result, we obtained a better F-measure (0.599) and κ-value (0.420), comparable with the human results.
    Download PDF (2167K)
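The abstract above reports an F-measure and a κ-value for extracted summaries without giving the formulas. The following is a minimal sketch of how such scores are commonly computed for sentence extraction, assuming each summary is a set of selected sentence indices and κ is Cohen's kappa over per-sentence keep/drop decisions. The function names and the example data are hypothetical and are not taken from the paper.

```python
def f_measure(system, reference):
    """Harmonic mean of precision and recall over selected sentence indices."""
    system, reference = set(system), set(reference)
    overlap = len(system & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(system)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def cohen_kappa(system, reference, n_sentences):
    """Cohen's kappa for two binary keep/drop labelings of n_sentences sentences."""
    system, reference = set(system), set(reference)
    both = len(system & reference)
    neither = n_sentences - len(system | reference)
    observed = (both + neither) / n_sentences
    p_sys = len(system) / n_sentences
    p_ref = len(reference) / n_sentences
    # Agreement expected by chance if the two labelings were independent.
    expected = p_sys * p_ref + (1 - p_sys) * (1 - p_ref)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical example: a 10-sentence lecture, 3 sentences selected by each side.
system_summary = {0, 2, 5}
human_summary = {0, 5, 7}
f_score = f_measure(system_summary, human_summary)
kappa = cohen_kappa(system_summary, human_summary, n_sentences=10)
```

Unlike the F-measure, κ discounts the agreement that two selectors would reach by chance, which is presumably why the abstract reports both figures.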
  • YASUHIKO WATANABE, KAZUYA YOKOMIZO, RYO NISHIMURA, YOSHIHIRO OKADA
    2005 Volume 12 Issue 6 Pages 25-44
    Published: November 10, 2005
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    The most serious difficulty in developing a QA system is knowledge. In this paper, we first discuss three problems in developing a knowledge base with which a QA system answers how-type questions. Then, we propose a method of developing a knowledge base from mails posted to a mailing list. Next, we describe a QA system that can answer how-type questions based on the knowledge base. Our system finds question mails that are similar to the user's question and shows their answers to the user. The similarity between the user's question and a question mail is calculated by matching the user's question against a significant sentence in the question mail. Finally, we show that mails posted to a mailing list can be used as a knowledge base with which a QA system answers how-type questions.
    Download PDF (4154K)
  • NATSUKI ICHIMARU, TORU HITAKA
    2005 Volume 12 Issue 6 Pages 45-61
    Published: November 10, 2005
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    There has been much research on important sentence extraction. Now, the readability of summaries is drawing the attention of researchers who wish to create informative summaries. To keep a summary coherent, we focus on associative relations between subjects. In this paper, we propose a method to produce an easy-to-read summary by maximizing the sum of the subject-flows in it. First, our system divides sentences into paragraphs at segmentation points where the flow is weak, and constructs a multi-layer paragraph tree structure. Then, by analyzing subject-flow, the system finds introductory paragraphs and conclusive paragraphs. Finally, the system decides which paragraphs are dispensable using a threshold and an estimate of their contribution to the surrounding subject-flows. The system automatically adjusts the threshold to minimize the error in the compression rate. As a consequence, we obtain a readable summary with strong associative coherence. In an experiment using newspaper editorial articles, we confirmed that our system produces more readable summaries than a baseline method, and that 77.5% of the summaries with a compression rate of 30% keep the conclusion of the original article.
    Download PDF (1647K)
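The abstract above says that the threshold is adjusted automatically to minimize the error in the compression rate, but it does not describe the procedure. One simple way to do this is sketched below, under stated assumptions: each paragraph is a hypothetical `(length, contribution)` pair, paragraphs whose contribution falls below the threshold are dropped, and the compression rate is taken to be the fraction of the original length that is kept. None of these details are confirmed by the abstract.

```python
def choose_threshold(paragraphs, target_rate):
    """Pick the contribution threshold whose kept-length ratio is closest to target_rate.

    paragraphs: list of (length, contribution) pairs (hypothetical representation).
    target_rate: desired summary length divided by original length, e.g. 0.3.
    """
    total = sum(length for length, _ in paragraphs)
    best_threshold, best_error = None, float("inf")
    # Only the observed contribution values can change which paragraphs are kept.
    for threshold in sorted({contribution for _, contribution in paragraphs}):
        kept = sum(length for length, c in paragraphs if c >= threshold)
        error = abs(kept / total - target_rate)
        if error < best_error:
            best_threshold, best_error = threshold, error
    return best_threshold


# Hypothetical data: four paragraphs with lengths in characters.
example = [(100, 0.9), (200, 0.2), (300, 0.5), (400, 0.1)]
threshold = choose_threshold(example, target_rate=0.3)
```

Scanning only the observed contribution values is enough here because the set of kept paragraphs changes only at those values.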
  • Marcin Skowron, Kenji Araki
    2005 Volume 12 Issue 6 Pages 63-83
    Published: November 10, 2005
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Question classification is of crucial importance for question answering. In question classification, the accuracy of machine learning (ML) algorithms was found to significantly outperform other approaches. The two key issues in classification with an ML-based approach are classifier design and feature selection. Support Vector Machines are known to work well for sparse, high-dimensional problems. However, the frequently used Bag-of-Words approach does not take full advantage of the information contained in a question. To exploit this information, we introduce three new feature types: Subordinate Word Category, Question Focus and Syntactic-Semantic Structure. As the results demonstrate, the inclusion of the new features provides higher question classification accuracy than the standard Bag-of-Words approach and other ML-based methods such as SVM with the Tree Kernel, SVM with Error Correcting Codes and SNoW. The classification accuracy of 85.6% obtained using the three introduced feature types is, as of yet, the highest reported in the literature, an error reduction of 27% compared to the Bag-of-Words approach.
    Download PDF (2216K)
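The abstract above quotes both an accuracy and a relative error reduction. As a reading aid, the sketch below shows how the two figures relate, assuming the usual definition of error reduction as (baseline error - new error) / baseline error. The function names are hypothetical, and the abstract does not state the Bag-of-Words accuracy, so any value derived this way is only an approximation implied by the rounded figures.

```python
def error_reduction(baseline_accuracy, new_accuracy):
    """Relative reduction in error rate when moving from the baseline to the new system."""
    baseline_error = 1.0 - baseline_accuracy
    new_error = 1.0 - new_accuracy
    return (baseline_error - new_error) / baseline_error


def implied_baseline_accuracy(new_accuracy, reduction):
    """Invert error_reduction: the baseline accuracy implied by a reported reduction."""
    new_error = 1.0 - new_accuracy
    return 1.0 - new_error / (1.0 - reduction)


# Figures quoted in the abstract: 85.6% accuracy, 27% error reduction.
implied = implied_baseline_accuracy(0.856, 0.27)
```

Because the 27% figure is rounded, the baseline accuracy computed this way should not be quoted as a reported result.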
  • KAZUHIDE YAMAMOTO, SATOSHI IKEDA, KAZUTERU OHASHI
    2005 Volume 12 Issue 6 Pages 85-111
    Published: November 10, 2005
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Electronic bulletin board news can be seen in Shinkansen trains and on streets. This news is short, simple, and concentrated. In this paper, we present some expressions that often appear on bulletin boards. For example, a news sentence often ends with a noun or a case particle, which is unusual for a newspaper sentence. We first confirm this observation by investigating over twenty thousand articles of real bulletin board news: a verbal noun appears eight times as often as in an ordinary newspaper, and a case particle twenty times as often. We then propose and implement a method of shortening sentence ends into the expressions described above, and evaluate the method. Our evaluation results show that the summarization rate of the sentence ends is approximately 12%, and that 2.50 characters are deleted on average per sentence. This amount of deletion is approximately the same as that achieved by humans. Moreover, we have verified by human judgment that the correctness of the output expressions is 95%.
    Download PDF (2317K)
  • NOBUAKI HIROSHIMA, TAKAAKI HASEGAWA, MASAHIRO OKU
    2005 Volume 12 Issue 6 Pages 113-128
    Published: November 10, 2005
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    We propose a statistical method of generating headlines that show an outline of a Web page. The requirements for creating headlines are completeness, readability, and high compressibility. Our method constructs a keyword selection model, which applies a Support Vector Machine to several word features, and a sentence generation model based on both word N-gram probability and the style of the original sentences. To achieve high compressibility, we create headlines by choosing words from the original text using the two models. Our experimental results show that our keyword selection model results in a more complete search and our sentence generation model results in higher readability, compared with conventional methods.
    Download PDF (1573K)
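The abstract above mentions a sentence generation model based on word N-gram probability. The following is a minimal sketch of the N-gram part only, assuming a bigram model with add-one smoothing that is used to compare candidate word orders. The function names, the smoothing choice, and the toy data are assumptions for illustration and do not reproduce the authors' model, which also uses the style of the original sentences.

```python
import math
from collections import Counter


def train_bigrams(sentences):
    """Count unigrams and bigrams over tokenized sentences, with a start symbol."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + list(words)
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def log_prob(words, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a candidate word sequence."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + list(words)
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(padded, padded[1:])
    )


# Toy training data (hypothetical): two tokenized sentences from a page.
unigrams, bigrams = train_bigrams([
    ["web", "page", "outline"],
    ["web", "page", "headline"],
])
# A word order seen in the training text scores higher than the reversed order.
natural = log_prob(["web", "page"], unigrams, bigrams)
reversed_order = log_prob(["page", "web"], unigrams, bigrams)
```

A score like this could be used to rank alternative orderings of the selected keywords, with the more probable sequence treated as the more readable candidate.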