Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 9, Issue 3
Displaying 1-8 of 8 articles from this issue
  • [in Japanese]
    2002 Volume 9 Issue 3 Pages 1-2
    Published: July 10, 2002
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Download PDF (252K)
  • YOSHIMASA TSURUOKA, TAKASHI CHIKAYAMA
    2002 Volume 9 Issue 3 Pages 3-19
    Published: July 10, 2002
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    The decision list algorithm is one of the most successful algorithms for classification problems in natural language processing. The most important part of the decision list algorithm is the calculation of the reliability of each rule, that is, the estimation of the probability of each piece of contextual evidence. However, most research using decision lists has paid little attention to this estimation method. We propose an estimation method based on Bayesian learning which provides well-founded smoothing and makes better use of prior information about each type of contextual evidence. Experimental results obtained using the Senseval-1 data set and Japanese pseudowords show that our method makes probability estimation more precise, leading to improved classification performance of the decision list algorithm. A schematic sketch of the rule-reliability calculation follows this entry.
    Download PDF (1576K)
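    The following is a minimal, illustrative Python sketch of the decision list idea described above: each rule is scored by a smoothed log-likelihood ratio, and rules are applied in order of decreasing reliability. Additive smoothing with a pseudo-count alpha stands in for the paper's Bayesian estimate; all names and parameters are assumptions, not the authors' implementation.

    from collections import Counter, defaultdict
    from math import log

    def build_decision_list(training_pairs, alpha=1.0):
        """Build a decision list from (evidence, sense) training pairs.

        Each rule is scored by the log-likelihood ratio of the smoothed
        probability of its sense given the evidence versus all other senses.
        Assumes at least two senses so the smoothed probability stays below 1.
        """
        counts = defaultdict(Counter)          # evidence -> Counter over senses
        senses = set()
        for evidence, sense in training_pairs:
            counts[evidence][sense] += 1
            senses.add(sense)

        rules = []
        for evidence, sense_counts in counts.items():
            total = sum(sense_counts.values())
            for sense in senses:
                # Smoothed conditional probability P(sense | evidence)
                p = (sense_counts[sense] + alpha) / (total + alpha * len(senses))
                reliability = log(p / (1.0 - p))
                rules.append((reliability, evidence, sense))
        # Apply rules in order of decreasing reliability
        rules.sort(reverse=True)
        return rules

    def classify(rules, context_evidences, default_sense):
        """Return the sense of the most reliable rule whose evidence occurs in the context."""
        evidences = set(context_evidences)
        for _, evidence, sense in rules:
            if evidence in evidences:
                return sense
        return default_sense

    Tuning alpha (or, as in the paper, replacing it with an evidence-type-specific prior) is exactly the estimation step the abstract argues should not be neglected.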
  • SHIHO NOBESAWA, KENGO SATO, HIROAKI SAITO
    2002 Volume 9 Issue 3 Pages 21-40
    Published: July 10, 2002
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    We propose two methods for the recognition of unknown strings in dictionary-based natural language processing systems. One method uses statistical information dynamically during processing, and the other obtains meaningful strings that should be added to the dictionary. Both methods are based on statistical information drawn from a training corpus, and neither requires part-of-speech tagging or other preprocessing of the training corpus. We applied our methods to a Japanese morphological analysis system and obtained good results in reducing both unknown words and over-segmentation. A schematic sketch of candidate extraction follows this entry.
    Download PDF (5321K)
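    As a rough illustration only (the paper's actual statistics are not reproduced here), one way to obtain dictionary candidates from an untagged corpus is to collect frequent character substrings; the Python sketch below assumes a simple frequency threshold and a maximum candidate length.

    from collections import Counter

    def dictionary_candidates(raw_text, max_len=6, min_count=5):
        """Collect frequent character substrings from an untagged corpus as
        candidate dictionary entries (hypothetical scoring, for illustration).
        """
        counts = Counter()
        n = len(raw_text)
        for i in range(n):
            for length in range(2, max_len + 1):
                if i + length <= n:
                    counts[raw_text[i:i + length]] += 1
        # Keep only strings frequent enough to be worth adding to the dictionary
        return [(s, c) for s, c in counts.most_common() if c >= min_count]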
  • HIROYUKI SAKAI, NAOTSUGU SHINOHARA, SHIGERU MASUYAMA, KAZUHIDE YAMAMOT ...
    2002 Volume 9 Issue 3 Pages 41-62
    Published: July 10, 2002
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper proposes a method of acquiring knowledge about whether verb phrases can be abbreviated. For a clause that contains a verb and its accompanying verb phrases, the proposed method extracts from a large corpus other clauses that contain the same verb but different case postpositional particles. Our method then identifies abbreviatable verb phrases by comparing them with the verb phrases contained in the extracted clauses. Under this method, a verb phrase carrying important information is unlikely to be judged abbreviatable, while a verb phrase carrying information that already appears in previous sentences is likely to be judged abbreviatable. An experimental evaluation of our method shows a precision of 78.0% and a recall of 67.9%. We compare our method with a baseline that judges a verb phrase abbreviatable when it corresponds to an optional case element described in a case frame dictionary, and conclude from the evaluation results that our method outperforms the case frame dictionary baseline. A schematic sketch of the corpus-based criterion follows this entry.
    Download PDF (2439K)
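    A minimal Python sketch of the corpus-comparison idea, under the assumption that clauses are represented as (verb, set-of-case-particles) pairs; the threshold and the decision criterion are illustrative, not the paper's exact procedure.

    def abbreviatable_cases(verb, clause_cases, corpus_clauses, threshold=0.5):
        """For each case particle attached to `verb` in the current clause,
        estimate how often the same verb occurs in the corpus without that
        case filled; a case element frequently absent elsewhere is judged
        abbreviatable. (Hypothetical criterion and parameters.)
        """
        same_verb = [cases for v, cases in corpus_clauses if v == verb]
        if not same_verb:
            return set()
        result = set()
        for case in clause_cases:
            absent = sum(1 for cases in same_verb if case not in cases)
            if absent / len(same_verb) >= threshold:
                result.add(case)
        return result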
  • KAZUHIRO SEKI, ATSUSHI FUJII, TETSUYA ISHIKAWA
    2002 Volume 9 Issue 3 Pages 63-85
    Published: July 10, 2002
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In Japanese, entities that can easily be predicted are often omitted. Identifying the antecedents associated with such ellipses, a task termed “anaphora resolution”, is crucial in natural language processing, and specifically in discourse analysis. This paper proposes a probabilistic model to resolve zero pronouns, which are among the major ellipses in Japanese. Our proposed model is decomposed into two component models associated with syntactic and semantic properties, so as to optimize parameter estimation. The syntactic model is trained on corpora annotated with anaphoric relations, whereas the semantic model is trained on large-scale unannotated corpora to counter the data sparseness problem. We also propose a notion of certainty to improve the accuracy of zero pronoun resolution. We show the effectiveness of our method through experiments. A schematic sketch of the combined model follows this entry.
    Download PDF (2307K)
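    A simplified Python sketch of the decomposition and the certainty idea: each antecedent candidate is scored by the product of a syntactic and a semantic probability, and the resolver abstains when the gap between the top two scores is small. The function names, the score combination, and the definition of certainty are illustrative assumptions, not the paper's formulation.

    def resolve_zero_pronoun(candidates, syn_score, sem_score, certainty_threshold=0.2):
        """Return the best antecedent candidate, or None when not certain enough.

        syn_score(c) and sem_score(c) are assumed to return probabilities from
        the syntactic and semantic component models, respectively.
        """
        if not candidates:
            return None
        # Combine the two component models by a simple product
        ranked = sorted(candidates, key=lambda c: syn_score(c) * sem_score(c), reverse=True)
        best = syn_score(ranked[0]) * sem_score(ranked[0])
        second = syn_score(ranked[1]) * sem_score(ranked[1]) if len(ranked) > 1 else 0.0
        # "Certainty" here is the margin between the top two candidates
        if best - second < certainty_threshold:
            return None
        return ranked[0]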
  • KAZUHIRO TAKEUCHI, YUJI MATSUMOTO
    2002 Volume 9 Issue 3 Pages 87-108
    Published: July 10, 2002
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In this paper, we investigate the operations involved in summary generation. In order to align a summary expression with the corresponding original expression in the source text, we introduce an automated alignment algorithm based on the dependency structure of sentences. Our algorithm detects not only one-to-one sentence alignments but also one-to-many sentence alignments. We apply the algorithm to human-made natural summaries and analyze the alignment results. The analysis shows that most summary expressions preserve the dependency structure of the original sentences, and confirms that an operation called “sentence combination”, in which two or more source sentences are used to generate a single summary sentence, plays an important role in summary generation. Furthermore, we characterize the operations and paraphrases that cover most of summary generation. A schematic sketch of dependency-based alignment follows this entry.
    Download PDF (3468K)
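    The Python sketch below illustrates one way a dependency-based aligner could detect one-to-one and one-to-many (sentence combination) alignments: each summary sentence is matched to the source sentence, or pair of source sentences, whose dependency edges overlap it most. Representing sentences as sets of (modifier, head) pairs and scoring by edge overlap are assumptions made for illustration.

    from itertools import combinations

    def align_summary_sentence(summary_deps, source_sentences_deps):
        """Return (source sentence indices, overlap score) for one summary sentence.

        summary_deps is a set of (modifier, head) pairs; source_sentences_deps
        is a list of such sets, one per source sentence.
        """
        if not source_sentences_deps:
            return ([], 0)

        def overlap(deps):
            return len(summary_deps & deps)

        # Best one-to-one alignment
        best_single = max(range(len(source_sentences_deps)),
                          key=lambda i: overlap(source_sentences_deps[i]))
        best = ([best_single], overlap(source_sentences_deps[best_single]))

        # Allow one-to-many alignment by also trying pairs of source sentences
        for i, j in combinations(range(len(source_sentences_deps)), 2):
            score = overlap(source_sentences_deps[i] | source_sentences_deps[j])
            if score > best[1]:
                best = ([i, j], score)
        return best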
  • KEIICHI SAKAI
    2002 Volume 9 Issue 3 Pages 109-128
    Published: July 10, 2002
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In this paper, we describe a natural language dialogue model for information retrieval with multiple dialogue agents. In a complex dialogue for information retrieval, it is difficult to realize an effective dialogue with a single all-purpose dialogue agent. We therefore propose a dialogue model that lets users carry on the dialogue smoothly in the following three situations by switching dialogue agents:
    the domain agents make the user aware of the boundary between the domains.
    the strategy agents make the user aware of the difference between the strategies.
    the context agents help the user to deal with multiple goals.
    We expect that the complex behaviours of the system become easier for the user to recognize in these different situations. The experimental results show that users can retrieve the expected goals effectively and obtain them easily by using these multiple agents.
    Download PDF (3184K)
  • Based on Text Summarization Challenge (TSC), a Subtask of NTCIR Workshop 2
    HIDETSUGU NANBA, MANABU OKUMURA
    2002 Volume 9 Issue 3 Pages 129-146
    Published: July 10, 2002
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Evaluation methods whose targets are the system outputs (summaries) themselves are often called “intrinsic methods”. Computer-produced summaries have traditionally been evaluated by comparing them with human-written summaries using the F-measure. However, the F-measure has the following problem: it is not appropriate when alternative sentences are acceptable in a human-produced extract. For example, given two interchangeable sentences 1 and 2, with sentence 1 in the human-produced extract, a system that chooses sentence 2 obtains a lower score even though the two sentences are interchangeable. In this paper, we examine evaluation methods devised to overcome this problem. Several such methods have been proposed; the utility-based measure is one of them, but it requires a great deal of human effort to prepare the evaluation data. We first propose a pseudo-utility-based measure that uses human-produced extracts at different compression ratios. To evaluate the effectiveness of the pseudo-utility-based measure, we compare it with the F-measure using the data of the Text Summarization Challenge (TSC), a subtask of the NTCIR workshop 2, and show that the pseudo-utility-based measure can resolve the problem. Next, we focus on content-based evaluation. Although the content-based measure is reported to be effective in resolving the problem, it has not been examined from the viewpoint of comparing two extracts produced by different systems. We evaluated computer-produced summaries of the TSC with the content-based measure and compared the results with a subjective evaluation. We found that the evaluation by the content-based measure matched the human judgments in 93% of the cases when the gap in content-based scores between two summaries was more than 0.2. A schematic sketch of the sentence-level F-measure and a pseudo-utility-style score follows this entry.
    Download PDF (1947K)
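    The two small Python functions below illustrate the contrast discussed in the abstract: a standard sentence-level F-measure between extracts, and a pseudo-utility-style score that credits a system sentence if a human selected it at any of several compression ratios. The weighting scheme is an illustrative assumption, not the exact measure from the paper.

    def extract_f_measure(system_ids, human_ids):
        """Sentence-level F-measure between a system extract and one human extract,
        both given as collections of sentence ids."""
        system_ids, human_ids = set(system_ids), set(human_ids)
        if not system_ids or not human_ids:
            return 0.0
        overlap = len(system_ids & human_ids)
        if overlap == 0:
            return 0.0
        precision = overlap / len(system_ids)
        recall = overlap / len(human_ids)
        return 2 * precision * recall / (precision + recall)

    def pseudo_utility(system_ids, human_extracts_by_ratio, weights):
        """Pseudo-utility-style score: each system sentence earns the weight of the
        tightest compression ratio at which a human also selected it, so an
        interchangeable sentence kept only at a looser ratio still gets credit.
        `human_extracts_by_ratio` maps ratio -> set of sentence ids; `weights`
        maps ratio -> weight (illustrative assumption)."""
        score = 0.0
        for sid in system_ids:
            for ratio in sorted(human_extracts_by_ratio):
                if sid in human_extracts_by_ratio[ratio]:
                    score += weights[ratio]
                    break
        return score

    Under the F-measure, a system that picks sentence 2 instead of an interchangeable sentence 1 is penalized; under the pseudo-utility-style score it can still receive credit if a human chose sentence 2 at some compression ratio.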