Journal of Natural Language Processing (自然言語処理)
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Grammar Acquisition and Statistical Parsing by Exploiting Local Contextual Information
Thanaruk Theeramunkong, Manabu Okumura

1998, Vol. 5, No. 3, pp. 107-123

Abstract
This paper presents a method for inducing a context-sensitive conditional probability context-free grammar from an unlabeled bracketed corpus using local contextual information, and describes a natural language parsing model that uses a probability-based scoring function derived from the grammar to rank the parses of a sentence. The method uses clustering techniques to group the brackets in a corpus into a number of groups of similar brackets, based on their local contextual information. From these groups, the corpus is automatically labeled with nonterminal labels, and a grammar with conditional probabilities is then acquired. Based on these conditional probabilities, the statistical parsing model provides a framework for finding the most likely parse of a sentence. Experiments were conducted on the EDR corpus and the Wall Street Journal corpus. The results show that our approach achieves relatively high accuracy: 88% recall, 72% precision, and 0.7 crossing brackets per sentence for sentences shorter than 10 words, and 71% recall, 51% precision, and 3.4 crossing brackets per sentence for sentences of 10 to 19 words. These results support the assumption that local contextual statistics obtained from an unlabeled bracketed corpus are effective for learning a useful grammar and for parsing.
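Bracket recall, precision, and crossing brackets, as reported above, are conventionally computed PARSEVAL-style by comparing the bracket spans of a proposed parse against those of the reference parse. The sketch below is illustrative only and is not the authors' evaluation code. It assumes unlabeled brackets given as (start, end) word offsets with an exclusive end, recall and precision micro-averaged over all sentences, and crossing brackets averaged per sentence. The abstract does not state the paper's exact conventions, such as whether the whole-sentence bracket is excluded. The helper names `crosses` and `corpus_scores` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical representation: a bracket is a (start, end) span of word
# offsets, end exclusive. Labels are ignored (unlabeled bracket scoring).
Span = Tuple[int, int]


def crosses(a: Span, b: Span) -> bool:
    """True if spans a and b overlap without either one containing the other."""
    (s1, e1), (s2, e2) = a, b
    return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1


def corpus_scores(pairs: List[Tuple[List[Span], List[Span]]]) -> Tuple[float, float, float]:
    """Return (recall, precision, crossing brackets per sentence).

    Assumed conventions: recall and precision are micro-averaged over all
    brackets in the corpus; every bracket passed in is counted, including
    a whole-sentence bracket if present.
    """
    total_match = total_gold = total_test = total_cross = 0
    for gold, test in pairs:
        # Multiset intersection, so duplicate spans are matched at most once each.
        total_match += sum((Counter(gold) & Counter(test)).values())
        total_gold += len(gold)
        total_test += len(test)
        # A proposed bracket counts as crossing if it crosses any reference bracket.
        total_cross += sum(1 for t in test if any(crosses(t, g) for g in gold))
    return total_match / total_gold, total_match / total_test, total_cross / len(pairs)


if __name__ == "__main__":
    # Toy five-word sentence: reference vs. proposed brackets.
    gold = [(0, 5), (0, 2), (2, 5), (3, 5)]
    test = [(0, 5), (0, 3), (3, 5)]  # (0, 3) crosses the reference bracket (2, 5)
    print(corpus_scores([(gold, test)]))  # recall 0.5, precision 2/3, 1 crossing
```

Counting crossings per proposed bracket, rather than per crossing pair, keeps a single badly placed bracket from being penalized once for every reference bracket it overlaps.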
© The Association for Natural Language Processing