Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Grammar Acquisition and Statistical Parsing by Exploiting Local Contextual Information
Thanaruk Theeramunkong, Manabu Okumura
JOURNAL FREE ACCESS

1998 Volume 5 Issue 3 Pages 107-123

Abstract
This paper presents a method for inducing a context-sensitive conditional probability context-free grammar from an unlabeled bracketed corpus using local contextual information, and describes a natural language parsing model that uses a probability-based scoring function of the grammar to rank the parses of a sentence. The method uses clustering techniques to group the brackets in a corpus into a number of similar bracket groups based on their local contextual information. From these groups, the corpus is automatically labeled with nonterminal labels, and a grammar with conditional probabilities is then acquired. Based on these conditional probabilities, the statistical parsing model provides a framework for finding the most likely parse of a sentence. Experiments are conducted on the EDR corpus and the Wall Street Journal corpus. The results show that our approach achieves relatively high accuracy: 88% recall, 72% precision and 0.7 crossing brackets per sentence for sentences shorter than 10 words, and 71% recall, 51% precision and 3.4 crossing brackets per sentence for sentences of 10 to 19 words. These results support the assumption that local contextual statistics obtained from an unlabeled bracketed corpus are effective for learning a useful grammar and for parsing.
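The abstract does not spell out how bracket recall, bracket precision and crossing brackets are computed. As an illustration only, the sketch below follows the common unlabeled-bracket convention: each bracket is a word span, precision and recall count exact span matches, and a predicted bracket is "crossing" when it overlaps a gold bracket without either containing the other. The names `Span`, `crosses` and `bracket_scores`, and the toy spans, are hypothetical and are not taken from the paper.

```python
from typing import Set, Tuple

# Assumption: a bracket is an unlabeled (start, end) word span, end exclusive.
Span = Tuple[int, int]


def crosses(a: Span, b: Span) -> bool:
    """Return True if the spans overlap but neither contains the other."""
    return a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]


def bracket_scores(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, int]:
    """Return (precision, recall, crossing bracket count) for one sentence."""
    matched = len(predicted & gold)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    # Count predicted brackets that cross at least one gold bracket.
    crossing = sum(1 for p in predicted if any(crosses(p, g) for g in gold))
    return precision, recall, crossing


# Toy five-word sentence (hypothetical data, for illustration only).
gold = {(0, 5), (0, 2), (2, 5), (3, 5)}
predicted = {(0, 5), (0, 2), (1, 4), (3, 5)}
precision, recall, crossing = bracket_scores(predicted, gold)
```

In this toy case three of the four predicted brackets match gold brackets, and the span `(1, 4)` crosses the gold span `(0, 2)`. Corpus-level figures such as those quoted in the abstract would be obtained by aggregating over all sentences, and the exact aggregation used in the paper is not stated here.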
© The Association for Natural Language Processing