Grammar Acquisition and Statistical Parsing by Exploiting Local Contextual Information

Thanaruk Theeramunkong; Manabu Okumura

doi:10.5715/jnlp.5.3_107

Abstract

This paper presents a method for inducing a context-sensitive conditional probability context-free grammar from an unlabeled bracketed corpus using local contextual information. It also describes a natural language parsing model that uses a probability-based scoring function of the grammar to rank the parses of a sentence. The method uses clustering techniques to group the brackets in a corpus into a number of groups of similar brackets, based on their local contextual information. From the set of these groups, the corpus is automatically labeled with nonterminal labels, and a grammar with conditional probabilities is then acquired. Based on these conditional probabilities, the statistical parsing model provides a framework for finding the most likely parse of a sentence. A number of experiments were conducted using the EDR corpus and the Wall Street Journal corpus. The results show that our approach achieves relatively high accuracy. For sentences shorter than 10 words, it reaches 88% recall, 72% precision and 0.7 crossing brackets per sentence. For sentences of 10 to 19 words, it reaches 71% recall, 51% precision and 3.4 crossing brackets per sentence. These results support the assumption that local contextual statistics obtained from an unlabeled bracketed corpus are effective for learning a useful grammar and for parsing.
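The abstract does not specify the context window, similarity measure, clustering criterion or exact form of the conditional probabilities. The Python sketch below illustrates the overall pipeline only, under simplifying assumptions chosen for illustration. Each bracket's local context is taken to be the pair of POS tags immediately to its left and right. Bracket types (their tag sequences) are compared by cosine similarity of their context-count vectors and grouped by greedy agglomerative clustering with a hypothetical `threshold` parameter. The "context-sensitive" probability is taken as a relative-frequency estimate of P(rhs | label, left, right), and a parse is scored as the product of these probabilities. The function names (`cluster_brackets`, `estimate_probs`, `score_parse`) and the toy tags are hypothetical. This is not the authors' implementation.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical bracket representation: (left_tag, rhs, right_tag), where rhs is the
# tuple of POS tags the bracket spans and left_tag/right_tag are the tags just outside it.
Bracket = tuple[str, tuple[str, ...], str]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_brackets(brackets: list[Bracket], threshold: float) -> dict[tuple[str, ...], str]:
    """Group bracket types whose local-context distributions are similar.

    Assumed criterion: repeatedly merge the two most similar clusters until no
    pair reaches `threshold`. Returns a mapping rhs -> nonterminal label.
    """
    contexts: dict[tuple[str, ...], Counter] = defaultdict(Counter)
    for left, rhs, right in brackets:
        contexts[rhs][(left, right)] += 1
    clusters = [([rhs], Counter(ctx)) for rhs, ctx in contexts.items()]
    while len(clusters) > 1:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(clusters[i][1], clusters[j][1])
                if sim >= best:
                    best, pair = sim, (i, j)
        if pair is None:
            break  # no remaining pair is similar enough to merge
        i, j = pair
        members = clusters[i][0] + clusters[j][0]
        merged = clusters[i][1] + clusters[j][1]
        del clusters[j]  # j > i, so index i stays valid
        clusters[i] = (members, merged)
    return {rhs: f"N{k}" for k, (members, _) in enumerate(clusters) for rhs in members}


def estimate_probs(brackets: list[Bracket], labels: dict[tuple[str, ...], str]) -> dict[tuple, float]:
    """Relative-frequency estimate of P(rhs | label, left_tag, right_tag)."""
    rule: Counter = Counter()
    cond: Counter = Counter()
    for left, rhs, right in brackets:
        lab = labels[rhs]
        rule[(lab, rhs, left, right)] += 1
        cond[(lab, left, right)] += 1
    return {k: c / cond[(k[0], k[2], k[3])] for k, c in rule.items()}


def score_parse(parse: list[Bracket], labels: dict[tuple[str, ...], str], probs: dict[tuple, float]) -> float:
    """Score a candidate parse (its list of brackets) as a product of rule probabilities."""
    score = 1.0
    for left, rhs, right in parse:
        score *= probs.get((labels.get(rhs), rhs, left, right), 0.0)  # unseen -> 0
    return score


# Toy bracketed corpus (hypothetical data, for illustration only).
corpus: list[Bracket] = [
    ("VB", ("DT", "NN"), "IN"),
    ("VB", ("DT", "JJ", "NN"), "IN"),
    ("IN", ("DT", "NN"), "EOS"),
    ("IN", ("DT", "JJ", "NN"), "EOS"),
    ("BOS", ("VB", "DT", "NN"), "EOS"),
]
labels = cluster_brackets(corpus, threshold=0.5)
probs = estimate_probs(corpus, labels)
print(labels)  # the two noun-phrase-like sequences share one label
print(score_parse([("VB", ("DT", "NN"), "IN")], labels, probs))  # 0.5
```

In this toy corpus, `DT NN` and `DT JJ NN` occur in identical left/right contexts, so they are merged under one nonterminal, while `VB DT NN` keeps its own label. A parser would use `score_parse` to rank competing bracketings of the same sentence; a real system would also need smoothing for unseen rules.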
