Journal of Natural Language Processing
Online ISSN: 2185-8314
Print ISSN: 1340-7619
ISSN-L: 1340-7619
Efficient EM learning of probabilistic CFGs and their extensions by using WFSTs
Yoshitaka Kameya, Takashi Mori, Taisuke Sato

2001 Volume 8 Issue 1 Pages 49-84

Abstract

Probabilistic context-free grammars (PCFGs) are a widely known class of statistical language models, and the Inside-Outside (I-O) algorithm is the standard EM algorithm tailored to them. Although the algorithm requires only inexpensive linguistic resources, its efficiency remains a problem. In this paper, we present a new framework for efficient EM learning of PCFGs in which the parser is separated from the EM algorithm, assuming that the underlying CFG is given. The new EM procedure exploits the compactness of the WFSTs (well-formed substring tables) generated by the parser. Our framework is quite general in the sense that the input grammar need not be in Chomsky normal form (CNF), while the new EM algorithm is equivalent to the I-O algorithm in the CNF case. In addition, we propose a polynomial-time EM procedure for CFGs with context-sensitive probabilities, and report experimental results with the ATR corpus and a hand-crafted Japanese grammar.
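For readers unfamiliar with the quantities involved, the following is a minimal illustrative sketch (not the paper's WFST-based procedure) of the inside probabilities for a PCFG in Chomsky normal form, the CNF special case in which the proposed EM algorithm coincides with the I-O algorithm. The toy grammar, rule probabilities, and function names below are hypothetical and chosen only for illustration.

from collections import defaultdict

def inside_probabilities(words, lexical_rules, binary_rules):
    """Compute inside probabilities beta[i, j, A] = P(A derives words[i:j]).

    lexical_rules: dict mapping (A, word) -> probability of rule A -> word
    binary_rules:  dict mapping (A, B, C) -> probability of rule A -> B C
    """
    n = len(words)
    beta = defaultdict(float)

    # Base case: spans of length 1, covered by lexical rules A -> w.
    for i, w in enumerate(words):
        for (A, word), p in lexical_rules.items():
            if word == w:
                beta[i, i + 1, A] += p

    # Recursive case: spans of length >= 2, covered by binary rules A -> B C,
    # summing over every split point k of the span [i, j).
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, B, C), p in binary_rules.items():
                    beta[i, j, A] += p * beta[i, k, B] * beta[k, j, C]

    return beta

# Tiny toy grammar (hypothetical numbers, for illustration only).
lexical = {("N", "dogs"): 1.0, ("V", "bark"): 1.0}
binary = {("S", "N", "V"): 1.0}
beta = inside_probabilities(["dogs", "bark"], lexical, binary)
print(beta[0, 2, "S"])   # sentence probability under the toy PCFG

The outside probabilities are computed analogously in a top-down pass, and the two together give the expected rule counts used in each EM iteration; the paper's contribution is to carry out this computation directly over the packed WFST produced by the parser rather than over CNF-specific charts.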

© The Association for Natural Language Processing