IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Special Section on Data Engineering and Information Management
A Linguistics-Driven Approach to Statistical Parsing for Low-Resourced Languages
Prachya BOONKWANThepchai SUPNITHI
著者情報
ジャーナル フリー

2015 年 E98.D 巻 5 号 p. 1045-1052

詳細
抄録
Developing a practical and accurate statistical parser for low-resourced languages is a hard problem, because it requires large-scale treebanks, which are expensive and labor-intensive to build from scratch. Unsupervised grammar induction theoretically offers a way to overcome this hurdle by learning hidden syntactic structures from raw text automatically. The accuracy of grammar induction is still impractically low because frequent collocations of non-linguistically associable units are commonly found, resulting in dependency attachment errors. We introduce a novel approach to building a statistical parser for low-resourced languages by using language parameters as a guide for grammar induction. The intuition of this paper is: most dependency attachment errors are frequently used word orders which can be captured by a small prescribed set of linguistic constraints, while the rest of the language can be learned statistically by grammar induction. We then show that covering the most frequent grammar rules via our language parameters has a strong impact on the parsing accuracy in 12 languages.
著者関連情報
© 2015 The Institute of Electronics, Information and Communication Engineers
前の記事 次の記事
feedback
Top