Information and Media Technologies
Online ISSN : 1881-0896
ISSN-L : 1881-0896
Computer Networks and Broadcasting
Generalized Hierarchical Word Sequence Framework for Language Modeling
Xiaoyi WuKevin DuhYuji Matsumoto
ジャーナル フリー

2017 年 12 巻 p. 291-315


Language modeling is a fundamental research problem that has wide application for many NLP tasks. For estimating probabilities of natural language sentences, most research on language modeling use n-gram based approaches to factor sentence probabilities. However, the assumption under n-gram models is not robust enough to cope with the data sparseness problem, which affects the final performance of language models. In this paper, we propose a generalized hierarchical word sequence framework, where different word association scores can be adopted to rearrange word sequences in a totally unsupervised fashion. Unlike the n-gram which factors sentence probability from left-to-right, our model factors using a more flexible strategy. For evaluation, we compare our rearranged word sequences to normal n-gram word sequences. Both intrinsic and extrinsic experiments verify that our language model can achieve better performance, proving that our method can be considered as a better alternative for n-gram language models.

© 2017 The Association for Natural Language Processing
前の記事 次の記事