Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Paper
Generalized Hierarchical Word Sequence Framework for Language Modeling
Xiaoyi Wu, Kevin Duh, Yuji Matsumoto

2017 Volume 24 Issue 3 Pages 395-419

Abstract

Language modeling is a fundamental research problem with wide application across many NLP tasks. To estimate the probabilities of natural language sentences, most research on language modeling uses n-gram based approaches to factor sentence probabilities. However, the assumption underlying n-gram models is not robust enough to cope with data sparseness, which limits the final performance of the resulting language models. In this paper, we propose a generalized hierarchical word sequence framework in which different word association scores can be adopted to rearrange word sequences in an entirely unsupervised fashion. Unlike n-gram models, which factor sentence probability from left to right, our model factors it using a more flexible strategy. For evaluation, we compare our rearranged word sequences to ordinary n-gram word sequences. Both intrinsic and extrinsic experiments verify that our language model achieves better performance, showing that our method can be considered a better alternative to n-gram language models.
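To make the abstract's core idea concrete, the following is a minimal Python sketch, not the authors' implementation: it rearranges a sentence by recursively splitting at the highest-scoring word, so each word is predicted from its path in the resulting tree rather than from its left-to-right predecessors. The function names `build_hws_tree` and `hws_bigrams`, and the use of raw corpus frequency as the word association score, are illustrative assumptions; the generalized framework admits other association scores.

```python
from collections import Counter

def build_hws_tree(words, score):
    """Recursively split a word sequence at its highest-scoring word.

    Returns a (word, left_subtree, right_subtree) tuple, or None for an
    empty span. `score` is the pluggable word association measure.
    """
    if not words:
        return None
    i = max(range(len(words)), key=lambda k: score(words[k]))
    return (words[i],
            build_hws_tree(words[:i], score),
            build_hws_tree(words[i + 1:], score))

def hws_bigrams(tree, parent="<root>"):
    """Yield (context, word) pairs along tree paths: the hierarchical
    analogue of left-to-right bigrams."""
    if tree is None:
        return
    word, left, right = tree
    yield (parent, word)
    yield from hws_bigrams(left, word)
    yield from hws_bigrams(right, word)

# Toy example: unigram frequency as the association score (an assumption).
sentence = "the cat sat on the mat".split()
counts = Counter(sentence)
tree = build_hws_tree(sentence, lambda w: counts[w])
print(list(hws_bigrams(tree)))
# [('<root>', 'the'), ('the', 'the'), ('the', 'cat'),
#  ('cat', 'sat'), ('sat', 'on'), ('the', 'mat')]
```

Counting these tree-path pairs over a corpus, instead of adjacent word pairs, yields the rearranged sequences that the paper evaluates against standard n-gram sequences.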

© 2017 The Association for Natural Language Processing