Abstract
In this paper, we describe a method for constructing language models using a task adaptation strategy and the idiomatic expressions found in news articles. Building an effective N-gram language model requires as much training data as possible; however, for a given task or topic, it is very difficult to gather a large amount of data. First, we investigated the effect of a task adaptation method for N-gram language models that uses a limited number of target articles. Second, we investigated the effect of a language model adaptation method that uses the latest articles. Third, since certain specific and idiomatic expressions occur frequently in news articles, we investigated the effect of treating these expressions as single morpheme units. We show that all three proposed methods are effective for constructing N-gram language models.
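The abstract does not say how the small set of target (or latest) articles is combined with the larger general training corpus. One common technique is linear interpolation, where the adapted probability is a weighted mix of a task-specific model and a general model. The Python sketch below assumes that technique purely for illustration; the function names `train_bigram` and `interpolate`, the weight `lam`, and the toy sentences are hypothetical, and the models are unsmoothed maximum-likelihood bigrams to keep the example short.

```python
from collections import Counter
from typing import Dict, List, Tuple

Bigram = Tuple[str, str]


def train_bigram(sentences: List[List[str]]) -> Dict[Bigram, float]:
    """Hypothetical helper: unsmoothed ML estimates of P(w2 | w1).

    Each sentence is wrapped in <s> ... </s> boundary markers.
    """
    context_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        for w1, w2 in zip(tokens, tokens[1:]):
            context_counts[w1] += 1
            bigram_counts[(w1, w2)] += 1
    return {bg: c / context_counts[bg[0]] for bg, c in bigram_counts.items()}


def interpolate(general: Dict[Bigram, float],
                task: Dict[Bigram, float],
                lam: float) -> Dict[Bigram, float]:
    """Assumed adaptation scheme: P = lam * P_task + (1 - lam) * P_general.

    A bigram missing from one model contributes 0 from that model
    (no smoothing or back-off is applied in this sketch).
    """
    keys = set(general) | set(task)
    return {k: lam * task.get(k, 0.0) + (1.0 - lam) * general.get(k, 0.0)
            for k in keys}


# Toy usage: a "general" corpus and a tiny "target task" corpus.
general = train_bigram([["the", "market", "rose"], ["the", "market", "fell"]])
task = train_bigram([["the", "market", "rose"]])
mixed = interpolate(general, task, lam=0.5)

print(mixed[("market", "rose")])  # 0.5 * 1.0 + 0.5 * 0.5
print(mixed[("market", "fell")])  # 0.5 * 0.0 + 0.5 * 0.5
```

With `lam=0.5`, the task data pulls P(rose | market) up from 0.5 to 0.75 while P(fell | market) drops to 0.25, so the distribution still sums to 1 for this context. In practice `lam` would typically be tuned on held-out target-task text, and the same mixing idea could apply to a model trained on the latest articles.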