Abstract
This paper proposes a novel, knowledge-free language model that is highly effective at reducing ambiguity. The model is an n-gram model over strings, which we call "superwords," and it forms a superclass of both traditional word n-gram and string n-gram models. The notion of a superword rests on a single principle: repetition in the training text. The model's probability distribution is learned with the forward-backward algorithm. Experimental results show that the superword model, combined with a character trigram model, outperforms both the traditional word-based model built on morphological analysis and the traditional string-based model.
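The abstract describes the method only at a high level. As an illustrative sketch (not the authors' implementation), a unigram variant of such a model can be trained with forward-backward EM over all segmentations of the training text: candidate superwords are substrings that repeat, and expected usage counts are accumulated from the segmentation lattice. All function names and parameters below are assumptions for illustration.

```python
from collections import defaultdict

def candidate_superwords(text, max_len=3, min_count=2):
    """Collect substrings up to max_len; keep single characters plus
    longer substrings that repeat (the 'repetition' principle)."""
    counts = defaultdict(int)
    for i in range(len(text)):
        for l in range(1, max_len + 1):
            if i + l <= len(text):
                counts[text[i:i + l]] += 1
    return {w for w, c in counts.items() if len(w) == 1 or c >= min_count}

def em_train(text, vocab, max_len=3, iters=10):
    """Learn unigram superword probabilities by forward-backward EM."""
    prob = {w: 1.0 / len(vocab) for w in vocab}  # uniform initialization
    n = len(text)
    for _ in range(iters):
        # Forward pass: fwd[i] = total probability of segmenting text[:i]
        fwd = [0.0] * (n + 1)
        fwd[0] = 1.0
        for i in range(1, n + 1):
            for j in range(max(0, i - max_len), i):
                w = text[j:i]
                if w in prob:
                    fwd[i] += fwd[j] * prob[w]
        # Backward pass: bwd[i] = total probability of segmenting text[i:]
        bwd = [0.0] * (n + 1)
        bwd[n] = 1.0
        for i in range(n - 1, -1, -1):
            for j in range(i + 1, min(n, i + max_len) + 1):
                w = text[i:j]
                if w in prob:
                    bwd[i] += prob[w] * bwd[j]
        Z = fwd[n]  # total probability over all segmentations
        # E-step: expected count of each superword across the lattice
        expected = defaultdict(float)
        for i in range(n):
            for j in range(i + 1, min(n, i + max_len) + 1):
                w = text[i:j]
                if w in prob:
                    expected[w] += fwd[i] * prob[w] * bwd[j] / Z
        # M-step: renormalize expected counts into probabilities
        total = sum(expected.values())
        prob = {w: c / total for w, c in expected.items() if c > 0}
    return prob

text = "abcabcabcabc"
vocab = candidate_superwords(text, max_len=3)
prob = em_train(text, vocab, max_len=3)
```

On this toy text, EM concentrates probability on the repeating unit "abc", since the segmentation into four "abc" tokens uses the fewest segments and therefore dominates the lattice.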