Abstract
This paper proposes a novel, knowledge-free language model that is highly effective at reducing ambiguity. The model is an n-gram model over strings, which we call "superwords," and it forms a superclass of both traditional word n-gram and string n-gram models. The notion of a superword rests on a single principle: repetition in the training text. The model's probability distribution is learned with the forward-backward algorithm. Experimental results show that the superword model, combined with a character trigram model, outperforms both the traditional word-based model built on morphological analysis and the traditional string-based model.
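The abstract describes the method only at a high level. As an illustrative sketch (not the authors' implementation), a unigram variant of such a model can be trained with forward-backward EM over all segmentations of the training text: candidate superwords are substrings that repeat, and expected usage counts are accumulated from the segmentation lattice. All function names and parameters below are assumptions for illustration.

```python
from collections import defaultdict

def candidate_superwords(text, max_len=3, min_count=2):
    """Collect substrings up to max_len; keep single characters plus
    longer substrings that repeat (the 'repetition' principle)."""
    counts = defaultdict(int)
    for i in range(len(text)):
        for l in range(1, max_len + 1):
            if i + l <= len(text):
                counts[text[i:i + l]] += 1
    return {w for w, c in counts.items() if len(w) == 1 or c >= min_count}

def em_train(text, vocab, max_len=3, iters=10):
    """Learn unigram superword probabilities by forward-backward EM."""
    prob = {w: 1.0 / len(vocab) for w in vocab}  # uniform initialization
    n = len(text)
    for _ in range(iters):
        # Forward pass: fwd[i] = total probability of segmenting text[:i]
        fwd = [0.0] * (n + 1)
        fwd[0] = 1.0
        for i in range(1, n + 1):
            for j in range(max(0, i - max_len), i):
                w = text[j:i]
                if w in prob:
                    fwd[i] += fwd[j] * prob[w]
        # Backward pass: bwd[i] = total probability of segmenting text[i:]
        bwd = [0.0] * (n + 1)
        bwd[n] = 1.0
        for i in range(n - 1, -1, -1):
            for j in range(i + 1, min(n, i + max_len) + 1):
                w = text[i:j]
                if w in prob:
                    bwd[i] += prob[w] * bwd[j]
        Z = fwd[n]  # total probability over all segmentations
        # E-step: expected count of each superword across the lattice
        expected = defaultdict(float)
        for i in range(n):
            for j in range(i + 1, min(n, i + max_len) + 1):
                w = text[i:j]
                if w in prob:
                    expected[w] += fwd[i] * prob[w] * bwd[j] / Z
        # M-step: renormalize expected counts into probabilities
        total = sum(expected.values())
        prob = {w: c / total for w, c in expected.items() if c > 0}
    return prob

text = "abcabcabcabc"
vocab = candidate_superwords(text, max_len=3)
prob = em_train(text, vocab, max_len=3)
```

On this toy text, EM concentrates probability on the repeating unit "abc", since the segmentation into four "abc" tokens uses the fewest segments and therefore dominates the lattice.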