Journal of Natural Language Processing (自然言語処理)

Preface

Are the Horizons Still Shrinking?

Takenobu Tokunaga (徳永健伸)

2016, Volume 23, Issue 3, pp. 233-234
Published: 2016/06/15
Released online: 2016/09/15

DOI: https://doi.org/10.5715/jnlp.23.233

Journal, free access

Papers

Chinese Word Segmentation and Unknown Word Extraction by Mining Maximized Substring

Mo Shen, Daisuke Kawahara, Sadao Kurohashi

2016, Volume 23, Issue 3, pp. 235-266
Published: 2016/06/15
Released online: 2016/09/15

DOI: https://doi.org/10.5715/jnlp.23.235

Journal, free access

Chinese word segmentation is an initial and important step in Chinese language processing. Recent advances in machine learning techniques have boosted the performance of Chinese word segmentation systems, yet the identification of out-of-vocabulary words is still a major problem in this field of study. Recent research has attempted to address this problem by exploiting characteristics of frequent substrings in unlabeled data. We propose a simple yet effective approach for extracting a specific type of frequent substrings, called maximized substrings, which provide good estimations of unknown word boundaries. In the task of Chinese word segmentation, we use these substrings which are extracted from large scale unlabeled data to improve the segmentation accuracy. The effectiveness of this approach is demonstrated through experiments using various data sets from different domains. In the task of unknown word extraction, we apply post-processing techniques that effectively reduce the noise in the extracted substrings. We demonstrate the effectiveness and efficiency of our approach by comparing the results with a widely applied Chinese word recognition method in a previous study.

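Improving the Efficiency of Abductive Reasoning over Axiom Systems Containing Functional Literals (機能的なリテラルを含む公理体系における仮説推論の効率化)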

Kazeto Yamamoto (山本風人), Naoya Inoue (井之上直也), Kentaro Inui (乾健太郎)

2016, Volume 23, Issue 3, pp. 267-299
Published: 2016/06/15
Released online: 2016/09/15

DOI: https://doi.org/10.5715/jnlp.23.267

Journal, free access

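Abductive reasoning is an inference framework for finding the best explanation for a given observation. It has been studied since around the 1980s, mainly in the field of artificial intelligence. In recent years, as knowledge acquisition techniques have matured, the groundwork for applying abductive reasoning with large-scale knowledge to real-world problems has gradually been laid. At the same time, the increased computational cost of abductive reasoning over large-scale background knowledge is a serious problem. In particular, the explosive growth of the search space that occurs when the knowledge includes literals expressing dependency relations in the semantic representation of language (called functional literals in this paper) is a major obstacle to applying abductive reasoning to real problems. To address this, we propose a method that efficiently derives the optimal solution of abductive reasoning by pruning the search space using the properties of functional literals. Specifically, the method improves the computational efficiency of the overall inference by excluding semantically inconsistent hypotheses from the solution space. We also show that this pruning does not compromise the original optimal solution as long as certain conditions are satisfied. In the evaluation experiments, we applied abductive reasoning with large-scale background knowledge to a real natural language processing problem and compared its computational efficiency with that of existing methods. The results confirm that the proposed method obtains the optimal solution tens to hundreds of times more efficiently than the existing system.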

A Generalized Dependency Tree Language Model for SMT

John Richardson, Taku Kudo, Hideto Kazawa, Sadao Kurohashi

2016, Volume 23, Issue 3, pp. 299-321
Published: 2016/06/15
Released online: 2016/09/15

DOI: https://doi.org/10.5715/jnlp.23.299

Journal, free access

In this paper we describe a generalized dependency tree language model for machine translation. We consider in detail the question of how to define tree-based n-grams, or ‘t-treelets’, and thoroughly explore the strengths and weaknesses of our approach by evaluating the effect on translation quality for nine major languages. In addition, we show that it is possible to attain a significant improvement in translation quality for even non-structured machine translation by reranking filtered parses of k-best string output.
