Journal of Natural Language Processing (自然言語処理)

Preface

Bilinguals and Simultaneous Interpretation

Masaaki Nagata

2015, Volume 22, Issue 3, p. 137
Published: 2015/09/14
Released on J-STAGE: 2015/12/14

DOI: https://doi.org/10.5715/jnlp.22.137

Journal, free access

Papers

Parallel Sentence Extraction Based on Unsupervised Bilingual Lexicon Extraction from Comparable Corpora

Chenhui Chu, Toshiaki Nakazawa, Sadao Kurohashi

2015, Volume 22, Issue 3, p. 139-170
Published: 2015/06/16
Released on J-STAGE: 2015/12/14

DOI: https://doi.org/10.5715/jnlp.22.139

Journal, free access

Parallel corpora are crucial for statistical machine translation (SMT); however, they are scarce for most language pairs and domains. Because comparable corpora are far more widely available, many studies have sought to extract parallel sentences from them for SMT. Parallel sentence extraction relies heavily on bilingual lexicons, which are also scarce. We propose a parallel sentence extraction system based on unsupervised bilingual lexicon extraction: it first extracts bilingual lexicons from comparable corpora and then extracts parallel sentences using those lexicons. Our bilingual lexicon extraction method combines topic-model-based and context-based methods in an iterative process. The proposed method does not rely on any prior knowledge, and its performance can be improved iteratively. The parallel sentence extraction method uses a binary classifier to identify parallel sentences, and the extracted bilingual lexicons are supplied to the classifier to improve extraction performance. Experiments conducted on Wikipedia data indicate that the proposed bilingual lexicon extraction method greatly outperforms existing methods, and that the extracted bilingual lexicons significantly improve the performance of parallel sentence extraction for SMT.
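
As a rough illustration of how an extracted lexicon can feed a parallel-sentence classifier, the sketch below computes one lexicon-based feature for a sentence pair and thresholds it. This is not the authors' system: the function names, the single coverage feature, the fixed threshold standing in for a trained binary classifier, and the toy lexicon are all assumptions made for illustration.

```python
def lexicon_coverage(src_tokens, tgt_tokens, lexicon):
    """Fraction of source tokens with at least one lexicon translation
    present in the target sentence.

    `lexicon` maps a source word to a set of candidate target words
    (hypothetical format, chosen for this sketch).
    """
    if not src_tokens:
        return 0.0
    tgt_set = set(tgt_tokens)
    covered = sum(1 for w in src_tokens if lexicon.get(w, set()) & tgt_set)
    return covered / len(src_tokens)


def is_parallel(src_tokens, tgt_tokens, lexicon, threshold=0.5):
    # Stand-in for a trained binary classifier: a fixed threshold on a
    # single feature. A real classifier would learn weights over many features.
    return lexicon_coverage(src_tokens, tgt_tokens, lexicon) >= threshold


# Toy data, invented for this example.
toy_lexicon = {"gato": {"cat"}, "perro": {"dog"}}
print(is_parallel(["gato", "perro"], ["the", "cat"], toy_lexicon))
```

A richer lexicon raises the coverage of truly parallel pairs, which is one plausible way that better lexicon extraction could translate into better sentence extraction.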

Stacking Approach to Temporal Relation Classification with Temporal Inference

Natsuda Laokulrat, Makoto Miwa, Yoshimasa Tsuruoka

2015, Volume 22, Issue 3, p. 171-196
Published: 2015/06/16
Released on J-STAGE: 2015/12/14

DOI: https://doi.org/10.5715/jnlp.22.171

Journal, free access

Traditional machine-learning-based approaches to temporal relation classification use only local features, i.e., those relating to a specific pair of temporal entities (events and temporal expressions), and thus fail to incorporate useful information that could be inferred from nearby entities. In this paper, we use timegraphs and stacked learning to perform temporal inference in the temporal relation classification task. In our model, we predict a temporal relation by considering the consistency of possible relations between nearby entities. In 10-fold cross-validation on the Timebank corpus, we achieve an F1 score of 60.25% under a graph-based evaluation, which is 0.90 percentage points higher than that of the local approach and outperforms other proposed systems.
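
To make the idea of checking consistency among nearby relations concrete, here is a minimal sketch that infers implied BEFORE relations by transitive closure and flags a contradiction when an entity ends up before itself. It is a simplification under stated assumptions, not the paper's timegraph or stacked-learning model: only a single BEFORE relation is handled, and the function names and entity labels are hypothetical.

```python
from itertools import product


def before_closure(entities, before_pairs):
    """Transitive closure of BEFORE relations (Warshall's algorithm).

    `product(..., repeat=3)` varies its first element slowest, so the
    intermediate entity k is the outermost loop, as the algorithm requires.
    """
    before = set(before_pairs)
    for k, i, j in product(entities, repeat=3):
        if (i, k) in before and (k, j) in before:
            before.add((i, j))
    return before


def is_consistent(entities, before_pairs):
    # A set of BEFORE relations is contradictory if the closure places
    # any entity before itself (a temporal cycle).
    closure = before_closure(entities, before_pairs)
    return all((e, e) not in closure for e in entities)


entities = ["e1", "e2", "t1"]
print(("e1", "t1") in before_closure(entities, [("e1", "e2"), ("e2", "t1")]))
print(is_consistent(entities, [("e1", "e2"), ("e2", "t1"), ("t1", "e1")]))
```

A classifier that only sees one pair at a time cannot notice such a cycle, which is the kind of information inference over nearby entities can supply.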

A Single-Document Summarization Method Based on Pruning Nested Dependency Trees

Yuta Kikuchi, Tsutomu Hirao, Hiroya Takamura, Manabu Okumura, Masaaki Nagata

2015, Volume 22, Issue 3, p. 197-217
Published: 2015/06/16
Released on J-STAGE: 2015/12/14

DOI: https://doi.org/10.5715/jnlp.22.197

Journal, free access

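Many recent extractive summarization methods formulate summarization as a combinatorial optimization problem that combines sentence extraction with sentence compression, so as to cover the information in the source document while flexibly accommodating a given summary-length constraint. In other words, they generate a summary by extracting words from the document in a way that preserves the sentence as a grammatical unit. To avoid generating ungrammatical sentences, conventional methods compress sentences using the relations between words in a syntactic tree, but they have not considered the global relations between sentences in a document, that is, its discourse structure. Taking discourse structure into account, however, is very important for keeping a summary coherent, and it also helps identify the important parts of a document. We regard a document as a nested dependency tree that represents both the dependencies between sentences and the dependencies between words, and we propose a method that generates a summary by pruning this tree so that the sum of word importance scores is maximized. Experiments confirmed that the proposed method significantly improves summarization accuracy.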
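
The pruning objective can be illustrated with a small tree-knapsack sketch: keep a connected subtree containing the root, with at most a given number of nodes, so that the total importance is maximal. This is a simplification under stated assumptions and not the authors' formulation: it uses a single (non-nested) tree, counts every node as one word, and solves the problem by dynamic programming; the function name, the data layout, and the toy weights are hypothetical.

```python
def prune_tree(children, weight, root, budget):
    """Max total weight of a connected subtree that contains `root`
    and has at most `budget` nodes.

    `children` maps a node to its list of child nodes; `weight` maps a
    node to its importance score (assumed non-negative).
    """
    NEG = float("-inf")

    def best(v):
        # table[k] = max weight of a subtree rooted at v with exactly k nodes
        table = [NEG, weight[v]] + [NEG] * (budget - 1)
        for c in children.get(v, []):
            child = best(c)
            merged = table[:]  # option: drop the child's subtree entirely
            for k in range(1, budget + 1):
                if table[k] == NEG:
                    continue
                for m in range(1, budget - k + 1):
                    if child[m] != NEG:
                        merged[k + m] = max(merged[k + m], table[k] + child[m])
            table = merged
        return table

    return max(best(root)[1:budget + 1]) if budget >= 1 else 0.0


# Toy tree, invented for this example.
toy_children = {"root": ["a", "b"], "a": ["c"]}
toy_weight = {"root": 1.0, "a": 0.5, "b": 2.0, "c": 5.0}
print(prune_tree(toy_children, toy_weight, "root", 3))
```

With a budget of three nodes, the sketch keeps the low-weight node `a` because it is the only path to the high-weight node `c`, which shows why a parent must be retained whenever its dependent is kept.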
