Journal of Natural Language Processing (自然言語処理)
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 22, Issue 3
Preface
Papers
  • Chenhui Chu, Toshiaki Nakazawa, Sadao Kurohashi
    2015, Volume 22, Issue 3, pp. 139-170
    Issue date: 2015/06/16
    Release date: 2015/12/14
    Journal, free access
    Parallel corpora are crucial for statistical machine translation (SMT); however, they are scarce for most language pairs and domains. Because comparable corpora are far more widely available, many studies have sought to extract parallel sentences from them for SMT. Parallel sentence extraction relies heavily on bilingual lexicons, which are also very scarce. We propose a parallel sentence extraction system built on unsupervised bilingual lexicon extraction: it first extracts bilingual lexicons from comparable corpora and then extracts parallel sentences using those lexicons. Our bilingual lexicon extraction method combines topic-model-based and context-based methods in an iterative process. The proposed method does not rely on any prior knowledge, and its performance can be improved iteratively. The parallel sentence extraction method uses a binary classifier for parallel sentence identification, and the extracted bilingual lexicons are supplied to the classifier to improve extraction performance. Experiments on Wikipedia data indicate that the proposed bilingual lexicon extraction method greatly outperforms existing methods and that the extracted bilingual lexicons significantly improve the performance of parallel sentence extraction for SMT.
  • Natsuda Laokulrat, Makoto Miwa, Yoshimasa Tsuruoka
    2015, Volume 22, Issue 3, pp. 171-196
    Issue date: 2015/06/16
    Release date: 2015/12/14
    Journal, free access
    Traditional machine-learning-based approaches to temporal relation classification use only local features, i.e., those relating to a specific pair of temporal entities (events and temporal expressions), and thus fail to incorporate useful information that could be inferred from nearby entities. In this paper, we use timegraphs and stacked learning to perform temporal inference for classification in the temporal relation classification task. In our model, we predict a temporal relation by considering the consistency of possible relations between nearby entities. Performing 10-fold cross-validation on the Timebank corpus, we achieve an F1 score of 60.25% using a graph-based evaluation, which is 0.90 percentage points higher than that of the local approach, outperforming other proposed systems.
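  • Yuta Kikuchi (菊池 悠太), Tsutomu Hirao (平尾 努), Hiroya Takamura (高村 大也), Manabu Okumura (奥村 学), Masaaki Nagata (永田 昌明)
    2015, Volume 22, Issue 3, pp. 197-217
    Issue date: 2015/06/16
    Release date: 2015/12/14
    Journal, free access
    Many recent extractive summarization methods formulate summarization as a combinatorial optimization problem that combines sentence extraction with sentence compression, so as to cover the information in the source document while flexibly accommodating a given summary length constraint. In other words, they generate a summary by extracting words from the document in a way that preserves the sentence as a grammatical unit. To avoid producing ungrammatical sentences, previous methods compress sentences using the relations between words in a syntactic tree, but they have not considered the global relations between sentences in a document, that is, its discourse structure. Taking discourse structure into account, however, is very important for keeping a summary coherent, and it also helps identify the important parts of a document. We propose a method that regards a document as a nested dependency tree representing both the dependencies between sentences and the dependencies between words, and that generates a summary by pruning the tree so that the sum of word importance scores is maximized. Experiments confirmed that the proposed method significantly improves summarization accuracy.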
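The tree-pruning idea in the last abstract above can be illustrated with a much simplified sketch. It assumes a single dependency tree rather than the nested sentence/word tree, unit length per word, and a dynamic program in place of whatever optimizer the authors use; the function name `prune_tree`, the node ids, and the scores are all hypothetical. A node may be kept only if its parent is kept, which is what keeps the pruned result a connected, rooted subtree.

```python
from typing import Dict, List, Optional, Tuple

Entry = Optional[Tuple[float, List[int]]]


def prune_tree(children: Dict[int, List[int]],
               score: Dict[int, float],
               root: int,
               budget: int) -> Tuple[float, List[int]]:
    """Pick a rooted subtree with at most `budget` nodes maximizing total score.

    Illustrative only: a tree-knapsack DP over a single tree, not the
    nested-tree formulation described in the abstract.
    """

    def solve(node: int) -> List[Entry]:
        # best[k] = (score, nodes) of the best subtree rooted at `node`
        # that uses exactly k nodes, or None if no such subtree exists.
        best: List[Entry] = [None] * (budget + 1)
        if budget >= 1:
            best[1] = (score[node], [node])
        for child in children.get(node, []):
            child_best = solve(child)
            merged = list(best)  # option: skip this child entirely
            for k in range(1, budget + 1):
                if best[k] is None:
                    continue
                for j in range(1, budget - k + 1):
                    if child_best[j] is None:
                        continue
                    cand = best[k][0] + child_best[j][0]
                    if merged[k + j] is None or cand > merged[k + j][0]:
                        merged[k + j] = (cand, best[k][1] + child_best[j][1])
            best = merged
        return best

    table = [entry for entry in solve(root) if entry is not None]
    if not table:
        return 0.0, []
    total, nodes = max(table, key=lambda entry: entry[0])
    return total, sorted(nodes)


# Toy tree: 0 is the root, 1 and 2 are its children, 3 is a child of 1.
children = {0: [1, 2], 1: [3]}
score = {0: 1.0, 1: 0.5, 2: 2.0, 3: 3.0}
print(prune_tree(children, score, root=0, budget=3))
```

With a budget of three nodes the sketch keeps node 1 despite its low score, because node 1 is the only path to the high-scoring node 3. This is the same connectivity constraint that prevents ungrammatical fragments when words are pruned from a dependency tree.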