Journal of Natural Language Processing

Preface

[title in Japanese]

[in Japanese]

2015 Volume 22 Issue 3 Pages 137
Published: September 14, 2015
Released on J-STAGE: December 14, 2015

DOI: https://doi.org/10.5715/jnlp.22.137

JOURNAL FREE ACCESS

Paper

Parallel Sentence Extraction Based on Unsupervised Bilingual Lexicon Extraction from Comparable Corpora

Chenhui Chu, Toshiaki Nakazawa, Sadao Kurohashi

2015 Volume 22 Issue 3 Pages 139-170
Published: June 16, 2015
Released on J-STAGE: December 14, 2015

DOI: https://doi.org/10.5715/jnlp.22.139

JOURNAL FREE ACCESS

Parallel corpora are crucial for statistical machine translation (SMT), but they are scarce for most language pairs and domains. Because comparable corpora are far more widely available, many studies have extracted parallel sentences from them for SMT. Parallel sentence extraction relies heavily on bilingual lexicons, which are also scarce. We propose a parallel sentence extraction system based on unsupervised bilingual lexicon extraction: it first extracts bilingual lexicons from comparable corpora and then uses those lexicons to extract parallel sentences. Our bilingual lexicon extraction method combines topic-model-based and context-based methods in an iterative process. It does not rely on any prior knowledge, and its performance can be improved iteratively. The parallel sentence extraction method uses a binary classifier to identify parallel sentences, and the extracted bilingual lexicons are supplied to the classifier to improve extraction performance. Experiments on Wikipedia data indicate that the proposed bilingual lexicon extraction method greatly outperforms existing methods and that the extracted bilingual lexicons significantly improve the performance of parallel sentence extraction for SMT.

Stacking Approach to Temporal Relation Classification with Temporal Inference

Natsuda Laokulrat, Makoto Miwa, Yoshimasa Tsuruoka

2015 Volume 22 Issue 3 Pages 171-196
Published: June 16, 2015
Released on J-STAGE: December 14, 2015

DOI: https://doi.org/10.5715/jnlp.22.171

JOURNAL FREE ACCESS

Traditional machine-learning-based approaches to temporal relation classification use only local features, i.e., those relating to a specific pair of temporal entities (events and temporal expressions), and thus fail to incorporate useful information that could be inferred from nearby entities. In this paper, we use timegraphs and stacked learning to perform temporal inference in the temporal relation classification task. In our model, we predict a temporal relation by considering the consistency of possible relations between nearby entities. In 10-fold cross-validation on the TimeBank corpus, we achieve an F1 score of 60.25% under a graph-based evaluation, which is 0.90 percentage points higher than the local approach and higher than other proposed systems.
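
The idea of checking a predicted relation against relations between nearby entities can be illustrated with a minimal sketch. It is not the authors' timegraph or stacking implementation: it assumes a single relation type (BEFORE), and the function names and entity labels (`e1`, `e2`, `t1`) are hypothetical. It infers implied relations by transitivity and flags a set of relations as inconsistent when an entity ends up preceding itself.

```python
from itertools import product


def before_closure(pairs):
    """Return the transitive closure of a set of (earlier, later) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))  # a BEFORE b and b BEFORE d imply a BEFORE d
                changed = True
    return closure


def is_consistent(pairs):
    """A BEFORE-only relation set is consistent if no entity precedes itself."""
    return all(a != b for a, b in before_closure(pairs))


# Hypothetical local predictions for two events and one time expression.
local = {("e1", "e2"), ("e2", "t1")}
inferred = before_closure(local)
```

Here `inferred` also contains `("e1", "t1")`, a relation that no single local pair states. A candidate prediction `("t1", "e1")` would make the set inconsistent, which is the kind of signal a second-stage classifier could use.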

Summarizing a Document by Trimming a Nested Tree Structure

Yuta Kikuchi, Tsutomu Hirao, Hiroya Takamura, Manabu Okumura, Masaaki ...

2015 Volume 22 Issue 3 Pages 197-217
Published: June 16, 2015
Released on J-STAGE: December 14, 2015

DOI: https://doi.org/10.5715/jnlp.22.197

JOURNAL FREE ACCESS

Many recently proposed text summarization methods combine sentence selection and sentence compression. Although most of these methods use the dependency between words, the dependency between sentences, i.e., the rhetorical structure, has not been exploited in such joint methods. We use both kinds of dependency by constructing a nested tree, in which each node of a document tree representing the dependency between sentences is replaced by a sentence tree representing the dependency between words. We formulate summarization as a combinatorial optimization problem in which the nested tree is trimmed without losing important content in the source document. An empirical evaluation shows that our method based on trimming the nested tree significantly improves the performance of text summarization.
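
Tree trimming under a length limit can be sketched as a small dynamic program. This is a simplified illustration, not the paper's formulation: it assumes a single-level tree rather than a nested one, and the node names, lengths, scores, and the function `trim_tree` are all hypothetical. A node may be kept only if its parent is kept, and the goal is the highest total score within the budget.

```python
def trim_tree(children, length, score, root, budget):
    """Return the best total score of a rooted subtree within the length budget."""

    def best(node):
        # Maps used length -> best score over subtrees that include `node`.
        if length[node] > budget:
            return {}
        table = {length[node]: score[node]}
        for child in children.get(node, []):
            child_table = best(child)
            merged = dict(table)  # option: drop the child's whole subtree
            for used, s in table.items():
                for c_used, c_s in child_table.items():
                    total = used + c_used
                    if total <= budget and s + c_s > merged.get(total, float("-inf")):
                        merged[total] = s + c_s
            table = merged
        return table

    return max(best(root).values(), default=0.0)


# Hypothetical document tree: s1 is the root, s2 and s3 depend on s1, s4 on s2.
children = {"s1": ["s2", "s3"], "s2": ["s4"]}
length = {"s1": 3, "s2": 4, "s3": 2, "s4": 2}
score = {"s1": 2.0, "s2": 3.0, "s3": 1.0, "s4": 2.5}
```

With a budget of 7 the best trimmed tree keeps `s1` and `s2` (score 5.0). With a budget of 9 it keeps `s1`, `s2`, and `s4` (score 7.5), preferring that chain over adding `s3`.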
