Chinese word segmentation is an essential first step in Chinese language processing. Recent advances in machine learning have boosted the performance of Chinese word segmentation systems, yet the identification of out-of-vocabulary words remains a major problem in this field. Recent research has attempted to address this problem by exploiting the characteristics of frequent substrings in unlabeled data. We propose a simple yet effective approach for extracting a specific type of frequent substring, called maximized substrings, which provides good estimates of unknown word boundaries. In the task of Chinese word segmentation, we use these substrings, extracted from large-scale unlabeled data, to improve segmentation accuracy. The effectiveness of this approach is demonstrated through experiments on data sets from several domains. In the task of unknown word extraction, we apply post-processing techniques that effectively reduce the noise in the extracted substrings. We demonstrate the effectiveness and efficiency of our approach by comparing its results with those of a widely used Chinese word recognition method from a previous study.
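To make the notion of maximized substrings concrete, the following Python sketch extracts them under one plausible reading of the term: a frequent substring that cannot be extended by a single character without losing occurrences. The counting strategy and the thresholds (max_len, min_freq) are illustrative assumptions, not the paper's algorithm, which would presumably use suffix structures to scale to large corpora.

    from collections import Counter

    def maximized_substrings(corpus, max_len=4, min_freq=5):
        """Illustrative sketch: a 'maximized' substring is assumed here to
        be a frequent substring every one-character extension of which is
        strictly less frequent. The paper's exact definition and its
        large-scale extraction method may differ."""
        counts = Counter()
        for sent in corpus:
            for i in range(len(sent)):
                for j in range(i + 1, min(i + max_len, len(sent)) + 1):
                    counts[sent[i:j]] += 1
        result = []
        for s, freq in counts.items():
            if freq < min_freq:
                continue
            # Keep s only if no one-character extension occurs as often.
            extensions = (t for t in counts
                          if len(t) == len(s) + 1 and s in t)
            if all(counts[t] < freq for t in extensions):
                result.append((s, freq))
        return result

The boundaries of such substrings could then be supplied to a segmenter as soft evidence of unknown word boundaries, for instance as additional features on candidate segmentation points.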
Abduction, also known as inference to the best explanation, has long been considered a promising framework for natural language processing (NLP). While recent advances in automatic world knowledge acquisition have made large-scale knowledge bases feasible, the computational complexity of abduction hinders its application to real-life problems. In particular, when a knowledge base contains functional literals, which express dependency relations between words, the size of the search space increases substantially. In this study, we propose a method for enhancing the efficiency of first-order abductive reasoning. By exploiting a property of functional literals, the proposed method prunes inferences that do not lead to reasonable explanations. Furthermore, we prove that the proposed method is sound under a particular condition. In our experiments, we apply abduction with a large-scale knowledge base to a real-life NLP task and show that our method significantly improves the computational efficiency of first-order abductive reasoning compared with a state-of-the-art system.
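The pruning idea can be illustrated with a functionality check: if a functional literal determines one argument from the others (one common reading of 'functional'; the paper's formal condition may be stricter), then any candidate explanation that assigns two distinct values to the same functional literal can be discarded during search. The literal representation and predicate names below are our own assumptions.

    def violates_functionality(literals, functional_preds):
        """Illustrative pruning test for first-order abduction. A literal
        is a tuple (predicate, arg1, ..., argN); for predicates declared
        functional, the last argument is assumed to be determined by the
        preceding ones. A hypothesis mapping the same inputs to two
        distinct outputs cannot be part of a consistent explanation."""
        seen = {}
        for pred, *args in literals:
            if pred not in functional_preds:
                continue
            key = (pred, tuple(args[:-1]))
            if key in seen and seen[key] != args[-1]:
                return True   # conflicting outputs: prune this branch
            seen[key] = args[-1]
        return False

    # Hypothetical example: 'head' maps a word token to its unique head.
    h = [("head", "w3", "w1"), ("head", "w3", "w2")]
    assert violates_functionality(h, {"head"})

Inside a backward-chaining abductive search, such a test would be applied each time a literal is hypothesized, cutting off branches that can never unify into a reasonable explanation.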
In this paper we describe a generalized dependency tree language model for machine translation. We consider in detail the question of how to define tree-based n-grams, or ‘t-treelets’, and thoroughly explore the strengths and weaknesses of our approach by evaluating its effect on translation quality for nine major languages. In addition, we show that a significant improvement in translation quality can be attained even for non-structured machine translation by reranking filtered parses of k-best string output.
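One plausible instantiation of a tree-based n-gram is a word together with the chain of its n-1 nearest heads; the sketch below counts such treelets and scores a dependency tree with add-one smoothing. This is a minimal reconstruction for illustration only: the paper explores several t-treelet definitions, and its model and smoothing are presumably more elaborate.

    import math
    from collections import Counter

    def treelets(words, heads, n=3):
        """Ancestor-chain treelets: each word plus up to n-1 heads above
        it. heads[i] is the index of word i's head, or -1 for the root."""
        out = []
        for i, w in enumerate(words):
            chain, h = [w], heads[i]
            while h != -1 and len(chain) < n:
                chain.append(words[h])
                h = heads[h]
            out.append(tuple(chain))
        return out

    class TreeletLM:
        """Count-based treelet language model with add-one smoothing."""
        def __init__(self, n=3):
            self.n, self.full, self.ctx = n, Counter(), Counter()

        def train(self, trees):               # trees: [(words, heads), ...]
            for words, heads in trees:
                for t in treelets(words, heads, self.n):
                    self.full[t] += 1
                    self.ctx[t[1:]] += 1

        def logprob(self, words, heads):
            v = len(self.full) + 1            # crude vocabulary size
            return sum(math.log((self.full[t] + 1) / (self.ctx[t[1:]] + v))
                       for t in treelets(words, heads, self.n))

For reranking, each string in the system's k-best output would be parsed and its treelet log-probability added as one more feature in the rescoring model, which is how a tree language model can help even a non-structured system.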