Journal of Natural Language Processing

Preface

[title in Japanese]

[in Japanese]

2013 Volume 20 Issue 5 Pages 627
Published: December 13, 2013
Released on J-STAGE: March 13, 2014

DOI: https://doi.org/10.5715/jnlp.20.627

JOURNAL FREE ACCESS

Paper

ILP-based Inference for Cost-based Abduction on First-order Predicate Logic

Naoya Inoue, Kentaro Inui

2013 Volume 20 Issue 5 Pages 629-656
Published: December 13, 2013
Released on J-STAGE: March 13, 2014

DOI: https://doi.org/10.5715/jnlp.20.629

JOURNAL FREE ACCESS

Abduction is desirable for many natural language processing (NLP) tasks. While recent advances in large-scale knowledge acquisition warrant applying abduction with large knowledge bases to real-life NLP problems, as of yet, no existing approach to abduction has achieved the efficiency necessary to be a practical solution for large-scale reasoning on real-life problems. In this paper, we propose an efficient solution for large-scale abduction. The contributions of our study are as follows: (i) we propose an efficient method of cost-based abduction in first-order predicate logic that avoids computationally expensive grounding procedures; (ii) we formulate the best-explanation search problem as an integer linear programming optimization problem, making our approach extensible; (iii) we show how cutting plane inference, which is an iterative optimization strategy developed in operations research, can be applied to make abduction in first-order logic tractable; and (iv) the abductive inference engine presented in this paper is made publicly available.
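
As a rough illustration of what cost-based abduction computes, the sketch below finds the cheapest set of assumptions that entails a set of observations. It is not the authors' method: it is propositional rather than first-order, and it uses exhaustive search in place of the ILP formulation and cutting plane inference described in the abstract. The rules, atom names, and costs are hypothetical.

```python
from itertools import combinations


def forward_chain(facts, rules):
    """Return every atom derivable from `facts` with Horn rules (body, head)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and set(body) <= derived:
                derived.add(head)
                changed = True
    return derived


def best_explanation(observations, rules, costs):
    """Exhaustively search for the cheapest hypothesis set entailing the observations.

    `costs` maps each assumable atom to the cost of assuming it.
    Returns (total_cost, hypothesis_set), or None if nothing explains the input.
    """
    atoms = sorted(costs)
    best = None
    for size in range(len(atoms) + 1):
        for hypothesis in combinations(atoms, size):
            if set(observations) <= forward_chain(hypothesis, rules):
                total = sum(costs[atom] for atom in hypothesis)
                if best is None or total < best[0]:
                    best = (total, set(hypothesis))
    return best


# Hypothetical toy knowledge base: (body atoms, head atom).
RULES = [
    (("rain",), "wet_grass"),
    (("sprinkler",), "wet_grass"),
    (("rain",), "wet_road"),
]
# Hypothetical assumption costs; assuming an observation directly is expensive.
COSTS = {"rain": 3.0, "sprinkler": 2.0, "wet_grass": 5.0, "wet_road": 5.0}

if __name__ == "__main__":
    print(best_explanation(["wet_grass", "wet_road"], RULES, COSTS))
```

The exhaustive search is exponential in the number of assumable atoms, which is the kind of cost that motivates replacing it with an optimization formulation on large knowledge bases.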

Temporal Ordering Annotation on ‘the Balanced Corpus of Contemporary Written Japanese’

Sachi Yasuda, Hikari Konishi, Masayuki Asahara, Mizuho Imada, Kikuo Ma ...

2013 Volume 20 Issue 5 Pages 657-681
Published: December 13, 2013
Released on J-STAGE: March 13, 2014

DOI: https://doi.org/10.5715/jnlp.20.657

JOURNAL FREE ACCESS

Temporal information extraction can be divided into the following tasks: temporal expression extraction, time normalization, and temporal ordering relation resolution. The first task is a subtask of named entity and numeral expression extraction. The second task is often performed by rewriting systems. The third task consists of event anchoring. This paper proposes a Japanese temporal ordering annotation scheme and applies it to ‘the Balanced Corpus of Contemporary Written Japanese’ (BCCWJ). We extracted verbal and adjectival event expressions as 〈EVENT〉 in a subset of BCCWJ and annotated temporal ordering relations 〈TLINK〉 on pairs formed from these event expressions and the time expressions obtained in a previous study (Konishi et al. 2013). Readers tend to disagree more on temporal ordering than on the normalization of time expressions. In such a situation, producing a single gold-standard annotation should not be the objective; rather, the degree of inter-annotator discrepancy should be evaluated. We therefore analysed the discrepancies among three annotators in temporal ordering annotation. The results show that the annotators barely agree on the boundaries of time segments, whereas they agree well on relative temporal ordering.

Complaint Sentence Detection via Automatic Training Data Generation using Sentiment Lexicons and Context Coherence

Takashi Inui, Yusuke Umesawa, Mikio Yamamoto

2013 Volume 20 Issue 5 Pages 683-705
Published: December 13, 2013
Released on J-STAGE: March 13, 2014

DOI: https://doi.org/10.5715/jnlp.20.683

JOURNAL FREE ACCESS

We propose an automatic method for detecting complaint sentences in review documents. The proposed method consists of two procedures: a data generation procedure that uses sentiment lexicons and context coherence, and an extension of a naive Bayes classifier based on the characteristics of the generated training data. The method has the advantage of requiring no human effort to create large-scale training data or to maintain rules for complaint detection. The experimental results indicate that this method is more effective than the baseline methods.

Domain Adaptation for Word Sense Disambiguation using k-Nearest Neighbor Algorithm and Topic Model

Hiroyuki Shinnou, Minoru Sasaki

2013 Volume 20 Issue 5 Pages 707-726
Published: December 13, 2013
Released on J-STAGE: March 13, 2014

DOI: https://doi.org/10.5715/jnlp.20.707

JOURNAL FREE ACCESS

In this paper, we propose a method of domain adaptation for word sense disambiguation (WSD). Domain adaptation for WSD faces two problems: (1) the difference in sense distributions between domains, and (2) the data sparseness caused by changing the domain. We discuss and recommend a countermeasure for each problem, using the k-nearest neighbor algorithm (k-NN) for the first and a topic model for the second. Specifically, we append topic features, derived from a topic model built on the target-domain corpus, to the training data in the source domain and the test data in the target domain. We then perform WSD with a support vector machine (SVM) classifier that uses the extended features. However, when the reliability of the SVM classifier's decision for a test instance is low, we use the decision of k-NN instead. In the experiment, we select 17 ambiguous words that appear 50 times or more in two domains of the Balanced Corpus of Contemporary Written Japanese (BCCWJ), PB (books) and OC (Yahoo! Chie Bukuro), and conduct domain adaptation experiments for WSD on these words to show the effectiveness of our method. In future work, we will apply the proposed method to other domains and examine a way of using the topic model that takes the universality of a corpus into account, as well as effective ensemble learning for domain adaptation for WSD.
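
The fallback rule in this abstract (trust the SVM unless its decision is unreliable, then defer to k-NN) can be sketched as follows. The abstract does not say how reliability is measured, so this sketch assumes it is the gap between the two highest SVM decision scores; the function names, the `threshold` value, and the score dictionary are hypothetical, and the SVM itself is represented only by its precomputed scores.

```python
import math
from collections import Counter


def knn_predict(x, train, k=3):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(x, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def predict_sense(x, svm_scores, train, threshold=0.5, k=3):
    """Return the SVM's top sense unless its margin is small, else fall back to k-NN.

    `svm_scores` maps each candidate sense to a decision score that is assumed
    to come from an already trained SVM (not implemented here).
    """
    ranked = sorted(svm_scores.items(), key=lambda kv: kv[1], reverse=True)
    best_sense, best_score = ranked[0]
    # Assumed reliability measure: gap between the best and second-best score.
    margin = best_score - ranked[1][1] if len(ranked) > 1 else float("inf")
    if margin >= threshold:
        return best_sense
    return knn_predict(x, train, k)


# Hypothetical labelled instances: (feature vector, sense label).
TRAIN = [
    ((0.0, 0.0), "sense_1"),
    ((0.1, 0.0), "sense_1"),
    ((0.0, 0.1), "sense_1"),
    ((1.0, 1.0), "sense_2"),
]

if __name__ == "__main__":
    x = (0.05, 0.05)
    print(predict_sense(x, {"sense_2": 2.0, "sense_1": 0.1}, TRAIN))   # confident SVM
    print(predict_sense(x, {"sense_2": 0.20, "sense_1": 0.15}, TRAIN))  # low margin
```

In the second call the SVM scores are nearly tied, so the neighbours of `x` decide the sense instead.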

Morphological Analysis of Historical Japanese Text

Toshinobu Ogiso, Mamoru Komachi, Yuji Matsumoto

2013 Volume 20 Issue 5 Pages 727-748
Published: December 13, 2013
Released on J-STAGE: March 13, 2014

DOI: https://doi.org/10.5715/jnlp.20.727

JOURNAL FREE ACCESS

To construct a richly annotated diachronic corpus of Japanese, the morphological analysis of historical Japanese text is required. However, conventional analyzers cannot process old Japanese texts with adequate accuracy. To facilitate such analyses, we extended dictionary entries from UniDic for Contemporary Japanese and prepared training corpora including articles illustrating the literary style of the Meiji Era and literature of the Heian Era, thus creating new dictionaries: “UniDic-MLJ (Modern Literary Japanese)” and “UniDic-EMJ (Early Middle Japanese).” These dictionaries achieve a high accuracy (96–97%), which is sufficient for constructing a diachronic corpus of Japanese. Moreover, we investigated the optimal size of the training corpus for the morphological analysis of historical Japanese text on the basis of the learning curves obtained by using these dictionaries. We confirmed that a 50,000-word corpus achieves an adequate accuracy of over 95%, and that even a small corpus (only 5,000 words) is effective as long as it is constructed specifically for the target domain.
