Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 20, Issue 5
Preface
Paper
  • Naoya Inoue, Kentaro Inui
    2013 Volume 20 Issue 5 Pages 629-656
    Published: December 13, 2013
    Released on J-STAGE: March 13, 2014
    JOURNAL FREE ACCESS
    Abduction is desirable for many natural language processing (NLP) tasks. While recent advances in large-scale knowledge acquisition warrant applying abduction with large knowledge bases to real-life NLP problems, as of yet, no existing approach to abduction has achieved the efficiency necessary to be a practical solution for large-scale reasoning on real-life problems. In this paper, we propose an efficient solution for large-scale abduction. The contributions of our study are as follows: (i) we propose an efficient method of cost-based abduction in first-order predicate logic that avoids computationally expensive grounding procedures; (ii) we formulate the best-explanation search problem as an integer linear programming optimization problem, making our approach extensible; (iii) we show how cutting plane inference, which is an iterative optimization strategy developed in operations research, can be applied to make abduction in first-order logic tractable; and (iv) the abductive inference engine presented in this paper is made publicly available.
  • Sachi Yasuda, Hikari Konishi, Masayuki Asahara, Mizuho Imada, Kikuo Ma ...
    2013 Volume 20 Issue 5 Pages 657-681
    Published: December 13, 2013
    Released on J-STAGE: March 13, 2014
    JOURNAL FREE ACCESS
    Temporal information extraction can be divided into the following tasks: temporal expression extraction, time normalization, and temporal ordering relation resolution. The first task is a subtask of named entity and numeral expression extraction. The second task is often performed by rewriting systems. The third task consists of event anchoring. This paper proposes a Japanese temporal ordering annotation scheme and reports annotations performed by referring to the ‘Balanced Corpus of Contemporary Written Japanese’ (BCCWJ). We extracted verbal and adjective event expressions as 〈EVENT〉 in a subset of BCCWJ and annotated a temporal ordering relation 〈TLINK〉 on pairs of these event expressions and the time expressions obtained from a previous study (Konishi et al. 2013). Readers tend to disagree more on temporal ordering than on the normalization of time expressions. In such a situation, producing a unique set of gold annotation data should not be the objective. Instead, we should evaluate the degree of inter-annotator discrepancy among experimental subjects. We therefore analysed the discrepancies among three annotators in temporal ordering annotation. The results showed that the annotators barely agreed on the boundaries of time segments, whereas they agreed well on relative temporal ordering.
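    The abstract above does not state which agreement measure was used. As a minimal sketch, assuming simple observed agreement between each pair of annotators, the comparison among three annotators could look like the following. The function name, the label set, and the example data are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def pairwise_agreement(annotations):
    """Return the observed agreement rate for every pair of annotators.

    annotations maps an annotator name to a list of labels, one per item.
    This is plain percentage agreement (an assumption), not a
    chance-corrected measure such as kappa.
    """
    rates = {}
    for a, b in combinations(sorted(annotations), 2):
        labels_a, labels_b = annotations[a], annotations[b]
        if not labels_a or len(labels_a) != len(labels_b):
            raise ValueError("annotators must label the same non-empty item list")
        matches = sum(x == y for x, y in zip(labels_a, labels_b))
        rates[(a, b)] = matches / len(labels_a)
    return rates


# Hypothetical ordering labels for four event/time pairs.
example = {
    "A": ["before", "after", "overlap", "before"],
    "B": ["before", "after", "before", "before"],
    "C": ["before", "overlap", "overlap", "before"],
}
rates = pairwise_agreement(example)
for pair, rate in rates.items():
    print(pair, rate)
```

    Reporting one rate per annotator pair, rather than a single pooled figure, keeps visible which annotators diverge, which matches the abstract's focus on discrepancy rather than on a single gold standard.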
  • Takashi Inui, Yusuke Umesawa, Mikio Yamamoto
    2013 Volume 20 Issue 5 Pages 683-705
    Published: December 13, 2013
    Released on J-STAGE: March 13, 2014
    JOURNAL FREE ACCESS
    We propose an automatic method for detecting complaint sentences in review documents. The proposed method consists of two procedures. One is a data generation procedure using sentiment lexicons and context coherence, and the other is the expansion of a naive Bayes classifier based on the characteristics of the training data. This method has the advantage of not requiring human effort for the creation of large-scale training data or the management of rules for complaint detection. The experimental results indicate that this method is more effective than the baseline methods.
  • Hiroyuki Shinnou, Minoru Sasaki
    2013 Volume 20 Issue 5 Pages 707-726
    Published: December 13, 2013
    Released on J-STAGE: March 13, 2014
    JOURNAL FREE ACCESS
    In this paper, we propose a method of domain adaptation for word sense disambiguation (WSD). Domain adaptation for WSD faces the following problems: (1) the difference between sense distributions across domains and (2) the sparseness of data caused by changing the domain. We discuss and recommend a countermeasure for each problem. We use the k-nearest neighbor algorithm (k-NN) and a topic model for the first and second problems, respectively. In particular, we append topic features, derived by the topic model from the target-domain corpus, to the training data in the source domain and the test data in the target domain. We then perform WSD with a support vector machine (SVM) classifier using the extended features. However, when the reliability of the SVM classifier's decision for a test instance is low, we use the decision of the k-NN. In the experiment, we select 17 ambiguous words that appear 50 times or more in both domains, PB (books) and OC (Yahoo! Chie Bukuro), of the Balanced Corpus of Contemporary Written Japanese (BCCWJ), and conduct a domain adaptation experiment for WSD using these words to show the effectiveness of our method. In the future, we will apply the proposed method to other domains and examine a way to use the topic model that considers the universality of a corpus, as well as effective ensemble learning for domain adaptation for WSD.
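    The fallback rule described in the abstract above (use the SVM decision unless its reliability is low, then use k-NN) can be sketched as follows. The abstract does not say how reliability is measured or which distance k-NN uses, so the confidence score, the threshold of 0.5, the Euclidean distance, and all names here are assumptions for illustration only. The SVM itself is not implemented; its label and confidence are passed in.

```python
import math
from collections import Counter


def knn_predict(query, training, k=3):
    """Majority vote among the k training points nearest to query.

    training is a list of (feature_vector, sense_label) pairs.
    Euclidean distance is an assumption, not the paper's stated choice.
    """
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def predict_sense(query, svm_label, svm_confidence, training, threshold=0.5, k=3):
    """Keep the SVM decision when it is confident enough, else defer to k-NN.

    svm_confidence and threshold are hypothetical stand-ins for the
    reliability measure mentioned in the abstract.
    """
    if svm_confidence >= threshold:
        return svm_label
    return knn_predict(query, training, k)


# Hypothetical two-dimensional feature vectors with sense labels.
training = [
    ((0.0, 0.0), "s1"),
    ((0.1, 0.2), "s1"),
    ((1.0, 1.0), "s2"),
    ((0.9, 1.1), "s2"),
    ((1.2, 0.8), "s2"),
]
query = (1.0, 0.9)

confident = predict_sense(query, "s1", 0.9, training)  # SVM decision kept
uncertain = predict_sense(query, "s1", 0.1, training)  # k-NN decision used
print(confident, uncertain)
```

    Passing the SVM output in as plain values keeps the sketch independent of any particular SVM library; in practice the confidence would come from whatever reliability score the chosen classifier exposes.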
  • Toshinobu Ogiso, Mamoru Komachi, Yuji Matsumoto
    2013 Volume 20 Issue 5 Pages 727-748
    Published: December 13, 2013
    Released on J-STAGE: March 13, 2014
    JOURNAL FREE ACCESS
    To construct a richly annotated diachronic corpus of Japanese, the morphological analysis of historical Japanese text is required. However, analyzing old Japanese texts with adequate accuracy is not possible with conventional methods. To facilitate such analyses, we extended dictionary entries from UniDic for Contemporary Japanese and prepared training corpora including articles illustrating the literary style of the Meiji Era and literature of the Heian Era, thus creating new dictionaries: “UniDic-MLJ (Modern Literary Japanese)” and “UniDic-EMJ (Early Middle Japanese).” These dictionaries achieve the high accuracy (96–97%) required for constructing a diachronic corpus of Japanese. Moreover, we investigated the optimal size of the training corpus for the morphological analysis of historical Japanese text on the basis of the learning curves obtained by using these dictionaries. We confirmed that a 50,000-word corpus achieves an adequate accuracy of over 95%, and that even a small corpus (only 5,000 words) is effective as long as it is constructed specifically for the target domain.