Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 20, Issue 2
Displaying 1-10 of 10 articles from this issue
Preface
Paper
  • Hideyuki Shibuki, Takahiro Nagai, Masahiro Nakano, Madoka Ishioroshi, ...
    2013 Volume 20 Issue 2 Pages 75-103
    Published: June 14, 2013
    Released on J-STAGE: September 14, 2013
    JOURNAL FREE ACCESS
We have been studying the automatic generation of mediatory summaries, which help users assess the credibility of information on the Web. A mediatory summary is a brief description, extracted from relevant Web documents, that explains how a pair of statements that appear to contradict each other at first glance can in fact coexist under certain conditions. Because there are generally several such pairs, users must specify the pair whose credibility they wish to assess. In this paper, we propose an interactive method for generating a mediatory summary in which users specify the pair of statements they are interested in assessing. Furthermore, we improve the method in both precision and recall by incorporating the positions of key expressions, such as adversative conjunctions, conditional expressions, and conclusive conjunctions, and the number of sentences that are not useful for the mediatory summary. Analysis performed on the mediatory summary corpus indicates that the proposed method achieves a precision of 0.231 for generated summaries ranked in the top 10, compared with 0.050 for the previous method (Shibuki et al. 2011a).
  • Misako Imono, Eriko Yoshimura, Seiji Tsuchiya, Hirokazu Watabe
    2013 Volume 20 Issue 2 Pages 105-132
    Published: June 14, 2013
    Released on J-STAGE: September 14, 2013
    JOURNAL FREE ACCESS
To smooth communication between robots and humans, robots must have human-like conversational abilities. Humans converse in various ways: greetings, question answering, suggestions, and chatting, among others. Methods that extract expressions from resources such as newspaper articles and embed them into conversation templates are therefore seen as a way to let a robot hold active, informative conversations such as small talk. However, the words used in newspaper articles differ in difficulty from those used in conversation: newspaper vocabulary is generally more difficult. Hence, when using newspaper articles as a resource for a robot's conversation, it is important to convert difficult words into plain expressions in order to avoid making the human interlocutor uncomfortable. This paper proposes a method for converting difficult words into plain expressions that takes this difference in word difficulty into account. The proposed method aims to produce conversions that feel natural to humans by combining two approaches: one that converts a word into another word, and one that converts a word into a sentence. The results show that the proposed method converts words in newspaper articles from difficult to plain expressions with an accuracy of 75.7%, and converts words while retaining their meaning with an accuracy of 81.1%. The proposed word conversion method is therefore found to be effective.
  • Akihiro Tamura, Taro Watanabe, Eiichiro Sumita, Hiroya Takamura, Manab ...
    2013 Volume 20 Issue 2 Pages 133-160
    Published: June 14, 2013
    Released on J-STAGE: September 14, 2013
    JOURNAL FREE ACCESS
This paper proposes a novel method for bilingual lexicon extraction from comparable corpora using graph-based label propagation. A previous study found that performance decreases drastically when the coverage of the seed lexicon is small. We address this problem by using indirect relations with bilingual seeds together with direct relations, representing each word by a distribution over the seed lexicon. The seed distributions are propagated over a graph that represents relations among words, and translation pairs are extracted by identifying word pairs with highly similar seed distributions. We propose two types of graphs: (1) a co-occurrence graph, representing co-occurrence relations between words; and (2) a similarity graph, representing contextual similarities between words. Evaluations on comparable corpora of English and Japanese patent documents show that the proposed graph propagation method outperforms conventional methods, and that the similarity graph further improves performance by clustering synonyms onto the same translation.
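As a rough illustration of the propagation step described in the abstract (not the authors' implementation; the function names, the damping parameter alpha, and the fixed iteration count are all illustrative assumptions), the following sketch propagates seed-lexicon distributions over a row-normalized word graph and then compares words by the cosine similarity of their resulting distributions:

```python
import numpy as np

def propagate_seed_distributions(adjacency, seed_dists, alpha=0.85, iters=50):
    """Propagate seed-lexicon distributions over a word graph.

    adjacency : (n_words, n_words) row-normalized edge weights
                (co-occurrence or context-similarity graph)
    seed_dists: (n_words, n_seeds) initial distributions, non-zero
                only for words that appear in the seed lexicon
    """
    F = seed_dists.copy()
    for _ in range(iters):
        # each word absorbs its neighbors' distributions, while
        # seed words retain a share of their original labels
        F = alpha * (adjacency @ F) + (1 - alpha) * seed_dists
    return F

def translation_score(f_src, f_tgt):
    """Cosine similarity between two words' seed distributions;
    high-scoring cross-lingual pairs become translation candidates."""
    denom = np.linalg.norm(f_src) * np.linalg.norm(f_tgt)
    return float(f_src @ f_tgt / denom) if denom else 0.0
```

In this setup a source word and a target word are paired when their propagated distributions point in nearly the same direction, which is how indirect relations with the seeds can link words that never co-occur with a seed directly.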
  • Kanako Komiya, Yusuke Ito, Naoto Sato, Yoshiyuki Kotani
    2013 Volume 20 Issue 2 Pages 161-182
    Published: June 14, 2013
    Released on J-STAGE: September 14, 2013
    JOURNAL FREE ACCESS
In this study, we propose negation naive Bayes (NNB), a new method for text classification. Like complement naive Bayes (CNB), NNB uses the complement class. Unlike CNB, however, NNB treats the prior in a mathematically proper way, because NNB is derivable from the same maximum a posteriori equation from which naive Bayes (NB) is derived. We carried out classification experiments on products offered on an Internet auction site and on the 20 Newsgroups data set. For the latter, we experimented in two settings and discuss the properties of NNB: (1) settings in which the number of words in each document decreases, and (2) settings in which the distribution of documents over classes is skewed. We compared NNB with NB, CNB, and support vector machines (SVM). Our experiments show that NNB outperforms the other Bayesian approaches when the number of words in each document is small and when texts are distributed non-uniformly over classes; NNB sometimes provides the best accuracy overall and significantly outperforms SVM.
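The abstract states only that NNB scores documents against the complement class while deriving the prior term from the MAP equation. The sketch below is one plausible reading of that idea, not the paper's exact formulation: it minimizes the complement-class posterior, using (1 - p(c)) as the complement prior; the class name, smoothing, and all details are illustrative assumptions.

```python
import math
from collections import Counter

class NegationNaiveBayes:
    """Sketch of negation naive Bayes: each document is scored against
    the *complement* of each class, and the class whose complement fits
    the document worst is chosen (equivalently, the class maximizing
    1 - p(complement | doc)). Details may differ from the paper."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        n = len(labels)
        self.prior = {c: labels.count(c) / n for c in self.classes}
        # word counts aggregated over every class EXCEPT c
        self.comp_counts = {c: Counter() for c in self.classes}
        vocab = set()
        for doc, y in zip(docs, labels):
            vocab.update(doc)
            for c in self.classes:
                if c != y:
                    self.comp_counts[c].update(doc)
        self.vocab_size = len(vocab)
        self.comp_totals = {c: sum(self.comp_counts[c].values())
                            for c in self.classes}
        return self

    def predict(self, doc):
        def comp_score(c):
            # add-one smoothed log-likelihood under the complement of c,
            # plus the complement prior log(1 - p(c))
            total = self.comp_totals[c] + self.vocab_size
            loglik = sum(math.log((self.comp_counts[c][w] + 1) / total)
                         for w in doc)
            return math.log(1 - self.prior[c] + 1e-12) + loglik
        # minimizing the complement posterior maximizes the negation
        return min(self.classes, key=comp_score)
```

Like CNB, this formulation sidesteps the data sparsity of small classes by estimating word statistics from everything outside the class, which is consistent with the abstract's finding that NNB helps when documents are short or classes are skewed.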
  • Takuma Igarashi, Ryohei Sasano, Hiroya Takamura, Manabu Okumura
    2013 Volume 20 Issue 2 Pages 183-200
    Published: June 14, 2013
    Released on J-STAGE: September 14, 2013
    JOURNAL FREE ACCESS
In linguistics, sound symbolism is the idea that the vocal sounds of certain words carry meaning in themselves. This paper focuses on the sound symbolism of onomatopoeic words and demonstrates the close relationship between sound symbolism and sentiment polarity. Because onomatopoeic words imitate the sounds they represent, exploiting their sound symbolism can help us better understand the sentiment of a sentence. We therefore model sound symbolism with N-gram-based features and apply the model to a series of sentiment classification tasks. The experimental results show that the method with sound symbolism significantly outperforms the baseline without it, demonstrating that a close relationship exists between sound symbolism and sentiment polarity.
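A minimal sketch of what N-gram-based sound-symbolism features might look like; the paper's actual feature set is not specified in the abstract, so the scheme below (character n-grams over a romanized onomatopoeic word, with boundary markers) is an assumption for illustration only:

```python
from collections import Counter

def sound_ngram_features(word, n_values=(1, 2, 3)):
    """Character n-gram features intended to capture sound symbolism.

    Boundary markers ^ and $ let the model distinguish word-initial
    and word-final sounds, which sound-symbolism accounts often treat
    as carrying distinct nuances.
    """
    padded = f"^{word}$"
    feats = Counter()
    for n in n_values:
        for i in range(len(padded) - n + 1):
            feats[f"{n}g:{padded[i:i + n]}"] += 1
    return feats
```

Feature vectors built this way (e.g. from `sound_ngram_features("sarasara")`) can be fed to any standard linear classifier for sentiment polarity, letting the learner assign weights to individual sound patterns.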
  • Hikari Konishi, Masayuki Asahara, Kikuo Maekawa
    2013 Volume 20 Issue 2 Pages 201-221
    Published: June 14, 2013
    Released on J-STAGE: September 14, 2013
    JOURNAL FREE ACCESS
Temporal information is important for grounding event expressions on a timeline. Temporal expression extraction has been performed as numerical expression extraction, a subtask of named entity extraction. For English texts, evaluation workshops have been held in which temporal expressions were extracted and normalized; an annotation schema, TimeML, was designed to annotate events and temporal expressions, and several annotated corpora of newswire texts have been developed. However, no such schema for temporal information and its normalization has been designed for Japanese texts. This paper proposes an annotation schema for Japanese temporal information based on TimeML. We annotate the temporal information in parts of the 'Balanced Corpus of Contemporary Written Japanese', identify several problems in the annotation, and discuss the steps to be taken to ground Japanese event expressions on a timeline.
  • Sanae Fujita, Hirotoshi Taira, Masaaki Nagata
    2013 Volume 20 Issue 2 Pages 223-250
    Published: June 14, 2013
    Released on J-STAGE: September 14, 2013
    JOURNAL FREE ACCESS
The Internet is an immense storehouse of images. Linking images to dictionary definitions would enable the creation of rich dictionary resources with multimedia information. This study therefore aims to provide several suitable images for each dictionary definition. We targeted 25,481 words, including nouns, verbs, adjectives, and adverbs, divided into 39,251 senses, by querying an image search engine. The results show that suitable images could be obtained for 94% of the word senses. First, we analyzed the relationship between the visualizability of each word sense and its part of speech or semantic class. To obtain images for each word sense, we expanded the query by appending terms extracted from the definition of each sense. Second, we analyzed both manually selected and fixed queries and examined the query expansion methods in more detail; this paper proposes a method that orders queries by priority depending on the primary word sense. Third, we show the suitability of our method through two types of evaluation, since when applying the method to new dictionaries or new target senses, it is valuable to obtain images automatically using high-priority queries.
  • Katsumasa Yoshikawa, Masayuki Asahara, Yuji Matsumoto
    2013 Volume 20 Issue 2 Pages 251-271
    Published: June 14, 2013
    Released on J-STAGE: September 14, 2013
    JOURNAL FREE ACCESS
This paper describes a new Markov logic approach for Japanese predicate-argument (PA) relation extraction. Most previous work built separate classifiers for each case role and identified PA relations independently, neglecting dependencies (constraints) between two or more PA relations. We propose a method that extracts PA relations collectively, optimizing over all argument candidates in a sentence. Our method can jointly consider dependencies between multiple PA relations and find the most probable combination of predicates and their arguments in a sentence. In addition, our model introduces new constraints that exclude inappropriate argument candidates and identify correct PA relations effectively. Compared with the state of the art, our method achieves competitive results without large-scale data.
  • Sho Takase, Naoaki Okazaki, Kentaro Inui
    2013 Volume 20 Issue 2 Pages 273-296
    Published: June 14, 2013
    Released on J-STAGE: September 14, 2013
    JOURNAL FREE ACCESS
Most set expansion algorithms acquire new instances of each semantic category independently, even when instances of multiple semantic categories are given as seeds. In the setting of set expansion with multiple semantic categories, however, we can leverage additional prior knowledge about the categories. In this paper, we present a set expansion method for the case in which ontological information about the target semantic categories is available. Specifically, the proposed method uses sibling relationships between semantic categories as an additional form of prior knowledge. We demonstrate the effectiveness of using sibling relationships for set expansion on a dataset in which instances and sibling relationships are extracted semi-automatically from Wikipedia.