Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 19, Issue 4
Preface
Paper
  • Yumi Shibaki, Masaaki Nagata, Kazuhide Yamamoto
    2012 Volume 19 Issue 4 Pages 229-279
    Published: December 14, 2012
    Released on J-STAGE: March 19, 2013
    JOURNAL FREE ACCESS
    We have built a large-scale, general-purpose Japanese ontology, restructured from Wikipedia, that represents an is-a relation hierarchy. A Wikipedia article page belongs to one or more categories, which are organized hierarchically through links to other categories. However, two issues must be solved before the categories and articles can be used as an is-a ontology: (1) the higher levels of the hierarchy are too abstract to be used directly in an ontology, and (2) many of the links are not is-a links, because of the low-quality descriptions that can occur in consumer-generated media. To solve these issues, we (1) redefine the highest level of the hierarchy and substitute it for the original categories at that level, and (2) cut the not-is-a links between categories and between categories and articles. Experimental results show that is-a links between categories are identified with 95.3% precision and 96.6% recall, and is-a links between a category and an article with 96.2% precision and 95.6% recall. These accuracies significantly outperform those of previous methods. We extracted 84.5% of the categories (approximately 34,000) and 88.6% of the articles (approximately 420,000) in Wikipedia.
  • Akira Fujita, Hiroshi Fujita, Naoyoshi Tamura
    2012 Volume 19 Issue 4 Pages 281-301
    Published: December 14, 2012
    Released on J-STAGE: March 19, 2013
    JOURNAL FREE ACCESS
    We propose a method for learning an individual model that evaluates Japanese compositions via Support Vector Regression (SVR), based on features grounded in Japanese-language education and on scores assigned in advance by human evaluators. We also propose a method for representing how an evaluation is made. The features in the SVR training data are categorized into seven types according to what each feature refers to. They include features related to the criteria for Japanese compositions used in education. Moreover, none of the features depends on the topic of the composition prompt. Our methods are implemented to automatically assign an overall score to a composition, and also to account for the elements considered by an individual evaluator by quantifying the weight with which each element contributes to the scoring decision.
  • Hiroyuki Shinnou, Minoru Sasaki
    2012 Volume 19 Issue 4 Pages 303-327
    Published: December 14, 2012
    Released on J-STAGE: March 19, 2013
    JOURNAL FREE ACCESS
    In this paper, we propose a method to detect new word senses of a target word from sentences that contain it. To achieve this, we treat a sentence that uses a new word sense as an outlier in a data set consisting of sentences that contain the target word. We then detect new word senses using outlier detection methods from the data mining domain. Outlier detection methods are generally considered to be unsupervised. However, our method utilises data sets that include some sentences in which the target word is labelled with its sense. Our outlier detection method is therefore classified as supervised. To detect sentences with new word senses, we propose an ensemble of two methods: the supervised LOF (Local Outlier Factor) and the supervised generative model. The final output is the intersection of the outputs of the two methods. We demonstrate the effectiveness of our method using the SemEval-2 Japanese WSD task data. Moreover, we show that word sense disambiguation systems cannot solve our task by themselves.
  • Hiromitsu Nishizaki, Tomoyosi Akiba, Kiyoaki Aikawa, Tatsuya Kawahara, ...
    2012 Volume 19 Issue 4 Pages 329-350
    Published: December 14, 2012
    Released on J-STAGE: March 19, 2013
    JOURNAL FREE ACCESS
    This paper describes the design of the spoken term detection (STD) studies and their evaluation framework in the STD sub-task of the NTCIR-9 IR for Spoken Documents (SpokenDoc) task. STD is one of the information access technologies for spoken documents. The goal of the STD sub-task is to rapidly detect the presence of a given spoken query term, consisting of a word or a sequence of a few words, in the spoken documents included in the Corpus of Spontaneous Japanese. To carry out the sub-task successfully, we considered the design of the sub-task and the evaluation methods, and arranged the task schedule. In the end, seven teams participated in the STD sub-task and submitted 18 STD results. This paper explains the details of the STD sub-task we conducted, the data used in the sub-task, how the transcriptions distributed with the data were produced by speech recognition, the evaluation measures, the participants’ techniques, and the evaluation results of the task participants.