Journal of Natural Language Processing

Preface

[title in Japanese]

[in Japanese]

2012 Volume 19 Issue 4 Pages 227-228
Published: December 14, 2012
Released on J-STAGE: March 19, 2013

DOI https://doi.org/10.5715/jnlp.19.227

JOURNAL FREE ACCESS

Paper

Constructing Large-Scale General Ontology from Wikipedia

Yumi Shibaki, Masaaki Nagata, Kazuhide Yamamoto

2012 Volume 19 Issue 4 Pages 229-279
Published: December 14, 2012
Released on J-STAGE: March 19, 2013

DOI https://doi.org/10.5715/jnlp.19.229

JOURNAL FREE ACCESS

We have built a large-scale general Japanese ontology, restructured from Wikipedia, that represents an is-a relation hierarchy. A Wikipedia article page belongs to one or more categories, and the categories are organized hierarchically by linking to one another. However, two issues must be solved before the categories and articles can be used as an is-a ontology: (1) the higher levels of the hierarchy are too abstract to be applied directly to an ontology, and (2) many not-is-a links appear in the articles, because of the low-quality descriptions that can occur in consumer-generated media. To solve these, we (1) redefine the highest levels and use them in place of the original categories, and (2) cut not-is-a links between categories and between categories and articles. Experimental results show that is-a links between categories are identified with 95.3% precision and 96.6% recall, and is-a links between a category and an article with 96.2% precision and 95.6% recall. These accuracies significantly outperform those of previous methods. We extracted 84.5% of the categories (approximately 34,000) and 88.6% of the articles (approximately 420,000) in Wikipedia.
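
For readers unfamiliar with the precision and recall figures quoted in this abstract, the following minimal sketch shows how such figures are typically computed for a set of predicted is-a links against a hand-labelled gold set. The function name, the link representation as (parent, child) pairs, and the toy data are assumptions for illustration only; they are not taken from the paper.

```python
def precision_recall(predicted, gold):
    """Precision and recall of predicted is-a links against gold links.

    Both arguments are sets of (parent, child) pairs (an assumed format).
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data, invented for illustration: four gold links, four predictions,
# three of which are correct.
gold = {("Fruit", "Apple"), ("Fruit", "Banana"),
        ("Animal", "Dog"), ("Animal", "Cat")}
predicted = {("Fruit", "Apple"), ("Fruit", "Banana"),
             ("Animal", "Dog"), ("Fruit", "Orchard")}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here the wrongly kept link ("Fruit", "Orchard") lowers precision, while the missed link ("Animal", "Cat") lowers recall.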

Automated Evaluation of Japanese Compositions based on Features along Japanese Education and Construction of the Individual Evaluation Model

Akira Fujita, Hiroshi Fujita, Naoyoshi Tamura

2012 Volume 19 Issue 4 Pages 281-301
Published: December 14, 2012
Released on J-STAGE: March 19, 2013

DOI https://doi.org/10.5715/jnlp.19.281

JOURNAL FREE ACCESS

We propose a method for learning an individual evaluation model that scores Japanese compositions via Support Vector Regression (SVR), based on features in line with Japanese education and on scores assigned in advance by human evaluators. We also propose a method for representing how an evaluator evaluates. The features in the SVR training data are categorized into seven types according to what each feature refers to. They include features concerning the criteria for Japanese compositions used in education. Moreover, none of the features depends on the topic of a composition's prompt. Our methods are implemented to compute an overall score for a composition automatically, to account for the elements that an individual evaluator considers, and to quantify the weight with which each element contributes to the score.

Detection of New Word Senses by the Outlier Detection Method

Hiroyuki Shinnou, Minoru Sasaki

2012 Volume 19 Issue 4 Pages 303-327
Published: December 14, 2012
Released on J-STAGE: March 19, 2013

DOI https://doi.org/10.5715/jnlp.19.303

JOURNAL FREE ACCESS

In this paper, we propose a method to detect new word senses of a target word from sentences that contain it. To achieve this, we treat a sentence that uses a new word sense as an outlier in the data set formed by the sentences that contain the target word. We then detect the new word senses using outlier detection methods from the data mining domain. Outlier detection methods are generally considered to be unsupervised. However, our method utilises data sets that include some sentences in which the target word is labelled, so it is classified under the supervised framework. We propose an ensemble of two methods to detect new word sense sentences: the supervised LOF (Local Outlier Factor) and the supervised generative model. The final output is the intersection of the outputs of both methods. We demonstrate the effectiveness of our method using SemEval-2 Japanese WSD task data. Moreover, we show that word sense disambiguation systems cannot solve our task by themselves.
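
As a rough illustration of the ensemble idea in this abstract, the sketch below flags a point only when two detectors both flag it. It is not the authors' implementation: the paper's detectors are a supervised LOF and a supervised generative model, whereas this sketch swaps in the plain unsupervised LOF and a hypothetical distance-from-centroid rule as a stand-in for the second detector. The function names, the thresholds, and the toy 2-D points (standing in for sentence feature vectors) are all assumptions made for illustration.

```python
import math


def euclid(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def lof_scores(points, k):
    """Plain (unsupervised) Local Outlier Factor of every point.

    Simplification: exactly k neighbours are used (ties at the k-th
    distance are cut), and points are assumed to be distinct.
    """
    n = len(points)
    # For each point, its k nearest neighbours as (distance, index) pairs.
    neigh = []
    for i in range(n):
        dists = sorted((euclid(points[i], points[j]), j)
                       for j in range(n) if j != i)
        neigh.append(dists[:k])
    k_dist = [nb[-1][0] for nb in neigh]  # distance to the k-th neighbour
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean density of the neighbours divided by the point's own density.
    return [sum(lrd[j] for _, j in neigh[i]) / (len(neigh[i]) * lrd[i])
            for i in range(n)]


def lof_outliers(points, k=3, threshold=2.0):
    """Indices whose LOF exceeds the (assumed) threshold."""
    return {i for i, s in enumerate(lof_scores(points, k)) if s > threshold}


def centroid_outliers(points, z=2.0):
    """Stand-in second detector: distance to the centroid exceeds
    the mean distance by more than z standard deviations."""
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    dists = [euclid(p, centroid) for p in points]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    return {i for i, d in enumerate(dists) if d > mean + z * std}


def ensemble_outliers(points):
    """Final output: the intersection of both detectors' outputs."""
    return lof_outliers(points) & centroid_outliers(points)


# Toy data: eight clustered points and one distant point at index 8.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (0.2, 0.8), (0.8, 0.2), (0.4, 0.6), (8, 8)]
flagged = ensemble_outliers(points)
print(flagged)
```

Taking the intersection trades recall for precision: a sentence is reported as a new-sense candidate only when both detectors agree, which is the design choice the abstract describes for its final output.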

Evaluation Framework Design of Spoken Term Detection Study at the NTCIR-9 IR for Spoken Documents Task

Hiromitsu Nishizaki, Tomoyosi Akiba, Kiyoaki Aikawa, Tatsuya Kawahara, ...

2012 Volume 19 Issue 4 Pages 329-350
Published: December 14, 2012
Released on J-STAGE: March 19, 2013

DOI https://doi.org/10.5715/jnlp.19.329

JOURNAL FREE ACCESS

This paper describes the design of spoken term detection (STD) studies and their evaluation framework in the STD sub-task of the NTCIR-9 IR for Spoken Documents (SpokenDoc) task. STD is one of the information access technologies for spoken documents. The goal of the STD sub-task is to rapidly detect the presence of a given query term, consisting of a word or a sequence of a few words, in the spoken documents included in the Corpus of Spontaneous Japanese. To complete the sub-task successfully, we considered the design of the sub-task and the evaluation methods, and arranged the task schedule. In the end, seven teams participated in the STD sub-task and submitted 18 STD results. This paper explains the details of the STD sub-task we conducted, the data used in the sub-task, how the transcriptions were produced by speech recognition for data distribution, the evaluation measures, the participants' techniques, and the evaluation results of the task participants.
