Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 19, Issue 3
Preface
Paper
  • Jun Harashima, Sadao Kurohashi
    2012 Volume 19 Issue 3 Pages 121-142
    Published: September 30, 2012
    Released on J-STAGE: December 26, 2012
    JOURNAL FREE ACCESS
    Most previous relevance feedback methods re-rank search results using only the information of surface words in texts. In this paper, we present a novel method that uses not only the information of surface words but also that of latent words that are highly probable given the texts. In the proposed method, we infer the latent word distribution in each document in the search results using latent Dirichlet allocation (LDA). When user feedback is given, we also infer the latent word distribution in the feedback using LDA. We calculate the similarities between the user feedback and each document in the search results using both the surface and latent word distributions, and then re-rank the search results based on these similarities. Evaluation results show that when user feedback consisting of two documents (3,589 words) is given, our method improves the initial search results by 27.6% in precision at 10 (P@10). The results also show that our method performs well even when only a small amount of user feedback is available (e.g., an improvement of 5.3% in P@10 was achieved even when the user feedback comprised only 57 words). (An illustrative sketch of the re-ranking step follows this entry.)
    Download PDF (667K)
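    A minimal sketch of the re-ranking step described above, not the authors' implementation: it assumes a pre-trained LDA topic-word matrix, forms each document's latent word distribution by projecting its topic mixture onto the vocabulary, and interpolates surface and latent cosine similarities with an assumed weight alpha.

    import numpy as np

    def word_distribution(counts):
        """Normalize raw term counts into a surface word distribution."""
        counts = np.asarray(counts, dtype=float)
        return counts / counts.sum()

    def latent_word_distribution(doc_topics, topic_word):
        """Latent word distribution: the document's inferred topic mixture
        projected onto the vocabulary via the topic-word matrix."""
        return doc_topics @ topic_word

    def combined_similarity(doc, feedback, alpha=0.5):
        """Interpolate cosine similarity over surface and latent distributions.
        alpha is an assumed mixing weight, not a value from the paper."""
        def cos(p, q):
            return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))
        return (alpha * cos(doc["surface"], feedback["surface"])
                + (1.0 - alpha) * cos(doc["latent"], feedback["latent"]))

    def rerank(results, feedback, alpha=0.5):
        """Re-rank the initial search results by similarity to the feedback."""
        return sorted(results,
                      key=lambda d: combined_similarity(d, feedback, alpha),
                      reverse=True)

    # Toy example: 3 LDA topics over a 5-word vocabulary.
    topic_word = np.array([[0.60, 0.20, 0.10, 0.05, 0.05],
                           [0.10, 0.50, 0.20, 0.10, 0.10],
                           [0.05, 0.05, 0.20, 0.30, 0.40]])
    doc = {"surface": word_distribution([3, 1, 0, 0, 1]),
           "latent": latent_word_distribution(np.array([0.7, 0.2, 0.1]), topic_word)}
    feedback = {"surface": word_distribution([2, 2, 1, 0, 0]),
                "latent": latent_word_distribution(np.array([0.5, 0.4, 0.1]), topic_word)}
    print(combined_similarity(doc, feedback))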
  • Kanako Komiya, Manabu Okumura
    2012 Volume 19 Issue 3 Pages 143-166
    Published: September 30, 2012
    Released on J-STAGE: December 26, 2012
    JOURNAL FREE ACCESS
    Domain adaptation (DA), which involves adapting a classifier developed from source data to target data, has been studied intensively in recent years. However, when DA for word sense disambiguation (WSD) was carried out, the optimal DA method varied according to the properties of the source and target data. This paper describes how the optimal DA method was determined from these properties by decision tree learning, given a triple of the target word type for WSD, the source domain, and the target domain, and discusses which properties affected the choice of the best method in Japanese WSD. (An illustrative decision-tree sketch follows this entry.)
    Download PDF (415K)
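    The decision-tree formulation above lends itself to a small illustration. The sketch below is not the paper's setup: the triple properties (data sizes, sense count, domain similarity) and the candidate DA method labels are invented for the example; only the idea of predicting the best DA method from such properties with a decision tree follows the abstract.

    from sklearn.tree import DecisionTreeClassifier

    # Each row describes one (target word, source domain, target domain) triple:
    # [source data size, target labeled data size, number of senses,
    #  source-target domain similarity].  Values are toy numbers.
    X = [
        [2000,  50, 3, 0.8],
        [2000, 500, 3, 0.8],
        [ 300,  20, 6, 0.3],
        [1500, 300, 2, 0.6],
    ]
    # Which DA method worked best for each triple (toy labels).
    y = ["source_only", "target_only", "combined", "combined"]

    clf = DecisionTreeClassifier(max_depth=3, random_state=0)
    clf.fit(X, y)

    # Predict the best DA method for a new word/domain-pair triple.
    print(clf.predict([[800, 100, 4, 0.5]]))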
  • Daniel Flannery, Yusuke Miyao, Graham Neubig, Shinsuke Mori
    2012 Volume 19 Issue 3 Pages 167-191
    Published: September 30, 2012
    Released on J-STAGE: December 26, 2012
    JOURNAL FREE ACCESS
    We introduce a word-based dependency parser for Japanese that can be trained from partially annotated corpora, allowing for effective use of available linguistic resources and reduction of the costs of preparing new training data. This is especially important for domain adaptation in a real-world situation. We use a pointwise approach where each edge in the dependency tree for a sentence is estimated independently. Experiments on Japanese dependency parsing show that this approach allows for rapid training and achieves accuracy comparable to state-of-the-art dependency parsers trained on fully annotated data.
    Download PDF (272K)
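    The pointwise idea described above, each dependency edge estimated independently, can be shown with a toy sketch. This is not the authors' parser: the scorer below is a placeholder for a trained classifier, and the head-selection loop simply illustrates why partially annotated data is usable, since each word's head decision is independent of all others.

    def score(words, dep_idx, head_idx):
        """Placeholder scorer for a candidate (dependent, head) edge.
        A real model would be a classifier trained on whatever head
        annotations are available, even if only some words are annotated."""
        return -abs(head_idx - dep_idx)  # toy preference for nearby heads

    def parse_pointwise(words):
        """Choose each word's head independently of all other decisions."""
        heads = []
        for i in range(len(words)):
            candidates = [j for j in range(len(words)) if j != i]
            heads.append(max(candidates, key=lambda j: score(words, i, j)))
        return heads

    print(parse_pointwise(["kanojo", "wa", "hon", "o", "yonda"]))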
  • Yoichi Takenaka, Takeshi Wakao
    2012 Volume 19 Issue 3 Pages 193-212
    Published: September 30, 2012
    Released on J-STAGE: December 26, 2012
    JOURNAL FREE ACCESS
    Local governments establish ordinances and regulations (hereinafter collectively referred to as “statutes”). These are structured documents with a chapter > article > paragraph > item hierarchy. Since each local government establishes statutes in its own council independently, similar statutes on the same matter are often found in different local governments (e.g., punishment for obscene habits). In legal education, legal research, and legal work at local governments and business enterprises, similar statutes are compared to clarify their differences. In practical comparisons of laws, article correspondence tables are normally created, with pairs of corresponding articles aligned horizontally or vertically. The objective of our research is to use a computer to automatically generate the article correspondence tables that are currently created manually. To this end, we focused on the relationships between articles in article correspondence tables and modeled them as directed bipartite graphs with each article as a node. We examined 96 methods based on the vector space model, the longest common subsequence, and sequence alignment in order to identify effective methods for finding corresponding articles. In the course of the research, we automatically generated article correspondence tables for 22 statutes in total (11 statutes each from Ehime and Kagawa Prefectures). Their accuracy was calculated against article correspondence tables created by legal scholars. The vector space model-based method, whose targets were nouns, adverbs, adjectives, verbs, and attributives, achieved the highest accuracy at 85%. The sequence alignment-based method achieved an accuracy of up to 81%, while the longest common subsequence method reached 75%. Because computer-generated article correspondence tables are checked by legal scholars in practice, a reliability score is required for each relationship between articles. To meet this requirement, we examined two measures for the three methods using receiver operating characteristic (ROC) curves. The results show that the ratio between the selected relation and the runner-up yields an AUC of 0.80 for the longest common subsequence method. In this research, the problem was defined by focusing on the correspondence relationships between articles in article correspondence tables. For practical purposes, however, it is necessary to clarify not only which articles correspond but also which words differ between corresponding articles. Since the vector space model cannot be used to clarify such differences, sequence alignment, which can identify differing text spans, is necessary. Composite methods combining the two will therefore be required in the future. (A minimal sketch of the vector space matching step follows this entry.)
    Download PDF (623K)
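    A minimal sketch of the vector-space-model matching step only, under assumed details: each article is treated as a bag of content words, and an article in one statute is linked to its most similar counterpart in the other when the cosine similarity clears a threshold. The tokenization, the threshold value, and the one-directional linking are placeholders, not the paper's settings.

    import math
    from collections import Counter

    def cosine(a: Counter, b: Counter) -> float:
        """Cosine similarity between two bag-of-words vectors."""
        dot = sum(a[w] * b[w] for w in set(a) & set(b))
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def correspondence_table(statute_a, statute_b, threshold=0.3):
        """Link each article of statute_a to its most similar article in
        statute_b, or to None when no article clears the threshold."""
        vecs_a = [Counter(article.split()) for article in statute_a]
        vecs_b = [Counter(article.split()) for article in statute_b]
        table = []
        for i, va in enumerate(vecs_a):
            best_sim, best_j = max((cosine(va, vb), j) for j, vb in enumerate(vecs_b))
            table.append((i, best_j if best_sim >= threshold else None, round(best_sim, 3)))
        return table

    # Toy statutes; real input would be morphologically analyzed Japanese text
    # restricted to content words (nouns, verbs, adjectives, etc.).
    a = ["purpose of this ordinance", "definition of terms", "penalty provisions"]
    b = ["definitions of terms used", "penalties", "purpose"]
    print(correspondence_table(a, b))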