Journal of Natural Language Processing

Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619

Volume 19, Issue 2

Preface

[title in Japanese]

[in Japanese]

2012, Volume 19, Issue 2, Page 63
Published: July 06, 2012
Released on J-STAGE: October 05, 2012

DOI: https://doi.org/10.5715/jnlp.19.63

JOURNAL FREE ACCESS

Paper

Splitting Katakana Noun Compounds by Paraphrasing and Back-transliteration

Nobuhiro Kaji, Masaru Kitsuregawa

2012, Volume 19, Issue 2, Pages 65-88
Published: July 06, 2012
Released on J-STAGE: October 05, 2012

DOI: https://doi.org/10.5715/jnlp.19.65

JOURNAL FREE ACCESS

In a number of languages, including Japanese, word boundaries within noun compounds are not marked by whitespace, and splitting such compounds benefits various NLP applications. In Japanese, noun compounds made up of katakana words are particularly difficult to split, because katakana words are highly productive and are often out-of-vocabulary. To overcome this difficulty, we propose splitting katakana noun compounds by using their paraphrases and back-transliterations. Experiments demonstrated a statistically significant improvement in splitting accuracy when both paraphrases and back-transliterations were extracted from unlabeled textual data and then used to construct splitting models.

Entity Set Expansion based on Bootstrapping Methods using Topic Information

Kugatsu Sadamitsu, Kuniko Saito, Kenji Imamura, Yoshihiro Matsuo, Geni ...

2012, Volume 19, Issue 2, Pages 89-106
Published: July 06, 2012
Released on J-STAGE: October 05, 2012

DOI: https://doi.org/10.5715/jnlp.19.89

JOURNAL FREE ACCESS

This paper proposes three modules based on the latent topics of documents for alleviating "semantic drift" in bootstrapping-based entity set expansion. These modules are added to a discriminative bootstrapping algorithm to perform topic feature generation, negative example selection, and positive example disambiguation. In this study, we model latent topics with LDA (Latent Dirichlet Allocation) in an unsupervised manner. Experiments show that the accuracy of the extracted entities improves by 6.7 to 28.2%, depending on the domain.
