Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 19, Issue 2
Preface
Paper
  • Nobuhiro Kaji, Masaru Kitsuregawa
    2012 Volume 19 Issue 2 Pages 65-88
    Published: July 06, 2012
    Released on J-STAGE: October 05, 2012
    JOURNAL FREE ACCESS
    In a number of languages, including Japanese, word boundaries within noun compounds are not marked by white space, and splitting such compounds benefits various NLP applications. In Japanese, noun compounds made up of katakana words are particularly difficult to split, because katakana words are highly productive and are often out-of-vocabulary. To overcome this difficulty, we propose using paraphrases and back-transliterations of katakana noun compounds to split them. Experiments demonstrated a statistically significant improvement in splitting accuracy when both paraphrases and back-transliterations are extracted from unlabeled text and used to construct the splitting models.
  • Kugatsu Sadamitsu, Kuniko Saito, Kenji Imamura, Yoshihiro Matsuo, Geni ...
    2012 Volume 19 Issue 2 Pages 89-106
    Published: July 06, 2012
    Released on J-STAGE: October 05, 2012
    JOURNAL FREE ACCESS
    This paper proposes three modules based on latent topics of documents for alleviating “semantic drift” in bootstrapping entity set expansion. These new modules are added to a discriminative bootstrapping algorithm to realize topic feature generation, negative example selection and positive example disambiguation. In this study, we model latent topics with LDA (Latent Dirichlet Allocation) in an unsupervised way. Experiments show that the accuracy of the extracted entities is improved by 6.7 to 28.2% depending on the domain.