Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Paper
Splitting Katakana Noun Compounds by Paraphrasing and Back-transliteration
Nobuhiro Kaji, Masaru Kitsuregawa

2012 Volume 19 Issue 2 Pages 65-88

Abstract

Word boundaries within noun compounds are not marked by white space in a number of languages, including Japanese, and splitting such compounds benefits various NLP applications. In Japanese, noun compounds made up of katakana words are particularly difficult to split because katakana words are highly productive and often out-of-vocabulary. To overcome this difficulty, we propose using paraphrases and back-transliterations of katakana noun compounds to split them. Experiments demonstrated that splitting accuracy improves by a statistically significant margin when both paraphrases and back-transliterations are extracted from unlabeled text and then used to construct splitting models.
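
As a rough illustration of the idea in the abstract, the following minimal sketch enumerates candidate splits of a katakana compound and ranks them using two kinds of evidence. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the two paraphrase patterns (joining parts with "・" or "の"), the toy counts, the toy back-transliteration table, and the ad hoc additive score, which stands in for the splitting models the abstract says are constructed from such information.

```python
from itertools import combinations


def candidate_splits(compound, max_parts=3):
    """Yield every way to cut `compound` into 1..max_parts contiguous pieces."""
    n = len(compound)
    for num_cuts in range(max_parts):
        for cuts in combinations(range(1, n), num_cuts):
            bounds = (0, *cuts, n)
            yield tuple(compound[a:b] for a, b in zip(bounds, bounds[1:]))


# Hypothetical counts of paraphrased forms observed in unlabeled text.
PARAPHRASE_COUNTS = {
    "アンチョビ・パスタ": 12,  # parts joined by a middle dot
    "アンチョビのパスタ": 7,   # parts joined by the particle "no"
}

# Hypothetical katakana-to-English pairs standing in for back-transliterations.
BACK_TRANSLITERATIONS = {
    "アンチョビ": "anchovy",
    "パスタ": "pasta",
}


def score(parts):
    """Ad hoc toy score: paraphrase counts plus parts with a known English source."""
    paraphrase = 0
    if len(parts) > 1:
        paraphrase = sum(
            PARAPHRASE_COUNTS.get(sep.join(parts), 0) for sep in ("・", "の")
        )
    transliterated = sum(1 for part in parts if part in BACK_TRANSLITERATIONS)
    return paraphrase + transliterated


def best_split(compound):
    """Return the highest-scoring split; ties go to the split with fewest cuts."""
    return max(candidate_splits(compound), key=score)


if __name__ == "__main__":
    print(best_split("アンチョビパスタ"))
```

With no supporting evidence for any split, `best_split` leaves the compound whole, because the unsplit candidate is enumerated first and `max` keeps the first of several equal scores.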

© 2012 The Association for Natural Language Processing