Information and Media Technologies
Online ISSN : 1881-0896
ISSN-L : 1881-0896
Media (processing) and Interaction
Splitting Katakana Noun Compounds by Paraphrasing and Back-transliteration
Nobuhiro Kaji, Masaru Kitsuregawa
JOURNAL FREE ACCESS

2014 Volume 9 Issue 4 Pages 790-813

Abstract
In a number of languages, including Japanese, word boundaries within noun compounds are not marked by whitespace, so splitting such compounds benefits various NLP applications. In Japanese, noun compounds composed of katakana words are particularly difficult to split because katakana words are highly productive and are often out of vocabulary. We therefore propose splitting katakana noun compounds by using their paraphrases and back-transliterations. In experiments, paraphrases and back-transliterations were extracted from unlabeled textual data and used to construct splitting models, which improved splitting accuracy with statistical significance.
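As a rough illustration of the idea only, and not the splitting model described in the paper, the following Python sketch scores every candidate segmentation of a katakana compound using two kinds of evidence: whether each segment has a known back-transliteration, and whether a paraphrase of the compound with an explicit separator has been observed. The dictionaries `BACK_TRANSLITERATIONS` and `PARAPHRASES`, the scoring weights, and the function names are all hypothetical; a real system would mine such resources from unlabeled text and learn the weights.

```python
from itertools import combinations

# Hypothetical toy resources (assumptions for illustration only).
# Katakana word -> English source word it is assumed to transliterate.
BACK_TRANSLITERATIONS = {"アンチョビ": "anchovy", "パスタ": "pasta", "パス": "pass"}
# Paraphrases assumed to have been observed in text, with explicit separators.
PARAPHRASES = {"アンチョビ・パスタ", "アンチョビのパスタ"}


def segmentations(compound):
    """Yield every way to cut the string into contiguous, non-empty segments."""
    n = len(compound)
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [compound[i:j] for i, j in zip(bounds, bounds[1:])]


def score(segments):
    """Toy score: reward known segments and observed paraphrases."""
    total = 0.0
    # Back-transliteration evidence: +1 for a known segment, -1 otherwise.
    for seg in segments:
        total += 1.0 if seg in BACK_TRANSLITERATIONS else -1.0
    # Paraphrase evidence: +1 for each observed paraphrase of this split.
    if len(segments) > 1:
        for joiner in ("・", "の"):
            if joiner.join(segments) in PARAPHRASES:
                total += 1.0
    return total


def split_compound(compound):
    """Return the highest-scoring segmentation under the toy score."""
    return max(segmentations(compound), key=score)


print(split_compound("アンチョビパスタ"))
```

With these toy resources, the compound アンチョビパスタ is split into アンチョビ and パスタ, because both segments have back-transliterations and both paraphrases support that boundary. Exhaustive enumeration is exponential in the length of the compound and is used here only to keep the sketch short.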
© 2014 The Association for Natural Language Processing