Splitting Katakana Noun Compounds by Paraphrasing and Back-transliteration

Nobuhiro Kaji; Masaru Kitsuregawa

doi:10.11185/imt.9.790

Abstract

In a number of languages, including Japanese, word boundaries within noun compounds are not marked by white space, so splitting such compounds benefits various NLP applications. In Japanese, noun compounds composed of katakana words are particularly difficult to split because katakana words are highly productive and often out of vocabulary. We therefore propose using paraphrases and back-transliterations of katakana noun compounds to split them. In experiments, splitting models built from paraphrases and back-transliterations extracted from unlabeled text improved splitting accuracy by a statistically significant margin.
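The abstract does not spell out how paraphrases and back-transliterations are turned into evidence, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that a paraphrase is the compound with a connector (a middle dot "・" or the particle "の") inserted at the boundary, and that a back-transliteration appears as an English gloss in parentheses right after the compound. The function names, the connector set, and the toy corpus are all hypothetical.

```python
import re


def candidate_splits(compound):
    """Yield every two-way split (left, right) of the compound."""
    for i in range(1, len(compound)):
        yield compound[:i], compound[i:]


def paraphrase_supported_splits(compound, corpus, connectors=("・", "の")):
    """Count, for each two-way split, how often a paraphrase occurs in the corpus.

    Assumption: a paraphrase is left + connector + right. Splits with no
    paraphrase in the corpus are omitted from the result.
    """
    counts = {}
    for left, right in candidate_splits(compound):
        n = sum(corpus.count(left + c + right) for c in connectors)
        if n:
            counts[(left, right)] = n
    return counts


def back_transliteration_word_count(compound, corpus):
    """Return the number of English words glossed in parentheses after the compound.

    Assumption: the gloss looks like 'ジャンクフード(junk food)'. Returns None
    when no such gloss is found.
    """
    pattern = re.escape(compound) + r"\s*[（(]([A-Za-z ]+)[）)]"
    match = re.search(pattern, corpus)
    return len(match.group(1).split()) if match else None


# Toy unlabeled text (made up for illustration).
corpus = (
    "アンチョビのパスタを作った。アンチョビ・パスタは簡単だ。"
    "ジャンクフード(junk food)は控えたい。"
)

print(paraphrase_supported_splits("アンチョビパスタ", corpus))
print(back_transliteration_word_count("ジャンクフード", corpus))
```

In a sketch like this, the paraphrase counts and the gloss word count would serve as features for a splitting model rather than as final decisions, since either signal may be missing or noisy for a given compound.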
