Journal of Natural Language Processing (自然言語処理)
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper
Auxiliary Lexicon Word Prediction for Cross-Domain Word Segmentation
Shohei Higashiyama, Masao Utiyama, Yuji Matsumoto, Taro Watanabe, Eiichiro Sumita
Journal, free access

2020, Volume 27, Issue 3, pp. 573-598

Abstract

Recent work has explored various neural network-based methods for word segmentation and has achieved substantial progress, mainly in in-domain scenarios. There remains, however, the problem of performance degradation on target domains for which labeled data is not available. A key issue in overcoming this problem is how to use linguistic resources in target domains, such as unlabeled data and lexicons, which can be collected or constructed more easily than fully labeled data. In this work, we propose a novel method that uses unlabeled data and lexicons for cross-domain word segmentation. We introduce an auxiliary prediction task, Lexicon Word Prediction, into a character-based segmenter to identify occurrences of lexical entries in unlabeled sentences. The experiments demonstrate that the proposed method achieves accurate segmentation for various Japanese and Chinese domains.
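
To make the idea of identifying lexical entries in unlabeled sentences more concrete, the following is a minimal sketch of how character-level auxiliary labels might be derived from a lexicon. The abstract does not specify the label scheme, so the four per-character flags used here (B, I, E, S for begins, is inside, ends, or is by itself a lexicon word) are an assumption, as are the function name `lexicon_word_labels` and the `max_len` parameter. It is an illustration under those assumptions, not the authors' implementation.

```python
from typing import Dict, List, Set


def lexicon_word_labels(
    sentence: str, lexicon: Set[str], max_len: int = 4
) -> List[Dict[str, bool]]:
    """Mark, for every character, how it participates in lexicon matches.

    Hypothetical label scheme (an assumption, not taken from the paper):
      B: the character begins some multi-character lexicon word
      I: the character is strictly inside some lexicon word
      E: the character ends some multi-character lexicon word
      S: the character alone is a lexicon word
    Overlapping matches are all recorded, so several flags can be True.
    """
    labels = [{"B": False, "I": False, "E": False, "S": False} for _ in sentence]
    n = len(sentence)
    for start in range(n):
        # Try every candidate substring of up to max_len characters.
        for length in range(1, min(max_len, n - start) + 1):
            if sentence[start:start + length] not in lexicon:
                continue
            end = start + length - 1
            if length == 1:
                labels[start]["S"] = True
            else:
                labels[start]["B"] = True
                labels[end]["E"] = True
                for i in range(start + 1, end):
                    labels[i]["I"] = True
    return labels
```

For example, calling `lexicon_word_labels("東京都に住む", {"東京", "京都", "東京都", "住む"})` flags the character 京 as B, I, and E at once, because it ends 東京, begins 京都, and sits inside 東京都. Labels of this kind need no manual segmentation, which is why they can be produced for unlabeled target-domain text; how such labels are actually fed to the segmenter is described in the paper itself, not in this sketch.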

© 2020 The Association for Natural Language Processing