Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper
Auxiliary Lexicon Word Prediction for Cross-Domain Word Segmentation
Shohei Higashiyama, Masao Utiyama, Yuji Matsumoto, Taro Watanabe, Eiichiro Sumita

2020 Volume 27 Issue 3 Pages 573-598

Abstract

Recent work has explored various neural network-based methods for word segmentation and has achieved substantial progress, mainly in in-domain scenarios. There remains, however, a problem of performance degradation on target domains for which labeled data is not available. A key issue in overcoming the problem is how to use linguistic resources in target domains, such as unlabeled data and lexicons, which can be collected or constructed more easily than fully labeled data. In this work, we propose a novel method using unlabeled data and lexicons for cross-domain word segmentation. We introduce an auxiliary prediction task, Lexicon Word Prediction, into a character-based segmenter to identify occurrences of lexical entries in unlabeled sentences. The experiments demonstrate that the proposed method achieves accurate segmentation for various Japanese and Chinese domains.
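
As a rough illustration of how occurrences of lexical entries might be turned into character-level targets for an auxiliary task, the following minimal Python sketch marks, for each character of an unlabeled sentence, the positions it occupies in any matching lexicon entry. The function name `lexicon_word_labels`, the B/I/E/S position tags, and the `max_len` cutoff are assumptions made for this example; the abstract does not specify the label scheme or matching procedure, so this should not be read as the authors' implementation.

```python
from typing import List, Set


def lexicon_word_labels(
    sentence: str, lexicon: Set[str], max_len: int = 4
) -> List[Set[str]]:
    """Hypothetical helper: tag each character with the positions it takes
    in any lexicon entry found in the sentence.

    Tags (an assumed scheme, not taken from the paper's abstract):
      B = first character of a matched entry
      I = inner character of a matched entry
      E = last character of a matched entry
      S = a single-character matched entry
    """
    labels: List[Set[str]] = [set() for _ in sentence]
    n = len(sentence)
    for start in range(n):
        for length in range(1, max_len + 1):
            end = start + length
            if end > n:
                break  # no longer substrings fit from this start
            if sentence[start:end] not in lexicon:
                continue
            if length == 1:
                labels[start].add("S")
            else:
                labels[start].add("B")
                labels[end - 1].add("E")
                for i in range(start + 1, end - 1):
                    labels[i].add("I")
    return labels


if __name__ == "__main__":
    # Toy lexicon and sentence; overlapping matches yield multiple tags.
    toy_lexicon = {"ab", "bcd", "d"}
    for ch, tags in zip("abcd", lexicon_word_labels("abcd", toy_lexicon)):
        print(ch, sorted(tags))
```

Because the targets come only from a lexicon and raw text, they can be produced for target-domain sentences that have no gold segmentation. A character can carry several tags at once when matches overlap, which is why each position holds a set rather than a single label.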

© 2020 The Association for Natural Language Processing