Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper
Character-to-Word Attention for Word Segmentation
Shohei Higashiyama, Masao Utiyama, Eiichiro Sumita, Masao Ideuchi, Yoshiaki Oida, Yohei Sakamoto, Isaac Okada, Yuji Matsumoto

2020 Volume 27 Issue 3 Pages 499-530

Abstract

Although limited effort has been devoted to exploring neural models for Japanese word segmentation, such models have been actively applied to Chinese word segmentation because they can reduce the effort required for feature engineering. In this work, we propose a character-based neural model that jointly uses word information, which is useful for disambiguating word boundaries. For each character in a sentence, our model uses an attention mechanism to estimate the importance of multiple candidate words that contain the character. Experimental results show that learning to attend to the proper words leads to accurate segmentation, and that our model achieves better performance than existing statistical and neural models on both in-domain and cross-domain Japanese word segmentation datasets.
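To make the character-to-word attention idea concrete, the following is a minimal, hypothetical sketch (not the authors' released code or exact architecture): for each character, attention weights over the candidate words containing that character are computed from the character's hidden state, and the attention-weighted word vector can then be combined with the character representation for boundary-tag classification. All module names and dimensions are illustrative assumptions.

```python
# Hypothetical sketch of character-to-word attention; not the paper's exact model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharToWordAttention(nn.Module):
    def __init__(self, char_hidden_dim, word_emb_dim, num_words):
        super().__init__()
        self.word_emb = nn.Embedding(num_words, word_emb_dim, padding_idx=0)
        # Projects a character's hidden state into word-embedding space for scoring.
        self.score = nn.Linear(char_hidden_dim, word_emb_dim, bias=False)

    def forward(self, char_hidden, cand_word_ids, cand_mask):
        """
        char_hidden:   (batch, seq_len, char_hidden_dim)  character hidden states
        cand_word_ids: (batch, seq_len, max_cands)        candidate word ids per character
        cand_mask:     (batch, seq_len, max_cands)        1 for real candidates, 0 for padding
        returns:       (batch, seq_len, word_emb_dim)     attention-weighted word vectors
        """
        cand_vecs = self.word_emb(cand_word_ids)              # (B, T, C, Dw)
        query = self.score(char_hidden).unsqueeze(2)          # (B, T, 1, Dw)
        scores = (query * cand_vecs).sum(-1)                  # (B, T, C)
        scores = scores.masked_fill(cand_mask == 0, float('-inf'))
        attn = F.softmax(scores, dim=-1)
        attn = torch.nan_to_num(attn)                         # characters with no candidates
        return (attn.unsqueeze(-1) * cand_vecs).sum(2)        # (B, T, Dw)

# Toy usage: 2 sentences of 5 characters, up to 3 candidate words per character.
layer = CharToWordAttention(char_hidden_dim=64, word_emb_dim=32, num_words=1000)
h = torch.randn(2, 5, 64)
cands = torch.randint(1, 1000, (2, 5, 3))
mask = torch.ones(2, 5, 3)
word_context = layer(h, cands, mask)
print(word_context.shape)  # torch.Size([2, 5, 32])
```

In practice, the candidate word lists would come from a lexicon lookup of all words overlapping each character position, and the resulting word-context vector would typically be concatenated with the character's hidden state before the segmentation tagger.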

© 2020 The Association for Natural Language Processing