Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
Joint Optimization of Word Segmentation and Downstream Model using Downstream Loss
Tatsuya Hiraoka, Sho Takase, Kei Uchiumi, Atsushi Keyaki, Naoaki Okazaki

2022 Volume 29 Issue 1 Pages 112-143

Abstract

We propose a novel method to find an appropriate tokenization for a given downstream model by jointly optimizing a tokenizer and the model. The proposed method places no restriction on the downstream model other than that it provides loss values for training the tokenizer, and thus it can be applied to various NLP tasks. Moreover, the proposed method can search for an appropriate tokenization for an already trained model as a post-processing step, which makes it applicable in a wide range of situations. We evaluated whether our method improves performance on text classification in three languages and machine translation in eight language pairs. Experimental results show that the proposed method improves performance by finding appropriate tokenizations.
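The abstract describes the core loop: the tokenizer receives no training signal other than the loss values computed by the downstream model. The following is a minimal self-contained sketch of that idea in PyTorch, not the authors' released implementation; the toy vocabulary, the exhaustive candidate enumeration, the unigram tokenizer, and the bag-of-embeddings classifier are all illustrative assumptions. Each candidate tokenization is weighted by its probability under the tokenizer, and the expected downstream loss is minimized with respect to both the tokenizer and the model.

import itertools
import torch
import torch.nn as nn

VOCAB = ["a", "b", "ab", "ba", "aba"]        # toy subword vocabulary
TOK2ID = {t: i for i, t in enumerate(VOCAB)}

def segmentations(text):
    """Enumerate every split of `text` whose pieces are all in VOCAB."""
    if not text:
        yield []
        return
    for end in range(1, len(text) + 1):
        piece = text[:end]
        if piece in TOK2ID:
            for rest in segmentations(text[end:]):
                yield [piece] + rest

class UnigramTokenizer(nn.Module):
    """Learnable unigram LM that scores a tokenization by its log-probability."""
    def __init__(self):
        super().__init__()
        self.logit = nn.Parameter(torch.zeros(len(VOCAB)))

    def log_prob(self, tokens):
        logp = torch.log_softmax(self.logit, dim=0)
        return sum(logp[TOK2ID[t]] for t in tokens)

class Classifier(nn.Module):
    """Bag-of-embeddings binary classifier standing in for the downstream model."""
    def __init__(self, dim=8):
        super().__init__()
        self.emb = nn.Embedding(len(VOCAB), dim)
        self.out = nn.Linear(dim, 2)

    def loss(self, tokens, label):
        ids = torch.tensor([TOK2ID[t] for t in tokens])
        logits = self.out(self.emb(ids).mean(dim=0))
        return nn.functional.cross_entropy(
            logits.unsqueeze(0), torch.tensor([label]))

tokenizer, model = UnigramTokenizer(), Classifier()
opt = torch.optim.Adam(
    list(tokenizer.parameters()) + list(model.parameters()), lr=0.1)

text, label = "abab", 1
for step in range(50):
    # Take a few candidate tokenizations (an n-best-like list).
    cands = list(itertools.islice(segmentations(text), 4))
    log_ps = torch.stack([tokenizer.log_prob(c) for c in cands])
    losses = torch.stack([model.loss(c, label) for c in cands])
    # Expected downstream loss under the tokenizer's distribution:
    # gradients reach the tokenizer through the weights and reach
    # the downstream model through the per-candidate losses.
    weights = torch.softmax(log_ps, dim=0)
    expected = (weights * losses).sum()
    opt.zero_grad()
    expected.backward()
    opt.step()

best = max(segmentations(text),
           key=lambda c: tokenizer.log_prob(c).item())
print("preferred tokenization:", best)

The same loop also covers the post-processing setting mentioned in the abstract: freezing the classifier's parameters and updating only the tokenizer would search for a better tokenization for an already trained model.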

© 2022 The Association for Natural Language Processing