Subset Retrieval Nearest Neighbor Machine Translation

Hiroyuki Deguchi; Taro Watanabe; Yusuke Matsui; Masao Utiyama; Hideki Tanaka; Eiichiro Sumita

doi:10.5715/jnlp.31.374

抄録

k nearest neighbor machine translation (kNN-MT) (Khandelwal et al. 2021) boosts the translation quality of trained neural machine translation (NMT) models by incorporating an example search into the decoding algorithm. However, decoding is seriously time-consuming, that is, roughly 100 to 1,000 times slower than that of standard NMT, because neighbor tokens are retrieved from all the target tokens of parallel data in each timestep. In this paper, we propose “Subset kNN-MT”, which improves the decoding speed of kNN-MT using two methods: (1) retrieving neighbor target tokens from a subset that is the set of neighbor sentences of the input sentence, not from all sentences, and (2) efficient distance computation technique suitable for subset neighbor search using a look-up table. Our subset kNN-MT achieved a speed-up of up to 134.2 times and an improvement in the BLEU score of up to 1.6 compared with those of kNN-MT in the WMT’19 De-En translation task, domain adaptation tasks in De-En and En-Ja translations, and the Flores101 multilingual translation task.

著者関連情報

Licensed under CC BY 4.0
https://creativecommons.org/licenses/by/4.0/

お気に入り & アラート

閲覧履歴

責任著者(Corresponding author)

J-STAGEへの登録はこちら（無料）