2024, Volume 31, Issue 2, pp. 374-406
k-nearest-neighbor machine translation (kNN-MT) (Khandelwal et al. 2021) boosts the translation quality of trained neural machine translation (NMT) models by incorporating an example search into the decoding algorithm. However, decoding is seriously time-consuming, roughly 100 to 1,000 times slower than that of standard NMT, because neighbor tokens are retrieved from all the target tokens of the parallel data at every timestep. In this paper, we propose “Subset kNN-MT”, which improves the decoding speed of kNN-MT with two methods: (1) retrieving neighbor target tokens from a subset, namely the target tokens of the sentences neighboring the input sentence, rather than from all sentences, and (2) an efficient distance computation technique suited to subset neighbor search, based on a look-up table. Our Subset kNN-MT achieved a speed-up of up to 134.2 times and a BLEU improvement of up to 1.6 points over kNN-MT on the WMT’19 De-En translation task, domain adaptation tasks in De-En and En-Ja translation, and the Flores101 multilingual translation task.
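To make the two ideas concrete, the following is a minimal NumPy sketch, not the authors' implementation: target tokens are stored as product-quantization (PQ) codes grouped by source sentence, the search is restricted to tokens belonging to a given set of neighbor sentences, and distances are computed asymmetrically by summing entries of a per-query look-up table. All names (`codebooks`, `codes`, `token_to_sentence`, `subset_knn`) and the toy sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

M, K, d_sub = 4, 16, 2   # PQ subspaces, centroids per subspace, subvector dim
D = M * d_sub            # full key-vector dimension

# Toy datastore: PQ codebooks, per-token PQ codes, and the sentence each
# target token came from (all randomly generated for illustration).
codebooks = rng.normal(size=(M, K, d_sub))
codes = rng.integers(0, K, size=(100, M))          # 100 target tokens
token_to_sentence = rng.integers(0, 10, size=100)  # 10 source sentences

def build_lookup_table(query):
    """table[m, k] = squared distance from query subvector m to centroid k."""
    q_sub = query.reshape(M, d_sub)
    return ((codebooks - q_sub[:, None, :]) ** 2).sum(axis=-1)

def subset_knn(query, sentence_ids, k=4):
    """Retrieve the k nearest tokens, searching only tokens whose source
    sentence is in sentence_ids (the neighbor-sentence subset)."""
    subset = np.flatnonzero(np.isin(token_to_sentence, sentence_ids))
    table = build_lookup_table(query)
    # Asymmetric distance computation: for each candidate token, sum the
    # precomputed per-subspace table entries selected by its PQ code.
    dists = table[np.arange(M), codes[subset]].sum(axis=1)
    order = np.argsort(dists)[:k]
    return subset[order], dists[order]
```

The look-up table costs O(M·K) per query, after which each candidate's distance is just M table reads, so shrinking the candidate set to the neighbor-sentence subset directly shrinks the per-timestep search cost.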