Journal of Natural Language Processing (自然言語処理)
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Refereed)
Knowledge Distillation for Translating Erroneous Speech Transcriptions
Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura
Journal Free Access

2022, Volume 29, Issue 2, Pages 344-366

Abstract

Recent studies consider knowledge distillation a promising method for speech translation (ST) using end-to-end models. However, its usefulness in cascade ST with automatic speech recognition (ASR) and machine translation (MT) models has not yet been clarified. ASR output typically contains speech recognition errors, and an MT model trained only on human transcripts performs poorly on such error-containing ASR results. Thus, the MT model should be trained with the presence of ASR errors during inference taken into account. In this paper, we propose using knowledge distillation to train the MT model for cascade ST and achieve robustness against ASR errors. We distilled knowledge from a teacher model based on human transcripts to a student model based on erroneous transcriptions. Our experimental results showed that the proposed method improves translation performance on erroneous transcriptions. Further investigation showed that combining knowledge distillation and fine-tuning consistently improved performance on two different datasets: MuST-C English--Italian and Fisher Spanish--English.
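
To make the training objective concrete, below is a minimal sketch of a word-level knowledge distillation loss for the student MT model, assuming a PyTorch implementation. The function name kd_loss, the interpolation weight alpha, and the temperature parameter are illustrative assumptions and not details taken from the paper; the sketch only assumes, as the abstract states, that the teacher is conditioned on the clean human transcript while the student is conditioned on the (possibly erroneous) ASR output of the same utterance.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, target_ids, pad_id,
            alpha=0.5, temperature=1.0):
    """Word-level knowledge distillation loss for a cascade-ST student MT model.

    student_logits: (batch, tgt_len, vocab) from the student fed the ASR output.
    teacher_logits: (batch, tgt_len, vocab) from the teacher fed the clean transcript.
    target_ids:     (batch, tgt_len) reference translation token ids.
    """
    # Soft targets from the teacher; no gradient flows into the teacher.
    with torch.no_grad():
        teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)

    log_probs = F.log_softmax(student_logits / temperature, dim=-1)

    # Distillation term: match the student's per-position distribution
    # to the teacher's (cross-entropy against the soft targets).
    kd = -(teacher_probs * log_probs).sum(dim=-1)            # (batch, tgt_len)

    # Standard negative log-likelihood against the reference translation.
    nll = F.cross_entropy(
        student_logits.transpose(1, 2), target_ids,
        ignore_index=pad_id, reduction="none",
    )                                                         # (batch, tgt_len)

    # Average over non-padding target positions.
    mask = (target_ids != pad_id).float()
    kd = (kd * mask).sum() / mask.sum()
    nll = (nll * mask).sum() / mask.sum()

    # Interpolate the distillation term and the reference term.
    return alpha * kd + (1.0 - alpha) * nll
```

In such a setup, the teacher would be trained beforehand on human transcripts, and the student would then be optimized with this interpolated loss on parallel data whose source side is replaced by ASR output; fine-tuning, as mentioned in the abstract, can be combined with this objective.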

© 2022 The Association for Natural Language Processing