Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Regular Paper (Peer-Reviewed)
Cross-Lingual Transfer Learning for End-to-End Speech Translation
Shuichiro Shimizu, Chenhui Chu, Sheng Li, Sadao Kurohashi
Journal Free Access

2022 Volume 29 Issue 2 Pages 611-637

Abstract

End-to-end speech translation (ST) is the task of directly translating source language speech into target language text. It has the potential to produce better translations than those obtained by simply combining automatic speech recognition (ASR) with machine translation (MT). We propose cross-lingual transfer learning for end-to-end ST, in which model parameters are transferred from an ST pretraining stage on one language pair to an ST fine-tuning stage on another language pair. Experiments on the CoVoST 2 and multilingual TEDx datasets in many-to-one settings show that our model outperforms a model pretrained with English ASR by up to 2.3 BLEU points. An ablation study investigating which layers of the sequence-to-sequence architecture carry information important to transfer demonstrated that the lower layers of the encoder contain language-independent information for cross-lingual transfer. Extensive studies were conducted on (1) the ASR pretraining language, (2) the ST pretraining language pair, (3) multilingual methods, and (4) model sizes. They demonstrated that (1) using the same language for ASR pretraining and as the ST fine-tuning source language results in good performance, (2) a high-resource language pair is a good choice for the ST pretraining language pair, (3) the proposed method works well in conjunction with multilingual methods, and (4) the proposed method operates across different model sizes.
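As a concrete illustration of the transfer scheme described in the abstract, the following is a minimal PyTorch sketch of initializing an ST model for a new language pair from one pretrained on another pair. The model class, hyperparameters, layer names, and language pairs are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class SpeechTranslationModel(nn.Module):
    """Toy sequence-to-sequence ST model: speech features -> target text."""
    def __init__(self, feat_dim=80, d_model=256, n_layers=6, vocab_size=8000):
        super().__init__()
        self.input_proj = nn.Linear(feat_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=n_layers)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=n_layers)
        self.embed = nn.Embedding(vocab_size, d_model)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, speech, tgt_tokens):
        # speech: (batch, frames, feat_dim); tgt_tokens: (batch, length)
        memory = self.encoder(self.input_proj(speech))
        hidden = self.decoder(self.embed(tgt_tokens), memory)
        return self.out(hidden)

# ST pretraining on a high-resource pair (e.g., fr-en) would happen here.
pretrained = SpeechTranslationModel()

# Cross-lingual transfer: initialize the model for another pair (e.g., pt-en)
# from the pretrained parameters, then fine-tune on the new pair.
model = SpeechTranslationModel()
model.load_state_dict(pretrained.state_dict())

# Hypothetical variant motivated by the ablation finding that the lower
# encoder layers carry language-independent information: transfer only the
# input projection and the first two encoder layers.
lower = {k: v for k, v in pretrained.state_dict().items()
         if k.startswith(("input_proj", "encoder.layers.0.", "encoder.layers.1."))}
model2 = SpeechTranslationModel()
model2.load_state_dict(lower, strict=False)

Note that in the many-to-one settings studied in the paper the target language is shared between the pretraining and fine-tuning pairs, so the decoder and output vocabulary can plausibly be transferred wholesale as above; transferring across different target languages would additionally require handling the vocabulary mismatch.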

© 2022 The Association for Natural Language Processing