Journal of Natural Language Processing (自然言語処理)
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
End-to-end Japanese-English Speech-to-text Translation with Spoken-to-Written Style Conversion
Zhengdong Yang, Shuichiro Shimizu, Chenhui Chu, Sheng Li, Sadao Kurohashi
Journal, free access

2024, Volume 31, Issue 3, pp. 935-957

Abstract

Speech-to-text translation (ST) translates speech in a source language into text in a target language. Because ST spans different forms of language, it faces a style gap between spoken and written language. This gap exists not only between the input speech and the output text but also between the input speech and the bilingual parallel corpora often used in ST. These gaps hinder improvements in ST performance. Spoken-to-written style conversion has been shown to improve cascaded Japanese-English ST by reducing such gaps. Integrating this conversion into end-to-end ST is desirable because end-to-end ST is easier to deploy, more efficient, and less prone to error propagation than cascaded ST. In this study, we construct a large-scale Japanese-English ST dataset in the lecture domain. We also propose a joint task of speech-to-text spoken-to-written style conversion and end-to-end ST, together with an interactive-attention-based multi-decoder model for the joint task, to improve end-to-end ST. Experiments on the constructed dataset show that our model outperforms a strong baseline.
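
As a rough illustration of what training on a joint task can look like, the sketch below interpolates two sequence losses, one for the translation output and one for the style-converted transcript. It is not the model or objective described in the paper: the weighted-sum form, the `weight` hyperparameter, the function names, and the toy probabilities are all assumptions made for this example.

```python
import math


def nll(token_probs):
    """Negative log-likelihood of a reference sequence, given the
    probability the model assigned to each reference token."""
    return -sum(math.log(p) for p in token_probs)


def joint_loss(st_token_probs, conv_token_probs, weight=0.5):
    """Weighted sum of the translation loss and the style-conversion loss.

    `weight` is a hypothetical hyperparameter (assumption, not from the paper).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    st_loss = nll(st_token_probs)        # end-to-end ST output
    conv_loss = nll(conv_token_probs)    # spoken-to-written conversion output
    return weight * st_loss + (1.0 - weight) * conv_loss


# Toy per-token probabilities, made up for illustration only.
st_probs = [0.5, 0.25]
conv_probs = [0.5, 0.5]
loss = joint_loss(st_probs, conv_probs, weight=0.5)
print(f"joint loss: {loss:.4f}")
```

Setting `weight` to 1.0 reduces this to the translation loss alone, which makes it easy to compare a single-task setup against a joint one in such a sketch.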

© 2024 The Association for Natural Language Processing