Journal of Natural Language Processing (自然言語処理)
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
DiverSeg: Leveraging Diverse Segmentations with Cross-granularity Alignment for Neural Machine Translation
Haiyue Song, Zhuoyuan Mao, Raj Dabre, Chenhui Chu, Sadao Kurohashi
Journal, free access

2024, Volume 31, Issue 1, pp. 155-188

Abstract

In this study, we propose DiverSeg, which exploits diverse segmentations from multiple subword segmenters, each capturing a different perspective on a word, for neural machine translation. DiverSeg has three components: a subword lattice input that encodes multiple segmentations, a subword-relation-aware attention mechanism that integrates relations among subwords, and a cross-granularity embedding alignment objective that increases the similarity across different segmentations of a word. We conducted experiments on five datasets to evaluate whether DiverSeg improves machine translation quality. The results show that DiverSeg outperforms baseline methods by approximately two BLEU points. We also performed ablation studies to investigate the improvement over non-subword methods, the contribution of each component of DiverSeg, the choice of subword relations, the choice of similarity metric in the alignment loss, and combinations of segmenters.
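
To make the two central ideas of the abstract concrete, the following is a minimal sketch and not the authors' implementation. It shows (1) how several segmentations of one word could be merged into lattice nodes keyed by character span, and (2) one possible form of a cross-granularity alignment loss. Everything specific here is an assumption for illustration: the function names, the example word and its segmentations, the random embeddings, the use of mean pooling, and the choice of cosine similarity (the abstract only states that several similarity metrics were compared).

```python
import math
import random


def lattice_nodes(segmentations):
    """Merge several segmentations of one word into lattice nodes.

    Each node is (start, end, subword) with character offsets into the word.
    A span produced by more than one segmenter is kept only once.
    (Hypothetical helper, not taken from the paper.)
    """
    nodes = set()
    for pieces in segmentations:
        pos = 0
        for piece in pieces:
            nodes.add((pos, pos + len(piece), piece))
            pos += len(piece)
    return sorted(nodes)


def mean_pool(vectors):
    """Average a list of equal-length vectors (assumed pooling strategy)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def alignment_loss(seg_a, seg_b, emb):
    """1 - cosine similarity between the pooled embeddings of two
    segmentations of the same word. Lower means better aligned.
    (Assumed form of the objective; the paper may define it differently.)
    """
    pooled_a = mean_pool([emb[p] for p in seg_a])
    pooled_b = mean_pool([emb[p] for p in seg_b])
    return 1.0 - cosine(pooled_a, pooled_b)


# Illustrative input: two made-up segmentations of the word "unhappiness".
seg_a = ["un", "happi", "ness"]
seg_b = ["unhapp", "iness"]

nodes = lattice_nodes([seg_a, seg_b])

# Random stand-in embeddings; a real model would learn these.
rng = random.Random(0)
emb = {piece: [rng.gauss(0.0, 1.0) for _ in range(8)]
       for _, _, piece in nodes}

loss = alignment_loss(seg_a, seg_b, emb)
```

In this sketch, `nodes` holds the five distinct subword spans from both segmentations, and `loss` would be added to the translation loss during training so that the embeddings of different segmentations of the same word are pulled together. How the terms are weighted is not stated in the abstract and is left open here.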

© 2024 The Association for Natural Language Processing