Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
DiverSeg: Leveraging Diverse Segmentations with Cross-granularity Alignment for Neural Machine Translation
Haiyue Song, Zhuoyuan Mao, Raj Dabre, Chenhui Chu, Sadao Kurohashi
JOURNAL FREE ACCESS

2024 Volume 31 Issue 1 Pages 155-188

Abstract

In this study, we propose DiverSeg, a neural machine translation method that exploits diverse segmentations from multiple subword segmenters, each of which captures a different perspective on a word. DiverSeg encodes multiple segmentations as a subword lattice input, integrates relations among subwords through a subword-relation-aware attention mechanism, and increases the similarity across different segmentations of a word through a cross-granularity embedding alignment objective. We conducted experiments on five datasets to evaluate how much DiverSeg improves machine translation quality. The results show that DiverSeg outperforms baseline methods by approximately two BLEU points. We also performed ablation studies examining the improvement over non-subword methods, the contribution of each component of DiverSeg, the choice of subword relations, the choice of similarity metric in the alignment loss, and combinations of segmenters.
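
To make the idea of a subword lattice concrete, the following is a minimal illustrative sketch, not the authors' implementation. It merges several segmentations of one word into a set of lattice nodes keyed by character span and labels the positional relation between any two nodes. The example word, the three segmentations, the function names, and the relation label set are all assumptions chosen for illustration; the abstract does not specify how DiverSeg defines its lattice or its subword relations.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets within the word


def segmentation_spans(word: str, pieces: List[str]) -> List[Span]:
    """Map one segmentation of `word` to consecutive character spans."""
    if "".join(pieces) != word:
        raise ValueError(f"{pieces} does not spell {word!r}")
    spans, start = [], 0
    for piece in pieces:
        spans.append((start, start + len(piece)))
        start += len(piece)
    return spans


def build_lattice(word: str, segmentations: List[List[str]]) -> Dict[Span, str]:
    """Merge several segmentations into lattice nodes keyed by span.

    A subword that appears at the same position in more than one
    segmentation becomes a single shared node.
    """
    lattice: Dict[Span, str] = {}
    for pieces in segmentations:
        for start, end in segmentation_spans(word, pieces):
            lattice[(start, end)] = word[start:end]
    return lattice


def relation(a: Span, b: Span) -> str:
    """Classify how node `a` is positioned relative to node `b`.

    The label set is hypothetical, chosen only for this illustration.
    """
    if a == b:
        return "self"
    if a[1] <= b[0]:
        return "precedes"
    if b[1] <= a[0]:
        return "follows"
    if a[0] <= b[0] and b[1] <= a[1]:
        return "contains"
    if b[0] <= a[0] and a[1] <= b[1]:
        return "inside"
    return "overlaps"


# Three assumed segmentations, standing in for the outputs of different segmenters.
lattice = build_lattice("unbelievable", [
    ["un", "believ", "able"],
    ["unbeliev", "able"],
    ["un", "believable"],
])
```

In this sketch the three segmentations yield five distinct nodes, because "un" and "able" are each shared. A relation-aware attention layer could, under these assumptions, look up a learned bias or embedding for each pair of nodes from the label returned by `relation`.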

© 2024 The Association for Natural Language Processing