Journal of Advanced Computational Intelligence and Intelligent Informatics
Online ISSN : 1883-8014
Print ISSN : 1343-0130
ISSN-L : 1883-8014
Regular Papers
An Improved Byte Pair Encoding Method for Tibetan
Kalzang Gyatso, Sonam Tshering, Tashi Norbu, Nyima Tashi, Tong Xiao, Jingbo Zhu, Garma Tashi, Gaden Luosang
JOURNAL OPEN ACCESS

2025 Volume 29 Issue 6 Pages 1273-1282

Abstract

Byte pair encoding (BPE) plays a crucial role in natural language processing tasks because it reduces vocabulary redundancy and alleviates the out-of-vocabulary problem. However, when standard BPE is applied to Tibetan, these advantages are not fully realized because of the unique characteristics of the Tibetan script. As a result, the vocabulary contains some subwords that violate standard Tibetan orthographic conventions. These subwords introduce noise into the model and degrade downstream task performance. To address this issue, this paper investigates the agglutinative nature of Tibetan words and proposes an improved BPE approach designed specifically for Tibetan. We apply the method to a Tibetan-Chinese machine translation system and evaluate its effectiveness through a series of experiments. The results show that the proposed method corrects malformed subwords, improves translation quality, and significantly reduces vocabulary size. This lays a solid foundation for future research on Tibetan word representation and downstream natural language processing applications. The method yields consistent BLEU improvements on most test sets, with gains exceeding 2 points in the best case.
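The following minimal Python sketch makes the problem of malformed subwords concrete. It shows standard BPE merge learning together with one hypothetical orthographic constraint. It is not the method proposed in the paper, and the abstract does not give that method's details. The sketch rests on several assumptions. Syllables split on the tsheg (U+0F0B) serve as word units. The hypothetical `cluster_units` segmentation keeps Unicode combining marks (general category M*, which covers Tibetan vowel signs and subjoined consonants) attached to the preceding base letter. The hypothetical `is_malformed` check treats a subword as malformed if it begins with a combining mark. All function names are illustrative.

```python
import unicodedata
from collections import Counter
from typing import Callable

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark, used here as the word-unit boundary


def char_units(syllable: str) -> list[str]:
    """Baseline segmentation: every Unicode code point is an initial symbol."""
    return list(syllable)


def cluster_units(syllable: str) -> list[str]:
    """Hypothetical constraint: attach combining marks (vowel signs,
    subjoined consonants) to the preceding base letter."""
    units: list[str] = []
    for ch in syllable:
        if units and unicodedata.category(ch).startswith("M"):
            units[-1] += ch
        else:
            units.append(ch)
    return units


def is_malformed(symbol: str) -> bool:
    """Hypothetical check: a subword must not begin with a combining mark."""
    return unicodedata.category(symbol[0]).startswith("M")


def learn_bpe(corpus: list[str], num_merges: int,
              segment: Callable[[str], list[str]]) -> list[tuple[str, str]]:
    """Learn up to num_merges BPE merges over tsheg-delimited syllables."""
    counts = Counter(s for line in corpus for s in line.split(TSHEG) if s)
    vocab = {tuple(segment(s)): f for s, f in counts.items()}
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by syllable frequency.
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every syllable is already a single symbol
        best = max(pairs, key=pairs.get)  # ties: first pair counted wins
        merges.append(best)
        # Replace each occurrence of the best pair with the merged symbol.
        new_vocab: dict[tuple[str, ...], int] = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            key = tuple(out)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges


if __name__ == "__main__":
    # Toy corpus: གིས་ཀིས་ཡིས་གིས (written with escapes to pin exact code points)
    gis = "\u0f42\u0f72\u0f66"
    corpus = [TSHEG.join([gis, "\u0f40\u0f72\u0f66", "\u0f61\u0f72\u0f66", gis])]
    for name, seg in [("char", char_units), ("cluster", cluster_units)]:
        first = "".join(learn_bpe(corpus, 1, seg)[0])
        print(name, "malformed:", is_malformed(first))
    # prints "char malformed: True" then "cluster malformed: False"
```

On this toy corpus, plain code-point segmentation chooses the vowel sign ི plus ས as its first merge. The resulting subword begins with a dangling vowel sign. The cluster segmentation instead merges གི with ས, so every learned subword starts with a base letter.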


© 2025 Fuji Technology Press Ltd.

This article is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International license (https://creativecommons.org/licenses/by-nd/4.0/).
The journal is fully open access under Creative Commons licenses, and all articles are free to access on the JACIII official website.
https://www.fujipress.jp/jaciii/jc-about/#https://creativecommons.org/licenses/by-nd