Extracting Transliteration Pairs from Classical Chinese Buddhist Literature

Yu-Chun Wang; Richard Tzong-Han Tsai

doi:10.11517/pjsai.JSAI2012.0_3M2IOS3b4

Abstract

Transliteration pair extraction, which identifies transliterations corresponding to foreign loanwords in literature, is a key and very challenging task in research fields such as historical linguistics and digital humanities. In this paper, we focus on one important type of historical literature: classical Chinese Buddhist texts. We propose an approach that automatically identifies transliteration pairs in classical Chinese texts. Our approach comprises two stages: transliteration extraction and transliteration pair identification. To extract more candidate transliterations without introducing too many false positives, we adopt a hybrid method that combines a machine-learning-based extraction method using phonological features of the transliteration characters with a suffix-array-based extraction method with filtering rules. Next, the extracted transliteration candidates are compared with one another using their pronunciations from the Middle Chinese rime book "Guangyun", and the ALINE algorithm is employed to measure phonetic similarity and identify the transliteration pairs. To evaluate our method, we construct an evaluation set from several Buddhist texts, such as the Samyukta Agama and the Mahavibhasa, which were translated into Chinese in different eras. Precision and recall are used to measure the effectiveness of our method.
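The suffix-array-based extraction step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the length and frequency thresholds, the toy sentence, and the "keep only maximal candidates" filter are all assumptions made for illustration, since the abstract does not specify the actual filtering rules. The idea shown is only that, once all suffixes of a text are sorted, repeated character sequences sit next to each other and can be counted as transliteration candidates.

```python
from itertools import groupby


def build_suffix_array(text):
    # Naive construction: sort suffix start positions by the suffix text.
    # Adequate for a short demo; large corpora need a faster algorithm.
    return sorted(range(len(text)), key=lambda i: text[i:])


def repeated_substrings(text, min_len=2, max_len=4, min_count=2):
    # In suffix-array order, suffixes sharing the same first n characters
    # are adjacent, so each run of equal prefixes is one repeated substring.
    # Overlapping occurrences are counted separately.
    sa = build_suffix_array(text)
    found = {}
    for n in range(min_len, max_len + 1):
        prefixes = (text[i:i + n] for i in sa if i + n <= len(text))
        for sub, run in groupby(prefixes):
            count = sum(1 for _ in run)
            if count >= min_count:
                found[sub] = count
    return found


def keep_maximal(cands):
    # Hypothetical filtering rule (an assumption, not taken from the paper):
    # drop a candidate that occurs inside a longer candidate with the same
    # frequency, since it never appears on its own.
    return {
        s: c
        for s, c in cands.items()
        if not any(s != t and s in t and c == cands[t] for t in cands)
    }


# Toy sentence built for this example; "阿羅漢" (arhat) is a well-known
# Buddhist transliteration.
sample = "佛告阿羅漢。阿羅漢白佛。"
candidates = repeated_substrings(sample)
print(keep_maximal(candidates))  # {'阿羅漢': 2}
```

In this sketch the raw candidates include the fragments "阿羅" and "羅漢", which the maximality filter removes because they only ever occur inside "阿羅漢". A real pipeline would additionally combine these candidates with the machine-learning-based extractor and then score candidate pairs phonetically, as the abstract describes.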
