Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
26th (2012)
Session ID : 3M2-IOS-3b-4

Extracting Transliteration Pairs from Classical Chinese Buddhist Literature
*Yu-Chun Wang, Richard Tzong-Han Tsai
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

Transliteration pair extraction, which identifies transliterations corresponding to foreign loanwords in literature, is a key and very challenging task in several research fields, such as historical linguistics and digital humanities. In this paper, we focus on one important type of historical literature: classical Chinese Buddhist texts. We propose an approach that automatically identifies transliteration pairs in classical Chinese texts. Our approach comprises two stages: transliteration extraction and transliteration pair identification. To extract more possible transliterations without introducing too many false positives, we adopt a hybrid method consisting of a machine-learning-based extraction method that uses phonological features of the transliteration characters and a suffix-array-based extraction method with filtering rules. Next, the extracted transliteration candidates are compared with one another by phonetic similarity: pronunciations are taken from the Middle Chinese rime book "Guangyun", and the ALINE algorithm is then used to measure phonetic similarity and identify the transliteration pairs. To evaluate our method, we construct an evaluation set from several Buddhist texts, such as the Samyukta Agama and the Mahavibhasa, which were translated into Chinese in different eras. Precision and recall are used to measure the effectiveness of our method.
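
The suffix-array-based extraction step mentioned in the abstract can be illustrated with a minimal sketch. It only shows the general idea of using a suffix array to collect character strings that recur in a text as raw candidates. The function names, the length bounds, and the frequency threshold are hypothetical and are not taken from the paper; the paper's filtering rules and its machine-learning-based extractor are not described here and are not reproduced.

```python
def build_suffix_array(text):
    # Naive construction: sort suffix start positions by the suffix itself.
    # Adequate for a toy example; large corpora need a faster algorithm.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    # lcp[k] is the length of the longest common prefix of the suffixes
    # starting at sa[k - 1] and sa[k]; lcp[0] is 0 by convention.
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = sa[k - 1], sa[k]
        n = 0
        while a + n < len(text) and b + n < len(text) and text[a + n] == text[b + n]:
            n += 1
        lcp[k] = n
    return lcp


def repeated_substrings(text, min_len=2, max_len=4, min_count=2):
    # Hypothetical helper: return {substring: occurrence count} for every
    # substring whose length is in [min_len, max_len] and which occurs at
    # least min_count times (overlapping occurrences are counted).
    sa = build_suffix_array(text)
    lcp = lcp_array(text, sa)
    counts = {}
    for length in range(min_len, max_len + 1):
        run = 1  # number of adjacent suffixes sharing the same prefix
        for k in range(1, len(sa) + 1):
            if k < len(sa) and lcp[k] >= length:
                run += 1
            else:
                if run >= min_count:
                    start = sa[k - 1]
                    counts[text[start:start + length]] = run
                run = 1
    return counts


if __name__ == "__main__":
    # Toy input; a real run would use the characters of a Chinese text.
    print(repeated_substrings("abracadabra"))
```

Because Python strings are sequences of Unicode code points, the same functions work unchanged on Chinese text, where each character is one position. In a pipeline like the one the abstract describes, the raw candidates would still need to be filtered before phonetic comparison.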

© 2012 The Japanese Society for Artificial Intelligence