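Organizer: The Japanese Society for Artificial Intelligence
Conference: The 26th Annual Conference of the Japanese Society for Artificial Intelligence, 2012
Edition: 26
Venue: Yamaguchi Prefectural Education Hall and other venues, Yamaguchi City, Yamaguchi Prefecture, Japan
Dates: 2012/06/12 - 2012/06/15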
Transliteration pair extraction, which identifies the transliterations that correspond to foreign loanwords in a text, is a key and challenging task in research fields such as historical linguistics and digital humanities. In this paper, we focus on one important type of historical literature: classical Chinese Buddhist texts. We propose an approach that automatically identifies transliteration pairs in classical Chinese texts. The approach has two stages: transliteration extraction and transliteration pair identification. To extract as many transliterations as possible without introducing too many false positives, the extraction stage uses a hybrid method that combines a machine-learning-based extractor, which relies on phonological features of transliteration characters, with a suffix-array-based extractor followed by filtering rules. In the identification stage, the extracted transliteration candidates are compared with one another for phonetic similarity: their pronunciations are taken from the Middle Chinese rime book "Guangyun", and the ALINE algorithm measures the phonetic similarity used to identify transliteration pairs. To evaluate the method, we construct an evaluation set from several Buddhist texts that were translated into Chinese in different eras, such as the Samyukta Agama and the Mahavibhasa, and we measure effectiveness in terms of precision and recall.
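The suffix-array-based half of the extraction stage can be illustrated with a minimal sketch. It is not the authors' implementation: it builds a suffix array by naive sorting and an LCP array with Kasai's algorithm, collects every substring of `min_len` to `max_len` characters that occurs at least `min_freq` times, and then applies two filtering rules that are assumptions, since the abstract does not say what the actual rules are. The hypothetical rules are: a candidate must consist only of characters from an assumed transliteration-character inventory (`allowed_chars`), and a candidate is dropped when a longer candidate containing it has the same frequency. All function names, parameters, and the toy sentence are illustrative.

```python
def suffix_array(text: str) -> list[int]:
    # Naive construction: sort suffix start positions by the suffix text.
    # Adequate for a sketch; corpus-scale use needs a faster algorithm.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text: str, sa: list[int]) -> list[int]:
    # Kasai et al.: lcp[r] is the length of the longest common prefix of the
    # suffixes at ranks r-1 and r (lcp[0] is unused and stays 0).
    n = len(text)
    rank = [0] * n
    for r, start in enumerate(sa):
        rank[start] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1  # the next suffix shares at least h - 1 characters
    return lcp


def repeated_substrings(text: str, min_len: int = 2, max_len: int = 6,
                        min_freq: int = 2) -> dict[str, int]:
    # Suffixes sharing a prefix of length L are adjacent in the suffix array,
    # so each maximal run with lcp >= L is one repeated substring and the run
    # length is its frequency.
    if min_freq < 2:
        raise ValueError("min_freq must be at least 2")
    sa = suffix_array(text)
    lcp = lcp_array(text, sa)
    counts: dict[str, int] = {}
    for length in range(min_len, max_len + 1):
        run = 1
        for r in range(1, len(sa) + 1):
            if r < len(sa) and lcp[r] >= length:
                run += 1  # suffix at rank r shares the length-L prefix
            else:
                if run >= min_freq:
                    start = sa[r - 1]
                    counts[text[start:start + length]] = run
                run = 1
    return counts


def filter_candidates(counts: dict[str, int], allowed_chars: set[str]) -> dict[str, int]:
    # Hypothetical rule 1: every character must belong to an assumed
    # inventory of characters commonly used in transliterations.
    kept = {s: f for s, f in counts.items() if all(c in allowed_chars for c in s)}
    # Hypothetical rule 2 (substring reduction): drop a candidate contained
    # in a longer kept candidate that has the same frequency.
    return {s: f for s, f in kept.items()
            if not any(t != s and s in t and kept[t] == f for t in kept)}


if __name__ == "__main__":
    toy = "阿難白佛言舍利弗問阿難舍利弗答"  # toy sentence, not evaluation data
    print(filter_candidates(repeated_substrings(toy), set("阿難舍利弗")))
```

On the toy sentence, the repeated substrings are 阿難, 舍利, 利弗 and 舍利弗, each occurring twice; with the assumed inventory, substring reduction leaves only 阿難 and 舍利弗. Naive suffix sorting is quadratic or worse in the worst case, so a real corpus would call for a dedicated construction algorithm such as SA-IS.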
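The identification stage compares candidates with one another by phonetic similarity. The sketch below shows that idea using a deliberately simplified stand-in for ALINE: plain Smith-Waterman local alignment, where the substitution score counts shared phonetic features minus unshared ones, normalized by the larger self-alignment score. It does not reproduce ALINE's salience-weighted features, its expansion and compression operations, or its scoring constants. The feature table, skip penalty, similarity threshold, and toy segment strings (`pan`, `ban`, `mi`) are hypothetical stand-ins for pronunciations derived from Guangyun.

```python
from collections.abc import Sequence

# HYPOTHETICAL feature table for a few toy segments; it is not derived from
# Guangyun and does not use ALINE's actual features or salience weights.
FEATURES: dict[str, frozenset[str]] = {
    "p": frozenset({"consonant", "labial", "stop"}),
    "b": frozenset({"consonant", "labial", "stop", "voiced"}),
    "t": frozenset({"consonant", "alveolar", "stop"}),
    "d": frozenset({"consonant", "alveolar", "stop", "voiced"}),
    "m": frozenset({"consonant", "labial", "nasal", "voiced"}),
    "n": frozenset({"consonant", "alveolar", "nasal", "voiced"}),
    "s": frozenset({"consonant", "alveolar", "fricative"}),
    "a": frozenset({"vowel", "low", "voiced"}),
    "i": frozenset({"vowel", "high", "front", "voiced"}),
    "u": frozenset({"vowel", "high", "back", "voiced"}),
}

SKIP = -1  # hypothetical penalty for leaving a segment unaligned


def sub_score(x: str, y: str) -> int:
    # Reward shared features, penalize features present in only one segment.
    fx, fy = FEATURES[x], FEATURES[y]
    return len(fx & fy) - len(fx ^ fy)


def local_alignment_score(xs: Sequence[str], ys: Sequence[str]) -> int:
    # Smith-Waterman recurrence: best score for aligning any substring of xs
    # with any substring of ys; cells are floored at 0.
    h = [[0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    best = 0
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            h[i][j] = max(0,
                          h[i - 1][j - 1] + sub_score(xs[i - 1], ys[j - 1]),
                          h[i - 1][j] + SKIP,
                          h[i][j - 1] + SKIP)
            best = max(best, h[i][j])
    return best


def similarity(xs: Sequence[str], ys: Sequence[str]) -> float:
    # Normalize by the larger self-alignment score so identical inputs give 1.0.
    denom = max(local_alignment_score(xs, xs), local_alignment_score(ys, ys))
    return local_alignment_score(xs, ys) / denom if denom else 0.0


def identify_pairs(readings: list[Sequence[str]],
                   threshold: float = 0.7) -> list[tuple[Sequence[str], Sequence[str]]]:
    # Compare every candidate reading with every other one; the threshold is
    # a hypothetical value, not one reported in the abstract.
    pairs = []
    for i, x in enumerate(readings):
        for y in readings[i + 1:]:
            if similarity(x, y) >= threshold:
                pairs.append((x, y))
    return pairs


if __name__ == "__main__":
    print(identify_pairs(["pan", "ban", "mi"]))
```

With these toy inputs, only `("pan", "ban")` passes: its local alignment scores 9 against a self-alignment maximum of 11 (about 0.82), while both comparisons with `mi` fall far below the assumed threshold. Dividing by the self-alignment score keeps every similarity in the range 0 to 1, so candidates of different lengths can be compared against a single threshold.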