Abstract
This paper proposes a method for learning and extracting bilingual word sequence correspondences from aligned parallel corpora based on Support Vector Machines (SVMs). SVMs are robust against data sparseness because of their high generalization ability, and they can learn dependencies between features by using a kernel function. Our method learns a translation model using features such as translation dictionaries, the number of words, part-of-speech, constituent words and neighbor words, and it extracts bilingual word sequence correspondences by using a correspondence level based on SVMs. Conventional methods cannot extract bilingual word sequence correspondences that appear infrequently, because their correspondence levels are based on word co-occurrences and therefore suffer from data sparseness. Our method, however, can extract such correspondences by using a model that has already been learned from training corpora.
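As a rough illustration of the pipeline summarized above, the Python sketch below maps a candidate pair of word sequences to sparse features, scores it with a kernel SVM, and keeps the best candidate whose score is positive. Everything in it is an assumption rather than the authors' implementation: the toy dictionary and word sequences, the feature templates, the degree-2 polynomial kernel, the use of the raw SVM decision value as the correspondence level, the zero threshold, and the solver, which is a bias-free, kernelized Pegasos-style trainer that visits examples in a fixed cyclic order instead of sampling them at random. All names (`features`, `train_kernel_svm`, `correspondence_level`, `extract`, `BILINGUAL_DICT`) are hypothetical.

```python
from itertools import product

# Hypothetical toy bilingual dictionary (an assumption for illustration only).
BILINGUAL_DICT = {("kokusai", "international"), ("kaigi", "conference")}


def features(src, tgt, src_nbr=(), tgt_nbr=()):
    """Sparse feature map for a candidate pair of word sequences.

    src, tgt: lists of (word, pos) tuples; src_nbr, tgt_nbr: neighboring words.
    The templates loosely mirror the abstract (dictionary matches, number of
    words, part-of-speech, constituent words, neighbor words); they are assumed.
    """
    f = {"dict_hits": float(sum((s, t) in BILINGUAL_DICT
                                for s, _ in src for t, _ in tgt))}
    f["len_diff=%d" % (len(src) - len(tgt))] = 1.0
    for (sw, sp), (tw, tp) in product(src, tgt):
        f["pos=%s|%s" % (sp, tp)] = 1.0
        f["word=%s|%s" % (sw, tw)] = 1.0
    for sw, tw in product(src_nbr, tgt_nbr):
        f["nbr=%s|%s" % (sw, tw)] = 1.0
    return f


def dot(a, b):
    """Inner product of two sparse vectors stored as dicts."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def poly_kernel(a, b, degree=2):
    """(1 + <a, b>)^degree; degree 2 implicitly adds all pairwise feature
    conjunctions, one way a kernel lets an SVM model feature dependencies."""
    return (1.0 + dot(a, b)) ** degree


def train_kernel_svm(xs, ys, lam=0.1, epochs=10):
    """Kernelized Pegasos-style solver for a bias-free soft-margin SVM.

    Simplification (assumed): examples are visited in a fixed cyclic order.
    Returns a list of (coefficient, support_vector) pairs.
    """
    n = len(xs)
    gram = [[poly_kernel(a, b) for b in xs] for a in xs]
    alpha = [0] * n  # number of margin violations per training example
    t = 0
    for _ in range(epochs):
        for i in range(n):
            t += 1
            score = sum(alpha[j] * ys[j] * gram[j][i] for j in range(n))
            if ys[i] * score / (lam * t) < 1:
                alpha[i] += 1
    scale = 1.0 / (lam * t)
    return [(scale * a * y, x) for a, y, x in zip(alpha, ys, xs) if a]


def correspondence_level(model, x):
    """SVM decision value, used here as the correspondence level (assumption)."""
    return sum(c * poly_kernel(sv, x) for c, sv in model)


def extract(model, src, candidates, threshold=0.0):
    """Return the best-scoring target candidate for src, or None."""
    best_level, best = max(
        ((correspondence_level(model, features(src, t)), t) for t in candidates),
        key=lambda p: p[0], default=(float("-inf"), None))
    return best if best_level > threshold else None


# Hand-made toy training pairs (not data from the paper).
SRC = [("kokusai", "N"), ("kaigi", "N")]
POS_TGT = [("international", "ADJ"), ("conference", "N")]
NEG_TGT = [("next", "ADJ"), ("week", "N")]

if __name__ == "__main__":
    model = train_kernel_svm(
        [features(SRC, POS_TGT), features(SRC, NEG_TGT)], [1, -1])
    # A shorter pair that never occurs as a training example is still scored.
    print(extract(model, [("kaigi", "N")],
                  [[("week", "N")], [("conference", "N")]]))
    # -> [('conference', 'N')]
```

In the final call, the one-word pair receives a positive level only because it shares dictionary, part-of-speech and word-pair features with the positive training example, which is the kind of generalization the abstract attributes to SVMs. The pure-Python solver keeps the sketch dependency-free; a practical system would more likely rely on an established SVM package.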