Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Extracting Word Sequence Correspondences Based on Support Vector Machines
KENGO SATO, HIROAKI SAITO

2003 Volume 10 Issue 4 Pages 109-124

Abstract
This paper proposes a method for learning and extracting bilingual word sequence correspondences from aligned parallel corpora using Support Vector Machines (SVMs). SVMs are robust against data sparseness because of their high generalization ability, and they can learn dependencies among features through a kernel function. Our method learns a translation model from features such as translation dictionaries, the number of words, parts of speech, constituent words, and neighboring words, and then extracts bilingual word sequence correspondences using a correspondence level computed by the SVM. Conventional methods compute correspondence levels from word co-occurrences, so they suffer from data sparseness and cannot extract bilingual word sequence correspondences that appear infrequently. Our method, in contrast, can extract such correspondences using a model already learned from training corpora.
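The idea of scoring a candidate pair with a kernel-based decision value can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names, the degree-2 polynomial kernel, the toy dictionary, and the hand-picked support vectors and weights are all assumptions made for illustration, since the abstract does not specify them. With binary features, a polynomial kernel implicitly weighs conjunctions of features, which is one way an SVM can capture feature dependencies.

```python
from typing import FrozenSet, List, Set, Tuple

Features = FrozenSet[str]

def extract_features(src: List[str], tgt: List[str],
                     dictionary: Set[Tuple[str, str]]) -> Features:
    """Binary features for a candidate pair of word sequences (hypothetical feature set)."""
    feats = {f"src_len={len(src)}", f"tgt_len={len(tgt)}"}
    feats.update(f"src_word={w}" for w in src)   # constituent words, source side
    feats.update(f"tgt_word={w}" for w in tgt)   # constituent words, target side
    # Dictionary feature: fires when any source/target word pair is a dictionary entry.
    if any((s, t) in dictionary for s in src for t in tgt):
        feats.add("dict_match")
    return frozenset(feats)

def poly_kernel(x: Features, z: Features, degree: int = 2) -> int:
    """Polynomial kernel; for binary features the dot product is the overlap size."""
    return (1 + len(x & z)) ** degree

def correspondence_level(x: Features,
                         support_vectors: List[Tuple[Features, int, float]],
                         bias: float = 0.0) -> float:
    """SVM-style decision value: sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(alpha * y * poly_kernel(sv, x)
               for sv, y, alpha in support_vectors) + bias

# Toy data (assumed): one dictionary entry, one positive and one negative example.
toy_dictionary = {("inu", "dog")}
toy_support_vectors = [
    (extract_features(["inu"], ["dog"], toy_dictionary), +1, 1.0),
    (extract_features(["inu"], ["cat"], toy_dictionary), -1, 1.0),
]

good = extract_features(["inu"], ["dog"], toy_dictionary)
bad = extract_features(["inu"], ["cat"], toy_dictionary)
print(correspondence_level(good, toy_support_vectors))
print(correspondence_level(bad, toy_support_vectors))
```

In a real system the support vectors, weights, and bias would come from training an SVM on labeled candidate pairs rather than being set by hand, and a candidate would be extracted when its decision value exceeds a chosen threshold.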
© The Association for Natural Language Processing