Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Quantitative Analysis of Corpora with Different Source Languages
KYONGHEE PAIK, KIYONORI OHTAKE, BOND FRANCIS, KAZUHIDE YAMAMOTO

2005 Volume 12 Issue 4 Pages 117-136

Abstract
In order to investigate the effect of source language on translations, we examine two variants of a Korean translation corpus. The first variant consists of Korean translations of 162,308 Japanese sentences from the ATR BTEC (Basic Expression Text Corpus). The second variant was made by translating the English translations of the Japanese sentences into Korean. We show that the source language text has a large influence on the target text. Even after normalizing orthographic differences, fewer than 8.3% of the sentences in the two variants were identical. We describe in general which phenomena differ and then discuss how our analysis can be used in natural language processing.
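The identical-sentence figure reported in the abstract can be illustrated with a minimal sketch. The abstract does not say how orthographic differences were normalized, so the normalization below (Unicode NFKC folding plus removal of all whitespace) is an assumption chosen for illustration, and the function names `normalize` and `identical_ratio` are hypothetical rather than taken from the paper.

```python
import unicodedata


def normalize(sentence: str) -> str:
    """Fold orthographic variation before comparison.

    Assumption: NFKC folding (e.g. full-width to half-width forms)
    and whitespace removal stand in for the paper's unspecified
    normalization step.
    """
    folded = unicodedata.normalize("NFKC", sentence)
    # str.split() with no argument splits on any whitespace run,
    # so joining the pieces drops all spacing differences.
    return "".join(folded.split())


def identical_ratio(variant_a: list[str], variant_b: list[str]) -> float:
    """Fraction of aligned sentence pairs that match after normalization."""
    if len(variant_a) != len(variant_b):
        raise ValueError("the two variants must be sentence-aligned")
    if not variant_a:
        return 0.0
    same = sum(normalize(a) == normalize(b) for a, b in zip(variant_a, variant_b))
    return same / len(variant_a)


# Toy data, not drawn from the corpus: pair 2 differs only in spacing,
# pair 3 differs in wording.
from_japanese = ["안녕하세요", "감사 합니다", "어디 에 있습니까"]
from_english = ["안녕하세요", "감사합니다", "어디에 있어요"]
print(f"{identical_ratio(from_japanese, from_english):.1%} identical")
```

Whitespace is removed entirely here because Korean spacing conventions vary between translators; a stricter or looser normalization would change the resulting ratio.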
© The Association for Natural Language Processing