In Machine Translation (MT), using compound words or phrases often makes the translation process easier. For example, the phrase _??_ corresponds unambiguously to "information highway". It is not necessary to break it down into _??_ (information), _??_ (high-speed) and _??_ (road). However, some compound words (phrases) in Chinese are composed of simpler words that can play significantly different roles in a sentence when they are broken down. For example, the compound word _??_ (machine translation) may be broken into _??_ (machine) and _??_ (translate), as in the sentence _??_ (He uses a machine to translate papers). We call such a compound word a "Sensitive Word". During Chinese MT processing, if the first segmentation result, in which a sensitive word is segmented as a single word, leads to a failure, the alternative solution with the sensitive word broken down is taken as the preferred one. This lets us achieve higher efficiency by avoiding the examination of unlikely segmentation solutions. In this paper, we describe the problems related to sensitive words. A machine-readable dictionary has been examined, and 764 sensitive words have been found among 87,600 words (about 0.9%). This shows that sensitive words are a common phenomenon in Chinese that is worth closer examination.
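The fallback strategy described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy lexicon, and the `toy_analysis` check are all hypothetical, and English glosses stand in for the Chinese words. The sketch assumes the downstream MT analysis can be treated as a function that reports success or failure for a given segmentation.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy lexicon: English glosses stand in for the Chinese words.
# Each sensitive word maps to the simpler words it can be broken into.
SENSITIVE_WORDS: Dict[str, List[str]] = {
    "machine_translation": ["machine", "translate"],
}


def segment_with_fallback(
    first_pass: List[str],
    analysis_succeeds: Callable[[List[str]], bool],
    sensitive_words: Dict[str, List[str]] = SENSITIVE_WORDS,
) -> Optional[List[str]]:
    """Return the first segmentation that the downstream analysis accepts."""
    # Step 1: try the segmentation that keeps every sensitive word whole.
    if analysis_succeeds(first_pass):
        return first_pass
    # Step 2: on failure, break down one sensitive word at a time and retry,
    # instead of enumerating every possible segmentation.
    for i, word in enumerate(first_pass):
        parts = sensitive_words.get(word)
        if parts is None:
            continue
        alternative = first_pass[:i] + parts + first_pass[i + 1:]
        if analysis_succeeds(alternative):
            return alternative
    # No alternative worked; a real system would need a wider search here.
    return None


def toy_analysis(words: List[str]) -> bool:
    """Hypothetical stand-in for MT analysis: 'uses' needs a later 'translate'."""
    return "uses" not in words or "translate" in words


if __name__ == "__main__":
    sentence = ["he", "uses", "machine_translation", "papers"]
    print(segment_with_fallback(sentence, toy_analysis))
```

In this sketch the compound is kept whole whenever the first analysis succeeds, so the extra work of breaking it down is only paid for sentences where the first segmentation fails.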