To break through the limitations of conventional methods based on compositional semantics, a new translation method based on sentence patterns is expected, in which the non-linear structures of linguistic expressions are represented as semantic units. This paper proposes a way to judge the linearity or non-linearity of linguistic expressions based on their definitions, and a way to generate sentence patterns from huge bilingual corpora. With this method, three kinds of sentence patterns, at the “word level”, “phrase level” and “clause level”, are generated in that order from a Japanese-to-English corpus. In the experiments, 150,000 sentence pairs for complex and compound sentences were extracted from bilingual corpora containing one million sentence pairs, and 128,000, 105,000 and 13,000 patterns were generated from these sentence pairs for the three levels, respectively. Because the decision process was clarified, the generation of sentence patterns was mostly automated using the results of morphological analysis, and these 246,000 sentence patterns were obtained in one year.