Discourse processing has been widely recognized as a key technology for improving the accuracy of text analysis, but it has also been considered a high-cost procedure that requires an enormous amount of background knowledge and deep inference mechanisms. However, even without constructing a precise model of the discourse through deep semantic analysis, rich information for resolving ambiguities in sentence analysis, including various discourse-dependent problems, can be obtained by analyzing a simple set of parse trees for the sentences in a text. For example, if it is assumed that morphologically identical words within a discourse have the same word sense and modify or are modified by similar words, the results of word sense and attachment disambiguation obtained in one sentence can be shared with all other morphologically identical words within the discourse. Besides allowing disambiguation results to be shared in this way, processing a whole text at one time makes it possible to refer to other information in the discourse, such as word frequency and the position of each word, which can be used for resolving pronoun reference and the focus of focusing subjuncts, such as "also" and "only", as well as for adding supplementary phrases in some elliptical sentences. We have developed a method of sentence analysis based on a simple discourse model that improves the accuracy of a natural language processing system, in particular, a machine translation system. Our framework is highly practical, since it requires neither knowledge resources that have been specially hand-coded for discourse processing nor a deep inference mechanism; instead, it uses syntactic information on all the other words in the discourse, such as modifiee-modifier relationships and position in the text.
Moreover, our approach is fundamentally different from previous approaches to discourse processing, in that it does not consider any discourse structure and is aimed at improving the accuracy of natural language processing rather than obtaining a perfect analysis. In this paper, we describe our robust discourse processing method, focusing on its effect in a machine translation system.
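The sharing assumption described above (morphologically identical words within one discourse take the same word sense) can be illustrated with a minimal sketch. This is not the system described in the paper: the function name `share_senses`, the `(lemma, sense)` token representation, the sense labels, and the majority-vote rule are all assumptions introduced here for illustration.

```python
from collections import Counter, defaultdict


def share_senses(sentences):
    """Copy resolved word senses to unresolved tokens with the same lemma.

    `sentences` is a list of sentences; each sentence is a list of
    (lemma, sense) pairs, where sense is None if it is still ambiguous.
    This representation is hypothetical, chosen only for the sketch.
    """
    # First pass: count every sense already resolved for each lemma.
    votes = defaultdict(Counter)
    for sentence in sentences:
        for lemma, sense in sentence:
            if sense is not None:
                votes[lemma][sense] += 1

    # Second pass: fill unresolved tokens with the most frequent sense
    # seen for the same lemma elsewhere in the discourse (an assumed rule).
    result = []
    for sentence in sentences:
        filled = []
        for lemma, sense in sentence:
            if sense is None and lemma in votes:
                sense = votes[lemma].most_common(1)[0][0]
            filled.append((lemma, sense))
        result.append(filled)
    return result


# Hypothetical two-sentence discourse: "bank" is resolved only in the first.
discourse = [
    [("bank", "financial_institution"), ("loan", None)],
    [("bank", None), ("rate", None)],
]
shared = share_senses(discourse)
```

In this sketch the second occurrence of "bank" inherits the sense resolved in the first sentence, while "loan" and "rate" stay unresolved because no sentence in the discourse disambiguates them. The same two-pass pattern could in principle be applied to attachment decisions, keyed on modifier-modifiee pairs rather than on single lemmas.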