Abstract
We propose a decision-tree learning approach to resolving ellipsis in Japanese dialogue. The method is highly flexible, since it requires only a dialogue corpus tagged with parts of speech and ellipsis information, together with a thesaurus. Three kinds of attributes are supplied to the learner: semantic categories of content words, functional words, and exophoric information. Open tests on a Japanese dialogue corpus on the topic of travel arrangement show that the proposed method achieves satisfactory resolution accuracy for the ‘ga’ (subject) and ‘ni’ (indirect object) cases. The discussion yields two further findings: (1) training should be restricted to the target topic when that topic is known in advance; otherwise, a decision tree trained on a wide range of topics may perform best; (2) as a decision tree is trained on more data, it tends to use more general attributes such as functional words. We also discuss the amount of data needed for decision-tree training and find that the resolution accuracy of the proposed method may saturate at 10^4 to 10^5 samples, with an expected maximum accuracy of 80% to 85%.