We propose a corpus-based approach to anaphora resolution of Japanese pronouns that combines a machine learning method with statistical information. In the first step, a decision tree trained on an annotated corpus determines the coreference relation between a given anaphor and its antecedent candidates, and serves as a filter that reduces the number of potential candidates. In the second step, a preference selection among the remaining candidates takes into account the frequency of coreferential and non-referential pairs tagged in the training corpus, as well as distance and counting features within the current discourse.
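The two-step procedure can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the described system: the `Candidate` fields, the hand-written `tree_filter` rule standing in for the trained decision tree, the smoothing and the 0.1 weights in `preference`, and the toy pair counts.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    distance: int   # sentences between anaphor and candidate (assumed feature)
    mentions: int   # occurrences in the current discourse (assumed feature)
    agrees: bool    # stand-in for whatever features a trained tree would test


def tree_filter(c: Candidate) -> bool:
    """Step 1: hypothetical stand-in for the trained decision tree."""
    return c.agrees and c.distance <= 3


def preference(anaphor, c, coref, noncoref):
    """Step 2: score a surviving candidate (assumed scoring formula)."""
    pos = coref.get((anaphor, c.text), 0)
    neg = noncoref.get((anaphor, c.text), 0)
    ratio = (pos + 1) / (pos + neg + 2)  # add-one smoothed coreference ratio
    return ratio + 0.1 * c.mentions - 0.1 * c.distance


def resolve(anaphor, candidates, coref, noncoref):
    """Filter candidates, then return the highest-scoring one (or None)."""
    kept = [c for c in candidates if tree_filter(c)]
    if not kept:
        return None
    return max(kept, key=lambda c: preference(anaphor, c, coref, noncoref))


# Toy pair counts, invented for this example only.
coref = {("kare", "Tanaka"): 8, ("kare", "Suzuki"): 3}
noncoref = {("kare", "Tanaka"): 2, ("kare", "Suzuki"): 7}
candidates = [
    Candidate("Tanaka", distance=1, mentions=2, agrees=True),
    Candidate("kaisha", distance=0, mentions=1, agrees=False),
    Candidate("Suzuki", distance=2, mentions=1, agrees=True),
]
best = resolve("kare", candidates, coref, noncoref)
print(best.text if best else None)
```

In this sketch the filter discards "kaisha" before any scoring takes place, so the preference step only has to rank the candidates the classifier accepted.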
We propose an effective method for resolving overlapping ambiguities found in sentential analyses of Chinese. It detects the ambiguities with an FBMM scanner, resolves them using the relevancy value (RV), a statistical measure of word co-occurrence derived from textual data on the Internet, and selects the correct word sequence for the sentence being analyzed. When RVs alone are insufficient to resolve the ambiguities and choose the correct word sequence, we also use contextual information. An experiment on selecting the desired sequences shows a success rate of about 85%, which is far better than the results reported in other comparable studies.
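The detection and selection steps can be sketched as follows, assuming that FBMM denotes forward and backward maximum matching and that a disagreement between the two scans marks an overlapping ambiguity. The abstract does not define the RV, so `cooccurrence_score` below is a hypothetical stand-in that sums co-occurrence counts of adjacent words; the lexicon and the counts are toy values invented for the example.

```python
def forward_mm(text, lexicon, max_len):
    """Scan left to right, taking the longest lexicon word at each position."""
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in lexicon:  # single characters always match
                words.append(piece)
                i += n
                break
    return words


def backward_mm(text, lexicon, max_len):
    """Scan right to left, taking the longest lexicon word at each position."""
    words, j = [], len(text)
    while j > 0:
        for n in range(min(max_len, j), 0, -1):
            piece = text[j - n:j]
            if n == 1 or piece in lexicon:
                words.append(piece)
                j -= n
                break
    return words[::-1]


def cooccurrence_score(words, cooc):
    """Hypothetical stand-in for the RV: sum of adjacent-pair counts."""
    return sum(cooc.get(pair, 0) for pair in zip(words, words[1:]))


def segment(text, lexicon, cooc, max_len=3):
    fwd = forward_mm(text, lexicon, max_len)
    bwd = backward_mm(text, lexicon, max_len)
    if fwd == bwd:
        return fwd  # both scans agree, so no ambiguity was detected
    return max((fwd, bwd), key=lambda ws: cooccurrence_score(ws, cooc))


# Toy lexicon and co-occurrence counts, invented for this example only.
lexicon = {"研究", "研究生", "生命", "命", "起源"}
cooc = {("研究", "生命"): 40, ("生命", "起源"): 120,
        ("研究生", "命"): 1, ("命", "起源"): 2}
sentence = "研究生命起源"
print(forward_mm(sentence, lexicon, 3))
print(backward_mm(sentence, lexicon, 3))
print(segment(sentence, lexicon, cooc))
```

The sketch compares only the two whole-sentence segmentations and does not model the fallback to contextual information.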