Journal of the Japanese Society for Artificial Intelligence
Online ISSN : 2435-8614
Print ISSN : 2188-2266
Print ISSN:0912-8085 until 2013
Automatic Extraction of Auxiliary Phrases from a Corpus
Hiroyuki SHINNOU, Hitoshi ISAHARA
MAGAZINE FREE ACCESS

1995 Volume 10 Issue 3 Pages 429-435

Abstract

In this paper, we describe a method for automatically extracting Japanese auxiliary phrases from a corpus. An auxiliary phrase is a kind of idiomatic expression that functions like an auxiliary verb or a postpositional particle. Typical examples are "にかんして" and "なければならない". It is generally advantageous to handle an auxiliary phrase as a single word, so when building a dictionary we need to collect auxiliary phrases in the same way as ordinary words. However, auxiliary phrases are difficult to pick out because the boundary between them and normal phrases is unclear. On close examination, the distinction turns out to depend on the subjective judgment of the system developer. As a result, selecting auxiliary phrases takes a great deal of time, and there is considerable doubt as to whether the collected phrases cover all necessary phrases and are chosen uniformly. To overcome this problem, we present our method, whose key point is to exploit the following heuristic properties of auxiliary phrases: (H1) An auxiliary phrase consists of HIRAGANA characters; even if a KANJI string appears in it, the length of that string is 1. (H2) The characters immediately before and after an auxiliary phrase are limited to a certain restricted set. (H3) The words that compose an auxiliary phrase are strongly connected to one another. First, we pick up from the corpus all phrases of length N that consist only of HIRAGANA characters and KANJI strings of length 1, and we carry out this operation for every N (≥ 4). By (H1), every auxiliary phrase must be contained in the set of phrases obtained in this way. Next, using (H2) and (H3), we remove from this set the phrases that are not auxiliary phrases. Finally, we remove duplicate phrases by checking whether a longer phrase contains the phrase in question. As a result, we obtain the phrases that this paper aims to acquire. An advantage of this method is that it can be carried out easily even in a poor environment. We ran an experiment with this method on one month of ASAHI newspaper articles (about 9 MB), and we also report the results.
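The candidate-generation step (H1) and the final duplicate-removal step described in the abstract can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names, the Unicode ranges used to recognize HIRAGANA and KANJI, the reading of H1 as "no two KANJI characters in a row", and the upper bound max_len are all hypothetical choices made here. The abstract does not specify how (H2) and (H3) are measured, so those filters are left out rather than guessed.

```python
from collections import Counter


def is_hiragana(ch):
    # Assumption: HIRAGANA is the Unicode range U+3041..U+3096.
    return "\u3041" <= ch <= "\u3096"


def is_kanji(ch):
    # Assumption: KANJI is the CJK Unified Ideographs range U+4E00..U+9FFF.
    return "\u4e00" <= ch <= "\u9fff"


def satisfies_h1(phrase):
    """H1 as read here: only HIRAGANA and KANJI, and no two KANJI in a row."""
    prev_kanji = False
    for ch in phrase:
        if is_kanji(ch):
            if prev_kanji:
                return False  # a KANJI string longer than 1
            prev_kanji = True
        elif is_hiragana(ch):
            prev_kanji = False
        else:
            return False  # katakana, punctuation, digits, etc.
    return True


def candidate_phrases(text, min_len=4, max_len=8):
    """Count every substring of length N (min_len <= N <= max_len) passing H1.

    The abstract says "for all N >= 4"; max_len is a hypothetical cap
    added only to keep this sketch bounded.
    """
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(text) - n + 1):
            phrase = text[i:i + n]
            if satisfies_h1(phrase):
                counts[phrase] += 1
    return counts


def drop_subsumed(phrases):
    """Keep a phrase only if no longer kept phrase contains it."""
    kept = []
    for p in sorted(set(phrases), key=len, reverse=True):
        if not any(p in longer for longer in kept):
            kept.append(p)
    return kept
```

In a fuller pipeline, the (H2) and (H3) filters would be applied to the output of candidate_phrases before drop_subsumed is called, since the abstract removes non-auxiliary phrases first and eliminates contained phrases last.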

© 1995 The Japanese Society for Artificial Intelligence