Analyzing compound nouns is one of the crucial issues for natural language processing systems. Registering all compound nouns in a dictionary is impractical, since new compound nouns can always be created by combining nouns. Therefore, a mechanism that analyzes the structure of a compound noun from its individual nouns is necessary. However, the analysis is difficult when only syntactic knowledge is used, so semantic knowledge is also required. Because a large semantic knowledge base is hard to construct and maintain, we need a method to acquire semantic knowledge and use it in the analysis. In this paper, we propose a method to analyze the structures of Japanese compound nouns by using word collocational information and a thesaurus. The collocational information is acquired from a corpus of four-kanzi-character words. For each possible structure of a compound noun, a preference is calculated based on this collocational information. An experiment was conducted with 160,000 word collocations to analyze compound nouns with an average length of 5.5 characters. The accuracy of this method is about 78%.
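As a rough illustration of the idea of scoring each possible structure, the following is a minimal sketch and not the method of the paper itself. It assumes that structures are binary bracketings, that the head of a constituent is its rightmost word, and that a preference is the product of smoothed collocation counts between the thesaurus classes of the two heads joined at each node. The names `THESAURUS`, `COUNTS`, `structures`, `head`, `preference`, and `best_structure`, and all of the toy data (English words are used only for readability), are hypothetical.

```python
from typing import Dict, List, Tuple, Union

# A structure is either a single word or a pair (left, right) of structures.
Tree = Union[str, Tuple["Tree", "Tree"]]

# Hypothetical thesaurus: word -> semantic class (toy data, an assumption).
THESAURUS: Dict[str, str] = {
    "new": "QUALITY",
    "car": "VEHICLE",
    "sales": "ACTIVITY",
}

# Hypothetical collocation counts between semantic classes (toy data).
COUNTS: Dict[Tuple[str, str], int] = {
    ("QUALITY", "VEHICLE"): 8,
    ("VEHICLE", "ACTIVITY"): 6,
    ("QUALITY", "ACTIVITY"): 1,
}


def structures(words: List[str]) -> List[Tree]:
    """Enumerate every binary bracketing of a word sequence."""
    if len(words) == 1:
        return [words[0]]
    result: List[Tree] = []
    for split in range(1, len(words)):
        for left in structures(words[:split]):
            for right in structures(words[split:]):
                result.append((left, right))
    return result


def head(tree: Tree) -> str:
    """Return the rightmost word (assumed here to be the head)."""
    return tree if isinstance(tree, str) else head(tree[1])


def preference(tree: Tree) -> float:
    """Multiply smoothed class-pair counts over every node of the structure."""
    if isinstance(tree, str):
        return 1.0
    left, right = tree
    pair = (THESAURUS[head(left)], THESAURUS[head(right)])
    # Add-one smoothing so an unseen pair does not zero out a whole structure.
    return preference(left) * preference(right) * (COUNTS.get(pair, 0) + 1)


def best_structure(words: List[str]) -> Tree:
    """Pick the bracketing with the highest preference."""
    return max(structures(words), key=preference)


print(best_structure(["new", "car", "sales"]))
```

With the toy counts above, the left-branching structure `(("new", "car"), "sales")` scores 9 × 7 = 63 and the right-branching one scores 7 × 2 = 14, so the former is selected. Mapping words to thesaurus classes before counting is one way to let sparse collocation data generalize to word pairs that never appear in the corpus.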