We have built a large-scale general Japanese ontology, restructured from Wikipedia, that represents an is-a relation hierarchy. Each Wikipedia article belongs to one or more categories, which are organized hierarchically through links to other categories. However, two issues must be solved before the categories and articles can be used as an is-a ontology: (1) the highest levels of the hierarchy are too abstract to be applied directly to an ontology, and (2) many not-is-a links appear among the articles because of low-quality descriptions, which can occur in consumer-generated media. To solve these issues, we (1) redefine the highest levels and replace the original top-level categories with them, and (2) cut not-is-a links both between categories and between categories and articles. Experimental results show that is-a links between categories achieve 95.3% precision and 96.6% recall, while is-a links between a category and its articles achieve 96.2% precision and 95.6% recall. These accuracies significantly outperform those of previous methods. We extracted 84.5% of the categories (approximately 34,000) and 88.6% of the articles (approximately 420,000) in Wikipedia.
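The abstract does not say how is-a links are identified, so the following is only a minimal Python sketch, under stated assumptions, of the two restructuring steps and of how precision and recall of the kept links could be scored against a hand-labeled gold set. The mapping `top_level_map`, the judge `is_a`, the functions `restructure` and `precision_recall`, and all category and article names are hypothetical; the judge is a placeholder for whatever classification method the authors actually use.

```python
from typing import Callable, Dict, List, Set, Tuple

# (parent, child): a category-to-category or category-to-article link
Edge = Tuple[str, str]


def restructure(
    edges: List[Edge],
    top_level_map: Dict[str, str],
    is_a: Callable[[str, str], bool],
) -> Set[Edge]:
    """Hypothetical sketch of the two steps described in the abstract.

    top_level_map: original top-level category -> redefined top-level class (assumed input).
    is_a: assumed judge that accepts or rejects a link as an is-a link.
    """
    kept: Set[Edge] = set()
    for parent, child in edges:
        # Step 1: replace an original top-level category with its redefined class.
        parent = top_level_map.get(parent, parent)
        # Step 2: cut links that the judge rejects as not-is-a.
        if is_a(parent, child):
            kept.add((parent, child))
    return kept


def precision_recall(predicted: Set[Edge], gold: Set[Edge]) -> Tuple[float, float]:
    """Precision and recall of predicted is-a links against a gold-standard set."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


# Toy data (hypothetical): one top-level category and a few links.
edges = [
    ("Nature", "Animals"),
    ("Animals", "Dogs"),
    ("Dogs", "Shiba Inu"),          # category -> article, is-a
    ("Dogs", "Dog training"),       # not-is-a
    ("Animals", "Animal rights"),   # not-is-a
]
# Stand-in judge: a fixed set of accepted links, with one deliberate error.
accepted = {
    ("Natural object", "Animals"),
    ("Animals", "Dogs"),
    ("Dogs", "Shiba Inu"),
    ("Animals", "Animal rights"),
}
gold = {
    ("Natural object", "Animals"),
    ("Animals", "Dogs"),
    ("Dogs", "Shiba Inu"),
}

kept = restructure(edges, {"Nature": "Natural object"}, lambda p, c: (p, c) in accepted)
precision, recall = precision_recall(kept, gold)
print(precision, recall)  # prints "0.75 1.0"
```

Passing the judge in as a function keeps the restructuring logic independent of the classifier, so a different is-a decision method could be swapped in without changing the two steps.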