Information and Media Technologies
Online ISSN : 1881-0896
ISSN-L : 1881-0896
Media (processing) and Interaction
Web Page Classification using Anchor-related Text Extracted by a DOM-based Method
Masanori Otsubo, Bui Quang Hung, Yoshinori Hijikata, Shogo Nishida
Journal, free access

2010, Volume 5, Issue 1, pp. 193-205

Abstract

Directory services are popular among people who search for information of interest on the Web. These services provide hierarchical categories that help a user find a desired page. Pages on the Web are assigned to one of the categories by hand. Many existing studies classify a web page using the text in the page itself. Recently, some studies have used text not only from the target page to be categorized but also from the original pages that link to the target page. The text taken from the original pages has to be narrowed down, because those pages include many text parts that are not related to the target page. However, these studies always apply a single extraction method to all pages: although web pages differ greatly in format, the extraction method is not changed to suit them. We have already developed a method for extracting anchor-related text. In this study, we use the text parts extracted by our method to classify web pages. The experimental results showed that our extraction method improves classification accuracy.
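The idea of narrowing an original page's text down to the part near the link can be illustrated with a minimal sketch. This is not the authors' extraction method, which the abstract does not describe. It assumes well-formed HTML with explicit closing tags, and it assumes that the "related" text is simply the text of the nearest enclosing block element. The names `BLOCK_TAGS`, `AnchorContextParser`, and `extract_anchor_related_text` are hypothetical and introduced only for illustration.

```python
from html.parser import HTMLParser

# Assumption: these tags delimit a "text part" of the page.
BLOCK_TAGS = {"p", "li", "td", "div", "dd", "dt"}


class AnchorContextParser(HTMLParser):
    """Collect (anchor_text, enclosing_block_text) for links to target_url."""

    def __init__(self, target_url):
        super().__init__()
        self.target_url = target_url
        self.blocks = []              # stack of currently open block elements
        self.in_target_anchor = False
        self.anchor_parts = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.blocks.append({"parts": [], "anchors": []})
        elif tag == "a" and dict(attrs).get("href") == self.target_url:
            self.in_target_anchor = True
            self.anchor_parts = []

    def handle_data(self, data):
        # Text belongs to the innermost open block only.
        if self.blocks:
            self.blocks[-1]["parts"].append(data)
        if self.in_target_anchor:
            self.anchor_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.in_target_anchor:
            self.in_target_anchor = False
            anchor = " ".join("".join(self.anchor_parts).split())
            if self.blocks:
                self.blocks[-1]["anchors"].append(anchor)
            else:
                # Link outside any block: fall back to the anchor text alone.
                self.results.append((anchor, anchor))
        elif tag in BLOCK_TAGS and self.blocks:
            block = self.blocks.pop()
            text = " ".join("".join(block["parts"]).split())
            for anchor in block["anchors"]:
                self.results.append((anchor, text))


def extract_anchor_related_text(html, target_url):
    """Return the text parts of an original page that surround links to target_url."""
    parser = AnchorContextParser(target_url)
    parser.feed(html)
    parser.close()
    return parser.results
```

In a classification setting, the returned text parts could be added to the target page's own text before feature extraction, while the remaining paragraphs of the original page are ignored. A fixed rule like this one is exactly the kind of single extraction method the abstract argues is insufficient across differently formatted pages.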

© 2010 by Japanese Society for Artificial Intelligence