Transactions of the Japanese Society for Artificial Intelligence
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
Original Paper
Web Page Classification using Anchor-related Text Extracted by a DOM-based Method
Masanori Otsubo, Bui Quang Hung, Yoshinori Hijikata, Shogo Nishida
Journal, free access

2010, Volume 25, Issue 1, pp. 37-49

Abstract
Directory services are popular among people who search for information of interest on the Web. These services provide hierarchical categories that help users find the pages they want, and web pages are assigned to these categories by hand. Many existing studies classify a web page using the text in the page itself. More recently, some studies have used text not only from the target page to be categorized but also from the original pages that link to it. Because the original pages contain many text parts unrelated to the target page, the text taken from them has to be narrowed down. However, these studies apply a single extraction method to all pages, even though web pages differ greatly in format. We have previously developed a method for extracting anchor-related text, and in this work we use the text parts extracted by that method to classify web pages. Experimental results showed that our extraction method improves classification accuracy.
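To make the idea of narrowing down text in a linking page concrete, the following is a minimal sketch only. It is not the authors' extraction method, whose details the abstract does not give. It assumes reasonably well-formed HTML, treats a small hypothetical set of block tags as the context boundary, and uses a hypothetical function name, `extract_anchor_related_text`. For each link to the target URL it returns the anchor text together with the text of the innermost enclosing block element.

```python
from html.parser import HTMLParser

# Assumption: these tags delimit the "context" around an anchor.
BLOCK_TAGS = {"p", "li", "td", "div"}


def _normalize(parts):
    """Join text fragments and collapse runs of whitespace."""
    return " ".join("".join(parts).split())


class _AnchorContextParser(HTMLParser):
    """Collects (anchor_text, enclosing_block_text) for links to one URL."""

    def __init__(self, target_url):
        super().__init__()
        self.target_url = target_url
        self.stack = []          # open block elements, innermost last
        self.in_anchor = False   # True while inside a link to target_url
        self.anchor_parts = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.stack.append({"tag": tag, "text": [], "anchors": []})
        elif tag == "a" and dict(attrs).get("href") == self.target_url:
            self.in_anchor = True
            self.anchor_parts = []

    def handle_data(self, data):
        if self.in_anchor:
            self.anchor_parts.append(data)
        if self.stack:
            # Text is credited to the innermost open block only.
            self.stack[-1]["text"].append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.in_anchor:
            self.in_anchor = False
            anchor_text = _normalize(self.anchor_parts)
            if self.stack:
                self.stack[-1]["anchors"].append(anchor_text)
            else:
                # No enclosing block: fall back to the anchor text alone.
                self.results.append((anchor_text, anchor_text))
        elif tag in BLOCK_TAGS and self.stack and self.stack[-1]["tag"] == tag:
            block = self.stack.pop()
            block_text = _normalize(block["text"])
            for anchor_text in block["anchors"]:
                self.results.append((anchor_text, block_text))


def extract_anchor_related_text(html, target_url):
    """Return a list of (anchor_text, enclosing_block_text) pairs."""
    parser = _AnchorContextParser(target_url)
    parser.feed(html)
    parser.close()
    return parser.results
```

The extracted pairs could then be fed, alongside the target page's own text, to any text classifier. A fixed rule like this one is exactly the kind of single extraction method the abstract criticizes, so it serves only as a baseline illustration.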
© 2010 JSAI (The Japanese Society for Artificial Intelligence)