An Informative DOM Subtree Identification Method from Web Pages in Unfamiliar Web Sites

Masanobu TSURUTA; Hiroyuki SAKAI; Shigeru MASUYAMA

doi:10.1093/ietisy/e91-d.4.986

IEICE Transactions on Information and Systems

Online ISSN : 1745-1361
Print ISSN : 0916-8532

Special Section on Knowledge-Based Software Engineering

Keywords: informative region identification, Web document, DOM, layout analysis

2008 Volume E91.D Issue 4 Pages 986-989

Abstract

We propose a method for identifying informative DOM (Document Object Model) subtrees in Web pages from unfamiliar Web sites. Our method uses layout data of DOM nodes generated by a generic Web browser. Experimental results show that our method outperforms a baseline method and robustly identifies informative DOM subtrees in Web pages.
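The abstract does not say how the layout data is scored, so the following is only a hypothetical illustration of the kind of input such a method works from: DOM nodes annotated with the bounding boxes a browser computes. In a real browser these boxes could be read with the DOM method `Element.getBoundingClientRect()`; here they are hard-coded. The names `LayoutNode`, `score`, and `best_subtree`, and the scoring rule itself (share of page text minus share of page area), are assumptions made for illustration and are not the authors' method or their baseline.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class LayoutNode:
    """A DOM node plus the bounding box a browser computed for it (hypothetical model)."""
    tag: str
    x: float
    y: float
    width: float
    height: float
    text_len: int = 0  # characters of text owned directly by this node
    children: List["LayoutNode"] = field(default_factory=list)

    def area(self) -> float:
        return self.width * self.height

    def total_text(self) -> int:
        # Text length of the whole subtree rooted at this node.
        return self.text_len + sum(c.total_text() for c in self.children)


def iter_nodes(node: LayoutNode) -> Iterator[LayoutNode]:
    """Pre-order traversal over every candidate subtree root."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def score(node: LayoutNode, page: LayoutNode) -> float:
    """Hypothetical heuristic (not the paper's): text share minus area share."""
    total_text = page.total_text()
    page_area = page.area()
    if total_text == 0 or page_area == 0:
        return 0.0
    return node.total_text() / total_text - node.area() / page_area


def best_subtree(page: LayoutNode) -> LayoutNode:
    """Return the subtree root with the highest heuristic score."""
    return max(iter_nodes(page), key=lambda n: score(n, page))


def example_page() -> LayoutNode:
    # Hard-coded boxes standing in for browser-computed layout data.
    article = LayoutNode("article", 220, 120, 760, 1200, text_len=2000)
    comments = LayoutNode("section", 220, 1320, 760, 500, text_len=300)
    return LayoutNode("body", 0, 0, 1000, 2000, children=[
        LayoutNode("header", 0, 0, 1000, 100, text_len=20),
        LayoutNode("nav", 0, 100, 200, 1800, text_len=100),
        LayoutNode("main", 200, 100, 800, 1800, children=[article, comments]),
        LayoutNode("footer", 0, 1900, 1000, 100, text_len=30),
    ])


page = example_page()
print(best_subtree(page).tag)  # prints "article"
```

Subtracting the area share gives the page root a score of exactly 0 and pushes wide, text-poor regions such as the navigation column below zero, so a compact, text-dense block wins. This toy score ignores box position entirely; a layout-based method would presumably also exploit where a box sits on the rendered page.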