Proceedings of the Annual Conference of Biomedical Fuzzy Systems Association
Online ISSN : 2424-2586
Print ISSN : 1345-1510
ISSN-L : 1345-1510
Session ID: 9P-E-10

9P-E-10 A Method of Web Information Extraction Based on the Length of Nodes in the HTML Tree (Room E, International Session)
Wenli Li, Lechao Wang
Conference proceedings and abstracts, free access

Abstract

Effective extraction of information from web pages is a prerequisite for making full use of web resources. We propose a new method for extracting information from web pages based on the length of the nodes in the DOM tree. First, the web page is represented as a DOM tree using its HTML tags. Next, the content node of the tree is identified from the longest text node. Finally, the body of the text block is distinguished, and the main content of the web page is extracted, using the structural continuity of the main text in the DOM tree. Experiments confirmed the accuracy and efficiency of this method.
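
The abstract outlines three steps but gives no implementation details, so the following Python code is only a minimal sketch of how they might fit together, not the authors' algorithm. It uses the standard-library `html.parser` module. The names `Node`, `TreeBuilder`, `iter_nodes`, and `extract_main_content` are hypothetical, and the "continuity" rule used here (neighbouring siblings that share the longest node's tag and contain text) is an assumption standing in for the structural-continuity criterion that the paper does not spell out in the abstract.

```python
from html.parser import HTMLParser

# Elements that never have children or a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
# Elements whose text is code or styling, not page content.
SKIP_TAGS = {"script", "style"}


class Node:
    """One element of a simplified DOM tree (hypothetical helper)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""  # text placed directly inside this element


class TreeBuilder(HTMLParser):
    """Step 1: represent the page as a tree using its HTML tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Climb to the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if self.current.tag not in SKIP_TAGS:
            self.current.text = (self.current.text + " " + data.strip()).strip()


def iter_nodes(node):
    """Yield every node of the tree in document order."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def extract_main_content(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()

    # Step 2: take the node with the longest direct text as the content node.
    longest = max(iter_nodes(builder.root), key=lambda n: len(n.text))
    if not longest.text:
        return ""

    # Step 3 (assumed rule): grow the block over adjacent siblings that
    # have the same tag and carry text.
    siblings = longest.parent.children if longest.parent else [longest]
    lo = hi = siblings.index(longest)
    while lo > 0 and siblings[lo - 1].tag == longest.tag and siblings[lo - 1].text:
        lo -= 1
    while (hi + 1 < len(siblings) and siblings[hi + 1].tag == longest.tag
           and siblings[hi + 1].text):
        hi += 1
    return "\n".join(s.text for s in siblings[lo:hi + 1])
```

Calling `extract_main_content(page_source)` on a page whose article body is a run of `<p>` elements would return those paragraphs joined by newlines, while short navigation links and footer text in other subtrees are left out. Measuring only direct text keeps the sketch short; text nested in inline elements such as `<a>` or `<b>` is counted separately, which a fuller implementation would probably want to merge.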

© 2010 Biomedical Fuzzy Systems Association