Information and Media Technologies
Online ISSN : 1881-0896
ISSN-L : 1881-0896
Media (processing) and Interaction
Applying XML Element Retrieval Techniques to Web Documents
Atsushi KEYAKI, Jun MIYAZAKI, Kenji HATANO
Journal, free access

2015, Vol. 10, No. 2, p. 344-350

Abstract

In this paper, we propose a method for extending XML element retrieval techniques to Web documents. XML element retrieval techniques return partial documents (sub-documents) as search results, and they are expected to be applicable to structured documents other than XML documents, namely Web documents. The difficulty is that the physical document structure of Web documents is disorganized, because Web documents are created not for managing data but for rendering in a Web browser. In addition, Web documents contain a great deal of content that is not meaningful to human readers. To address the challenges caused by these features, we propose 1) a method for reconstructing document structure according to the logical structure of the content and 2) a filter that removes unimportant content which does not convey useful information to users. Our experimental evaluations showed that the proposed method improved search accuracy compared with both a naive XML element retrieval approach and a document retrieval approach.
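To make the general idea of element retrieval concrete, here is a minimal sketch that is not the authors' method. It treats every element of a parsed tree as a candidate result, scores it by query-term frequency normalized by element length, and drops very short elements using a hypothetical `min_terms` threshold as a crude stand-in for an unimportant-content filter. The function names `element_text` and `score_elements`, the scoring formula, and the threshold are illustrative assumptions; the paper's structure reconstruction step is not modeled here.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_text(elem):
    # Concatenate all non-empty text fragments within the element's subtree.
    return " ".join(t.strip() for t in elem.itertext() if t.strip())


def score_elements(root, query_terms, min_terms=3):
    """Rank every element by length-normalized query-term frequency.

    Hypothetical sketch: elements with fewer than min_terms words are
    skipped as a crude filter for uninformative content.
    """
    query = [q.lower() for q in query_terms]
    results = []
    for elem in root.iter():
        text = element_text(elem)
        words = text.lower().split()
        if len(words) < min_terms:
            continue  # too short to be a useful answer on its own
        counts = Counter(words)
        tf = sum(counts[q] for q in query)
        if tf == 0:
            continue  # element does not mention any query term
        # Dividing by length lets a small, focused element outrank its ancestors.
        results.append((tf / len(words), elem.tag, text))
    results.sort(key=lambda r: r[0], reverse=True)
    return results


# Usage: a toy document with one relevant paragraph and a navigation stub.
doc = ET.fromstring(
    "<article><sec><p>xml retrieval returns elements</p>"
    "<p>cooking recipes for dinner tonight</p></sec>"
    "<nav>home</nav></article>"
)
for score, tag, text in score_elements(doc, ["xml", "retrieval"]):
    print(f"{score:.3f} <{tag}> {text}")
```

In this toy case the focused `<p>` element ranks above its enclosing `<sec>` and `<article>`, which is the behavior element retrieval aims for, while the one-word `<nav>` element is filtered out. Real systems use far more elaborate scoring and filtering than this.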

© 2015 The Database Society of Japan