Transactions of the Japanese Society for Artificial Intelligence
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
Original Paper
A Tree-Structured Data Processing Platform for Large-Scale Analysis
柳井 孝介, 植田 良一, 佐川 暢俊

2011, Volume 26, Issue 5, pp. 594-606

Abstract

We propose a data processing platform for analyzing large volumes of tree-structured data. The platform stores tree-structured data in separate files, one per attribute, and uses the MapReduce framework for distributed computing. Together, these methods reduce disk I/O load and avoid computationally intensive operations such as grouping or combining records. The early stage of data mining requires trial and error to find out how to analyze and use the data. Our platform speeds up the computations involved in this trial-and-error process, such as appending new attributes and calculating attribute statistics. Experimental results show that the proposed methods process large-scale tree-structured data efficiently, and that our platform is comparable or superior to a traditional relational database system. With the proposed platform, it became possible to process 90 GB of data within 5 minutes on 6 benchmark tasks. We also describe a system architecture for the trial-and-error phase, which integrates the proposed platform with a few Web applications. The main contributions of this paper are: (1) a formulation of vertical partitioning for tree-structured data, (2) effective use of MapReduce, and (3) construction of a large-scale data mining system for the trial-and-error phase.
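The idea of storing each attribute separately can be illustrated with a small sketch. It is not the authors' implementation: the record layout, the attribute paths, and the function names `partition`, `append_attribute`, and `chunked_mean` are all assumptions made for illustration, in-memory lists stand in for per-attribute files, and the map and reduce steps are plain Python functions rather than calls to a MapReduce framework.

```python
from collections import defaultdict
from functools import reduce


def partition(records):
    """Split nested records into one column per attribute path.

    Each column is a list of (record_id, value) pairs, so a single
    attribute can be scanned without reading any other attribute.
    """
    columns = defaultdict(list)

    def walk(node, path, record_id):
        if isinstance(node, dict):
            for key, child in node.items():
                walk(child, path + (key,), record_id)
        elif isinstance(node, list):
            for child in node:
                walk(child, path, record_id)
        else:
            columns["/".join(path)].append((record_id, node))

    for record_id, record in enumerate(records):
        walk(record, (), record_id)
    return dict(columns)


def append_attribute(columns, source, target, fn):
    """Derive a new column from one existing column; others are untouched."""
    columns[target] = [(rid, fn(value)) for rid, value in columns[source]]


def chunked_mean(column, chunk_size=2):
    """Mean of a non-empty column, computed in a map step and a reduce step."""
    chunks = [column[i:i + chunk_size] for i in range(0, len(column), chunk_size)]
    # Map: each chunk yields a partial (sum, count).
    partials = [(sum(v for _, v in chunk), len(chunk)) for chunk in chunks]
    # Reduce: combine the partial sums and counts.
    total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), partials, (0, 0))
    return total / count


# Hypothetical tree-structured records: person -> visits -> tests.
records = [
    {"id": "p1", "visits": [
        {"cost": 100, "tests": [{"value": 5.0}, {"value": 7.0}]},
        {"cost": 300, "tests": [{"value": 6.0}]},
    ]},
    {"id": "p2", "visits": [{"cost": 200, "tests": []}]},
]

columns = partition(records)
append_attribute(columns, "visits/cost", "visits/cost_high", lambda c: c >= 200)
mean_cost = chunked_mean(columns["visits/cost"])
```

In this sketch, computing `mean_cost` touches only the `visits/cost` column, and appending `visits/cost_high` writes one new column without rewriting the records. That is the kind of saving the abstract attributes to per-attribute storage. A real system would also need to record which parent node each value belongs to, which this sketch omits.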

© 2011 JSAI (The Japanese Society for Artificial Intelligence)