IEICE Transactions on Communications
Online ISSN : 1745-1345
Print ISSN : 0916-8516
Special Section on Management for the Era of Internet of Things and Big Data
Reusing the Results of Queries in MapReduce Systems by Adopting Shared Storage
Zhanye WANG, Chuanyi LIU, Dongsheng WANG

2016, Vol. E99.B, No. 2, pp. 315-325

Abstract
Over the last few years, Apache MapReduce has become the prevailing framework for large-scale data processing. Rather than writing MapReduce programs by hand, which makes complex queries hard to express, many developers use high-level query languages such as Hive or Pig Latin. These languages automatically compile each query into a workflow of MapReduce jobs, which greatly simplifies querying and managing large datasets. One way to speed up the execution of such workflows is to save the results they produce and reuse them when a later query needs them. In this paper we present SuperRack, which stores the results of each workflow on shared storage devices and allows new queries to reuse those results, avoiding redundant computation and shortening execution time. We propose several novel techniques to improve the access and storage efficiency of the stored results. We also evaluate SuperRack to verify its feasibility and effectiveness. Experiments show that our solution significantly outperforms Hive on the TPC-H benchmark and on real-life workloads.
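The general idea of reusing stored workflow results can be illustrated with a small sketch. It is not SuperRack's implementation and does not model its access- and storage-efficiency techniques. The sketch assumes that each MapReduce job can be identified by a canonical operator string together with its inputs, and that base tables carry a version number. All names (`Job`, `signature`, `SharedResultStore`, `run`) are hypothetical, and an in-memory dict stands in for the shared storage devices. Before running a job, the executor computes a content-based signature for the job's whole upstream sub-workflow. If a result with that signature is already stored, the executor reuses it and skips the upstream jobs.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field
from typing import Callable, Union

# Hypothetical model: a job's output is an immutable tuple of row tuples.
Rows = tuple


@dataclass(frozen=True)
class Job:
    """One MapReduce job in a compiled workflow (hypothetical model)."""
    op: str                               # canonical operator text, e.g. "filter:amount>10"
    inputs: tuple[Union[str, Job], ...]   # base table names or upstream jobs
    fn: Callable[..., Rows] = field(compare=False)  # computes the job's output


def signature(node: Union[str, Job], versions: dict[str, int]) -> str:
    """Content-based key: equal sub-workflows over equal inputs get equal keys."""
    if isinstance(node, str):
        # Include the table version so results computed on stale data never match.
        raw = f"table:{node}@{versions[node]}"
    else:
        children = ",".join(signature(i, versions) for i in node.inputs)
        raw = f"{node.op}({children})"
    return hashlib.sha256(raw.encode()).hexdigest()


class SharedResultStore:
    """Stand-in for results kept on shared storage; an in-memory dict here."""

    def __init__(self) -> None:
        self._results: dict[str, Rows] = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, sig: str) -> Rows | None:
        if sig in self._results:
            self.hits += 1
            return self._results[sig]
        self.misses += 1
        return None

    def save(self, sig: str, rows: Rows) -> None:
        self._results[sig] = rows


def run(node, tables, versions, store) -> Rows:
    """Execute a workflow, reusing any stored result for a matching sub-workflow."""
    if isinstance(node, str):
        return tables[node]
    sig = signature(node, versions)
    cached = store.lookup(sig)
    if cached is not None:
        return cached  # reuse: the whole upstream sub-workflow is skipped
    args = [run(i, tables, versions, store) for i in node.inputs]
    out = node.fn(*args)
    store.save(sig, out)
    return out


if __name__ == "__main__":
    tables = {"orders": ((1, 5.0), (2, 20.0), (3, 42.0))}
    versions = {"orders": 1}
    store = SharedResultStore()

    big = Job("filter:amount>10", ("orders",),
              lambda rows: tuple(r for r in rows if r[1] > 10))
    total = Job("sum:amount", (big,), lambda rows: ((sum(r[1] for r in rows),),))
    count = Job("count", (big,), lambda rows: ((len(rows),),))

    print(run(total, tables, versions, store))  # ((62.0,),)
    print(run(count, tables, versions, store))  # ((2,),), filter result reused
    print(store.hits, store.misses)             # 1 3
```

The second query shares the filter job with the first, so only its final aggregation is computed. Bumping a table's version changes every downstream signature, which prevents results computed on old data from being reused. Keying on a hash of the operator tree, rather than on raw query text, lets two differently worded queries share a result whenever they compile to the same sub-workflow.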
© 2016 The Institute of Electronics, Information and Communication Engineers