Abstract
The MapReduce job scheduler implemented in Hadoop is a mechanism that decides which job is allowed to use idle resources. The performance of the job scheduler, measured by the mean job response time, depends strongly on the job arrival pattern, which comprises the job sizes (i.e., the amounts of required resources) and the order in which jobs arrive. However, because existing schedulers do not use information about job sizes, they suffer severe performance degradation under some arrival patterns. In this paper, we propose a scheduler that estimates and uses remaining job sizes in order to achieve good performance regardless of the job arrival pattern. Through simulation experiments, we confirm that, for various arrival patterns, the proposed scheduler achieves better performance than the existing schedulers.
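The dependence on arrival order can be illustrated with a toy model. The following is a minimal sketch and not the proposed scheduler. It simulates a single shared resource in unit time steps and compares a size-blind first-in-first-out (FIFO) order with preemptive shortest-remaining-processing-time (SRPT) ordering. It assumes that remaining job sizes are known exactly, whereas the proposed scheduler estimates them. The `Job` type, the `mean_response_time` function, and the single-resource time-step model are hypothetical simplifications and do not reflect Hadoop's slot model.

```python
from dataclasses import dataclass


@dataclass
class Job:
    arrival: int  # arrival time in integer time units (hypothetical model)
    size: int     # required work in resource-time units; must be >= 1


def mean_response_time(jobs, policy):
    """Simulate one shared resource in unit time steps and return the
    mean response time (finish time minus arrival time).

    policy="fifo": serve the earliest-arrived unfinished job (size-blind).
    policy="srpt": serve the job with the smallest remaining size,
                   preempting a larger job when a smaller one arrives.
    Assumes remaining sizes are known exactly (a simplification).
    """
    if any(j.size <= 0 for j in jobs):
        raise ValueError("job sizes must be positive")
    remaining = [j.size for j in jobs]
    finish = [None] * len(jobs)
    t = 0
    while any(f is None for f in finish):
        # Jobs that have arrived and are not yet finished.
        ready = [i for i, j in enumerate(jobs)
                 if j.arrival <= t and finish[i] is None]
        if ready:
            if policy == "fifo":
                i = min(ready, key=lambda k: (jobs[k].arrival, k))
            elif policy == "srpt":
                i = min(ready, key=lambda k: (remaining[k], jobs[k].arrival, k))
            else:
                raise ValueError(f"unknown policy: {policy}")
            remaining[i] -= 1          # serve the chosen job for one time unit
            if remaining[i] == 0:
                finish[i] = t + 1      # job completes at the end of this step
        t += 1                         # the resource idles if nothing is ready
    return sum(finish[i] - j.arrival for i, j in enumerate(jobs)) / len(jobs)
```

In a pattern where one large job arrives just before two small ones, FIFO makes the small jobs wait behind the large job. Size-aware ordering lets them finish almost immediately:

```python
jobs = [Job(0, 10), Job(1, 1), Job(2, 1)]
mean_response_time(jobs, "fifo")  # 10.0
mean_response_time(jobs, "srpt")  # about 4.67
```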