A variety of satellite missions are carried out every year, and most of these satellites yield big data, so high-performance data processing technologies are required. We have been developing a cloud system, the NICT Science Cloud, for big data analyses of Earth and space observations made by spacecraft. In the present study, we propose a new technique for processing big data based on the observation that high-speed I/O (reading and writing data files) is more important than computational processing speed. We adopt a task scheduler, Pwrake, to simplify the development and management of parallel data processing. Using a long-term scientific observation dataset from the GEOTAIL satellite, we examine the performance of the system on the NICT Science Cloud. We achieved data processing more than 100 times faster than in traditional data processing environments.
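Pwrake is a parallel workflow extension of Rake, where each job is a file task: an output file depends on input files and is rebuilt only when it is missing or older than its inputs. The following is not Pwrake or its API. It is a minimal Python sketch, under stated assumptions, of that file-level task model applied to many independent observation files. The function names (`needs_update`, `process_file`, `run_file_tasks`), the file names (`obs_*.dat`), and the uppercase "transformation" are hypothetical placeholders. A thread pool is assumed because the per-file work here is dominated by file reads and writes, which release Python's GIL.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def needs_update(src: str, dst: str) -> bool:
    """Rake-style file-task rule: rebuild if the output is missing or older than its input."""
    return not os.path.exists(dst) or os.path.getmtime(dst) < os.path.getmtime(src)


def process_file(src: str, dst: str) -> str:
    """Hypothetical per-file step: read one data file, transform it, write one output file."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(data.upper())  # placeholder for a real analysis step
    return dst


def run_file_tasks(pairs, workers=4):
    """Run every out-of-date (input, output) task in parallel; return the rebuilt outputs."""
    todo = [(src, dst) for src, dst in pairs if needs_update(src, dst)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sorted(pool.map(lambda p: process_file(*p), todo))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        pairs = []
        for i in range(3):
            src = os.path.join(d, f"obs_{i}.dat")  # hypothetical observation files
            with open(src, "wb") as f:
                f.write(b"geotail")
            pairs.append((src, src + ".out"))
        print(len(run_file_tasks(pairs)))  # first run: all three tasks execute
        print(len(run_file_tasks(pairs)))  # second run: outputs are up to date
```

Because each task touches only its own input and output files, tasks can be scheduled in any order and rerun incrementally; on a real cluster, placing each task near its data is what keeps file I/O, rather than computation, from becoming the bottleneck.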