International Journal of Networking and Computing
Online ISSN : 2185-2847
Print ISSN : 2185-2839
ISSN-L : 2185-2839
Special Issue on the Third International Symposium on Computing and Networking
A Memory-Efficient Implementation of a Plasmonics Simulation Application on SX-ACE
Raghunandan MathurHiroshi MatsuokaOsamu WatanabeAkihiro MusaRyusuke EgawaHiroaki Kobayashi
Author information
JOURNAL FREE ACCESS

2016 Volume 6 Issue 2 Pages 243-262

Details
Abstract

Since recent scientific and engineering simulations require heavy computations with large volumes of data, High-performance Computing (HPC) systems need a high computational capability with a large memory capacity. Most recent HPC systems adopt a parallel processing architecture, where the computational capability of the processors is increasing, however, the performance of the memory system is constrained. The bytes per flop (B/F), which is a ratio of the memory bandwidth to the flop/s, for the HPC systems have been reduced with the evolution of the HPC systems. To fully exploit the potential of the recent HPC systems, and to meet the increasing demand for large memory, it is necessary to optimize practical scientific and engineering applications, considering not only the parallelism of the applications, but also the limitations of the memory subsystems of the HPC systems. In this paper, we discuss a set of approaches to optimization of the memory access behavior of the applications, which enable their executions with improved performance on the recent HPC systems. Our approaches include memory optimizations through memory footprint controlling, restructuring of data structures for active elements, redundant data structure elimination through combined calculations and optimized re-calculation of data. To validate the effectiveness of our approaches, a plasmonics simulation application is evaluated on vector platforms NEC SX-ACE, NEC SX-9, and Intel Xeon based platform NEC LX 406-Re2. By applying our approaches to the implementation, the memory usage of the plasmonics simulation application can be reduced up to nearly 1/71 of the original, and its execution can be possible on a single node of a distributed parallel system with smaller memory capacity. The optimization results in 1.14 times faster execution on SX-ACE and 1.81 times faster execution on LX 406-Re2.

Content from these authors
© 2016 International Journal of Networking and Computing
Previous article Next article
feedback
Top