Abstract
In this paper, an optimization for particle-based simulations on the Graphics Processing Unit (GPU) is presented. The sliced grid is a good candidate for an efficient neighboring-particle search because it improves on the memory efficiency of the uniform grid and also improves performance. However, because the previous study relied on graphics functions such as alpha blending, it is unclear whether the sliced grid is also suited to general streaming processors. We therefore present a general implementation of the sliced grid and an implementation using Compute Unified Device Architecture (CUDA). This paper also proposes the block transition sort, which exploits the coherency between simulation timesteps and is well suited to the GPU. The block transition sort is used to improve the memory alignment of the simulation data, i.e., to increase the spatial locality of the data. The Distinct Element Method (DEM) is implemented to evaluate the proposed methods. We achieved a speedup of about 3x for the neighboring particle search, which is the most expensive part of the particle-based simulation, and a speedup of about 1.5x for the overall computation.