Abstract
General purpose computing on GPU (GPGPU) has become a popular computing model for high-performance, data-intensive applications. Accordingly, there is a strong need to develop highly efficient data structures to ease the development of GPGPU applications. In this work, we proposed an efficient concurrent queue data structure for GPU computing. The GPU based provably correct, lock-free FIFO queue allows a massive number of concurrent producers and consumers. Warp-centric en-queue and de-queue procedures are introduced to better match the underlying Single-Instruction, Multiple-Thread execution model of modern GPUs. It outperforms the best previous GPU queues by up to 40 fold. The correctness of the proposed queue operations is formally validated by linearizability criteria.