IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Special Section on Parallel and Distributed Computing and Networking
Asymptotically Optimal Merging on ManyCore GPUs
Arne KUTZNERPok-Son KIMWon-Kwang PARK
Author information
JOURNAL FREE ACCESS

2012 Volume E95.D Issue 12 Pages 2769-2777

Details
Abstract
We propose a family of algorithms for efficiently merging on contemporary GPUs, so that each algorithm requires $O(m\log(\frac{n}{m}+1))$ element comparisons, where m and n are the sizes of the input sequences with mn. According to the lower bounds for merging all proposed algorithms are asymptotically optimal regarding the number of necessary comparisons. First we introduce a parallely structured algorithm that splits a merging problem of size 2l into 2i subproblems of size 2l-i, for some arbitrary i with (0≤il). This algorithm represents a merger for i=l but it is rather inefficient in this case. The efficiency is boosted by moving to a two stage approach where the splitting process stops at some predetermined level and transfers control to several parallely operating block-mergers. We formally prove the asymptotic optimality of the splitting process and show that for symmetrically sized inputs our approach delivers up to 4 times faster runtimes than the thrust::merge function that is part of the Thrust library. For assessing the value of our merging technique in the context of sorting we construct and evaluate a MergeSort on top of it. In the context of our benchmarking the resulting MergeSort clearly outperforms the MergeSort implementation provided by the Thrust library as well as Cederman's GPU optimized variant of QuickSort.
Content from these authors
© 2012 The Institute of Electronics, Information and Communication Engineers
Previous article Next article
feedback
Top