Abstract
This paper presents evaluation results for two GPU implementations that accelerate mathematical morphology. Our methods decompose the computation into independent pieces that can be processed in parallel on the GPU. The two implementations, the scatter-based and gather-based methods, each employ a different decomposition scheme. The scatter-based method decomposes the computation according to pixels in the structuring element, processing in parallel the pieces associated with each such pixel. In contrast, the gather-based method decomposes the computation according to pixels in the original image. The latter method extends Eidheim's method and is capable of dealing with arbitrary structuring elements. The experimental results show that the scatter-based method is seven times faster than a naive CPU implementation for a structuring element of 100 x 100 pixels. The gather-based method achieves the highest performance but cannot handle structuring elements larger than 40 x 40 pixels, due to the limit on the number of instruction slots.
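The two decomposition schemes can be illustrated with a minimal sequential sketch of grayscale dilation. This is not the GPU implementation evaluated in the paper; it only shows which loop each scheme treats as the unit of independent work. The function names, the representation of the structuring element as a list of (dy, dx) offsets, the dilation convention out[p] = max over s of in[p + s], the handling of out-of-range neighbors (ignored), and the restriction to non-negative pixel values are all assumptions made for illustration.

```python
def dilate_gather(image, offsets):
    """Gather form: each output pixel reads every neighbor it needs.

    The (y, x) loops are the independent pieces; the loop over
    structuring-element offsets runs inside each piece.
    Assumes non-negative pixel values (the running maximum starts at 0).
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best = 0
            for dy, dx in offsets:
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:  # ignore out-of-range neighbors
                    best = max(best, image[yy][xx])
            out[y][x] = best
    return out


def dilate_scatter(image, offsets):
    """Scatter form: one pass over the image per structuring-element pixel.

    Each pass writes every source pixel to a translated position and
    merges it into the output with max.
    Assumes non-negative pixel values (the output starts at 0).
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for dy, dx in offsets:            # one piece per structuring-element pixel
        for y in range(h):
            for x in range(w):
                ty, tx = y - dy, x - dx  # destination of this source pixel
                if 0 <= ty < h and 0 <= tx < w:
                    out[ty][tx] = max(out[ty][tx], image[y][x])
    return out
```

Both functions compute the same result for any offset list. They differ only in whether the outer unit of work is an image pixel (gather) or a structuring-element pixel (scatter). In the gather form the inner loop grows with the size of the structuring element, which is consistent with the instruction-slot limit noted above for the gather-based method.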