2010, Volume 20, Issue 2, pp. 125-131
The latest GPUs offer not only high computational power but also the high memory bandwidth required to accelerate memory-intensive computations such as the FFT. This paper presents a high-performance FFT library for CUDA GPUs. Auto-tuning is important for attaining the best performance. As a result, the library achieved much higher performance than other existing libraries.