We present an implementation of a GPU-based FFT routine (graphics processing unit based fast Fourier transform) in a CPU-based
ab initio periodic DFT (density functional theory) calculation code. The FFT is the most time-consuming part of CPU-based DFT codes; for the 128 silicon system, the CPU FFT accounts for a fraction of 0.64 of the time of the whole periodic DFT calculation. Replacing the double-precision FFT in the periodic PWscf code with a single-precision FFT gives no appreciable differences in either the numerical total energies or the interatomic forces, which justifies the use of a single-precision GPU-based FFT, CUFFT, for the code. The use of CUFFT reduces the FFT fraction to 0.20 of the whole PWscf calculation; the replacement gives a speedup of a factor of 2.2 on a single-CPU system. Using a multi-CPU system together with the GPU FFT accelerates the calculation by a factor of 2.2f, where f is the acceleration factor of the multi-CPU system. Single-precision GPU calculation can be implemented in any self-consistent electronic structure code, except for the eigensolver part of DFT codes.
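
The reported figures can be cross-checked with a few lines of arithmetic. The sketch below assumes that only the FFT part is accelerated and that the rest of the calculation takes the same absolute time before and after the replacement; the function names and the sample value of f are illustrative and are not taken from the PWscf code.

```python
def overall_speedup(frac_before: float, frac_after: float) -> float:
    """Whole-code speedup implied by the FFT time fractions.

    Assumption: the non-FFT part takes the same absolute time in both runs,
    so (1 - frac_before) * T_old == (1 - frac_after) * T_new and the
    speedup T_old / T_new equals (1 - frac_after) / (1 - frac_before).
    """
    if not (0.0 <= frac_before < 1.0 and 0.0 <= frac_after < 1.0):
        raise ValueError("fractions must lie in [0, 1)")
    return (1.0 - frac_after) / (1.0 - frac_before)


def combined_speedup(single_cpu_speedup: float, f: float) -> float:
    """Speedup of a multi-CPU run with the GPU FFT, read as the product
    of the single-CPU speedup and the multi-CPU acceleration factor f."""
    return single_cpu_speedup * f


if __name__ == "__main__":
    s = overall_speedup(0.64, 0.20)       # fractions quoted in the abstract
    print(f"single-CPU speedup: {s:.1f}")  # prints "single-CPU speedup: 2.2"
    # f = 4.0 is a hypothetical multi-CPU acceleration factor, for illustration only.
    print(f"combined speedup for f = 4: {combined_speedup(s, 4.0):.1f}")
```

Under this assumption, the fractions 0.64 and 0.20 give (1 - 0.20) / (1 - 0.64) ≈ 2.2, in agreement with the quoted single-CPU speedup.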