Abstract
This paper presents a transpose-free variable-size fast fourier transform (FFT) accelerator on a digital signal processing (DSP) chip. Several parallel schemes are utilized to calculate a batch of small-size FFT algorithms to achieve high performance and throughput. For middle- and large-size of FFT, we propose a transpose-free Cooley-Tukey scheme that uses the random access feature of on-chip SRAM memory to avoid the DDR access of matrix with column-wise and improves the utilization of DDR bandwidth. Experimental results show that our FFT accelerator, implemented with 65 mn library and run at 500 MHz, can achieve the energy efficiency improvement by two orders of magnitude compared with Intel Xeon CPU and obtain above 50× performance improvement compared with TI TMS320C64X DSP chip.