2016 Volume 13 Issue 23 Pages 20160937
This paper presents a floating-point fused dot-product (FDP) unit with reduced latency. The proposed FDP unit computes the dot product of four floating-point numbers, ab ± cd, and is implemented with a dual-path algorithm. The unit is modeled in Verilog HDL and synthesized using a TSMC 65 nm technology library. Synthesis results show that the proposed FDP unit is 24% to 30% faster and occupies 36.4% less area than the fastest previously reported FDP unit. We also combine the proposed FDP unit with our previously designed fused add-subtract (FAS) unit to implement an FFT radix-2 butterfly (R2BF) unit. Compared to the fastest 2's-complement butterfly unit, the proposed R2BF unit improves latency by roughly 34% and reduces area by 41.6%.
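As a behavioral illustration only, the following Python sketch shows what a fused dot product ab ± cd computes and how two such operations plus two fused add-subtract operations can form a radix-2 butterfly. It is not the paper's dual-path hardware design: the function names `fdp`, `fas`, and `r2bf` are hypothetical, the decimation-in-time butterfly form (x + wy, x − wy) is an assumption, and double precision with exact rational intermediates stands in for whatever precision and rounding logic the actual unit uses.

```python
from fractions import Fraction


def fdp(a, b, c, d, subtract=False):
    """Emulate a fused dot product a*b +/- c*d (hypothetical helper).

    Both products and the sum are formed exactly, then rounded once to
    binary64. Finite inputs only: Fraction rejects inf and NaN.
    """
    ab = Fraction(a) * Fraction(b)
    cd = Fraction(c) * Fraction(d)
    exact = ab - cd if subtract else ab + cd
    return float(exact)  # single correctly rounded conversion


def fas(a, b):
    """Emulate a fused add-subtract: sum and difference of one operand pair."""
    return a + b, a - b


def r2bf(x, y, w):
    """Radix-2 butterfly (assumed decimation-in-time form): (x + w*y, x - w*y)."""
    # Complex product w*y: one FDP per component.
    prod_re = fdp(w.real, y.real, w.imag, y.imag, subtract=True)
    prod_im = fdp(w.real, y.imag, w.imag, y.real)
    # One FAS per component yields both butterfly outputs.
    sum_re, diff_re = fas(x.real, prod_re)
    sum_im, diff_im = fas(x.imag, prod_im)
    return complex(sum_re, sum_im), complex(diff_re, diff_im)


if __name__ == "__main__":
    a, b = 1.0 + 2.0**-30, 1.0 - 2.0**-30
    print(fdp(a, b, 1.0, 1.0, subtract=True))  # fused: keeps the tiny residual
    print(a * b - 1.0)                         # unfused: product rounds first
    print(r2bf(1 + 0j, 1 + 0j, 1j))
```

The first two printed values differ because the unfused expression rounds the product before subtracting, whereas the fused emulation rounds only the final result; this accuracy property is a general feature of fused arithmetic and is separate from the latency and area results reported above.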