Article ID: 2024LHP0001
With the growth of deep learning and machine learning applications, efficient processing element arrays (PEAs) have become increasingly important. To address this need, this paper introduces a quantized bit-serial PEA that improves data reusability by integrating a weight ring (WR) dataflow and raises the operating frequency through bit-serial circuits. The design substantially reduces the number of feature-map accesses, improving data-processing efficiency. A key aspect of our approach is quantization: by converting floating-point values to signed 8-bit fixed-point numbers, we reduce computational complexity and ease memory-bandwidth pressure. We also briefly discuss that omitting bias terms need not degrade inference accuracy when an appropriate neural network type and dataset are chosen. The proposed WR dataflow, inspired by the weight stationary (WS) dataflow, replaces only the outdated row with a new one; this boosts the data-reuse rate and reduces costly data-access operations. Notably, the 3×3 WR PEA requires only 38.54% of the off-chip accesses per second of the 3×3 WS PEA, and merely 11.25% of those of its no-local-reuse (NLR) counterpart. Empirical results show an excellent trade-off among area, power, and speed while maintaining robust data-reuse efficiency. By combining quantization with the WR dataflow, our high-reuse, quantized bit-serial PEA offers a fresh perspective on deep learning hardware design.
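The float-to-signed-8-bit conversion mentioned above can be sketched as follows. This is a minimal illustration, not the paper's hardware quantizer: the Q1.6 format (1 sign bit, 1 integer bit, 6 fraction bits) and the saturating rounding policy are assumptions, and the function names are invented for this example.

```python
def quantize_int8(x, frac_bits=6):
    """Map a float to a signed 8-bit fixed-point code.

    Assumes a Q1.6 format (frac_bits=6); the paper's exact scaling
    may differ. Values outside the representable range saturate.
    """
    q = round(x * (1 << frac_bits))      # scale and round to integer
    return max(-128, min(127, q))        # saturate to the int8 range

def dequantize_int8(q, frac_bits=6):
    """Recover the approximate real value from an int8 code."""
    return q / (1 << frac_bits)

weights = [0.5, -0.3, 1.9, -2.5]
codes = [quantize_int8(w) for w in weights]
print(codes)  # [32, -19, 122, -128]  (last value saturates)
```

Note that 8-bit codes also let a bit-serial datapath finish a multiply in a fixed, short number of cycles, which is part of why quantization pairs naturally with bit-serial circuits.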
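The row-replacement idea behind the WR dataflow can be modeled with a small ring buffer: as a 3×3 window slides down by one row, two of the three buffered rows are reused and only the oldest is evicted for a newly fetched one. This is a hedged behavioral sketch of the reuse principle only, not the paper's PEA circuit; the function name, the choice to buffer feature-map rows, and the toy data are all illustrative assumptions.

```python
from collections import deque

def wr_style_row_windows(feature_map, k=3):
    """Collect all k-row windows of a feature map while loading each
    row from (simulated) off-chip memory exactly once.

    A deque with maxlen=k acts as the ring: appending a new row
    automatically evicts the oldest, so k-1 rows are reused per step.
    Illustrative model of row-replacement reuse, not the actual PEA.
    """
    ring = deque(maxlen=k)
    windows, loads = [], 0
    for row in feature_map:
        ring.append(row)   # fetch exactly one new row per step
        loads += 1
        if len(ring) == k:
            windows.append(list(ring))
    return windows, loads

fmap = [[r * 10 + c for c in range(5)] for r in range(6)]  # toy 6x5 map
wins, loads = wr_style_row_windows(fmap)
print(len(wins), loads)  # 4 windows from 6 row loads (naive reload: 4*3 = 12)
```

In this toy model the ring needs 6 row loads where a no-reuse scheme would need 12, mirroring in miniature the off-chip-access reductions reported for the WR PEA.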