2024 Volume 21 Issue 22 Pages 20240559
This paper introduces a novel hardware acceleration circuit that addresses the storage address offset problem arising when Convolutional Neural Network (CNN) feature maps are padded. Traditional CPU-based padding and data transfer are computationally intensive and incur high latency and power consumption, especially on edge devices. The proposed circuit automates feature map padding and integrates it with the transfer itself, which significantly reduces DRAM accesses and speeds up the movement of feature maps between DRAM and on-chip SRAM. Tested on the ZCU102 development board with the convolutional layers of YOLOv4-tiny, the circuit achieves a speedup of more than 20 times over CPU-based methods and more than 4 times over a CPU with DMA.
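To make the address offset concrete, the following is a minimal software sketch, not the paper's circuit. It assumes a single-channel feature map stored in row-major order with symmetric zero padding; the function names (`padded_index`, `pad_feature_map`, `row_offsets`) are hypothetical and chosen only for illustration. It shows that each source row lands at a different offset in the padded buffer, which is the bookkeeping a CPU-based approach has to perform for every row.

```python
def padded_index(row, col, width, pad):
    """Linear index of element (row, col) in the padded, row-major buffer.

    Assumption: `pad` zeros are added on all four sides of the map.
    """
    padded_width = width + 2 * pad
    return (row + pad) * padded_width + (col + pad)


def pad_feature_map(flat, height, width, pad):
    """Copy a row-major feature map into a zero-padded row-major buffer."""
    padded_width = width + 2 * pad
    padded_height = height + 2 * pad
    out = [0] * (padded_width * padded_height)
    for row in range(height):
        src = row * width                       # start of the row in the source
        dst = padded_index(row, 0, width, pad)  # start of the row after padding
        out[dst:dst + width] = flat[src:src + width]
    return out


def row_offsets(height, width, pad):
    """Offset between padded and unpadded start address for each row."""
    return [padded_index(r, 0, width, pad) - r * width for r in range(height)]


if __name__ == "__main__":
    fmap = list(range(1, 10))  # a 3x3 map holding the values 1..9
    print(pad_feature_map(fmap, 3, 3, 1))
    print(row_offsets(3, 3, 1))
```

For a 3x3 map with one pixel of padding, the per-row offsets are 6, 8 and 10: the offset grows by twice the padding width with every row, so a single contiguous copy cannot place the data correctly.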