LMAI2C: Low Memory Access Im2col Method for CNN Inference

Mengda Li; Ziyi Chen; Jiangen Hong; Yiheng Zhang; Xiaoran Hao; Ming Chen; Mao Ni

doi:10.1587/elex.22.20250246

Keywords: Im2col, CNN, hardware acceleration, FPGA

JOURNAL FREE ACCESS Advance online publication

Article ID: 22.20250246

Abstract

For neural network accelerators that use General Matrix Multiplication (GEMM) as the computational core, the input feature maps of a convolution must be converted into 2D matrices through the Im2col operation. Conventional approaches rely on the CPU to manage Im2col and the associated data transfers. They suffer from memory expansion caused by redundant data in overlapping convolutional windows, which incurs non-negligible memory access energy consumption and transmission latency overheads. This severely limits the feasibility of efficient GEMM acceleration on resource-constrained edge devices. This paper proposes a novel Low Memory Access Im2col Method (LMAI2C) and presents its dedicated hardware implementation. By restructuring data from overlapping convolutional windows, LMAI2C significantly reduces the DRAM access volume while improving feature map transfer efficiency. Experimental results on the convolutional layers of the YOLOv4-tiny network demonstrate that LMAI2C reduces DDR memory access by approximately 79.8% compared to traditional methods. Furthermore, LMAI2C achieves an average speedup of 69 times over CPU-based methods and 43 times over DMA-accelerated CPU implementations.
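To illustrate the memory expansion that the abstract attributes to overlapping convolutional windows, the following is a minimal sketch of the conventional Im2col transform in plain Python. It is not the LMAI2C method, whose details the abstract does not give. The function names `im2col` and `expansion_factor`, the single-channel input, and the absence of padding are assumptions made for illustration only.

```python
def im2col(fmap, k, stride=1):
    """Unfold a 2D feature map (list of rows) into a matrix with one row
    per k x k convolution window. Assumes a single channel and no padding."""
    h, w = len(fmap), len(fmap[0])
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    rows = []
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            # Flatten the k x k window in row-major order.
            rows.append([fmap[r + di][c + dj]
                         for di in range(k) for dj in range(k)])
    return rows


def expansion_factor(h, w, k, stride=1):
    """Ratio of Im2col matrix elements to original feature map elements."""
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    return (out_h * out_w * k * k) / (h * w)


# A 4 x 4 feature map holding the values 0..15, unfolded with a 3 x 3 window.
fmap = [[r * 4 + c for c in range(4)] for r in range(4)]
matrix = im2col(fmap, k=3)
```

In this small case the 16-element input becomes a 4 x 9 matrix of 36 elements, because neighbouring windows share most of their values. For large feature maps with a 3 x 3 kernel and stride 1, the ratio approaches 9, which is the kind of redundancy in stored and transferred data that the abstract describes.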

