Paper ID: 2025ECP5020
Many memory-bound AI applications, including natural language processing, transformer-based visual recognition, and multi-task online inference, rely heavily on large-scale general matrix-vector multiplication (GEMV), which exhibits strong data locality. However, existing hardware architectures for AI model inference incur significant data transfer overheads and fail to fully exploit the data locality inherent in these algorithms. We propose a scalable one-logic-two-DRAM (1L2D) multi-core near-DRAM computing accelerator for AI models based on 3D hybrid bonding. Our 3D integration of RISC-V processors with vector accelerators and DRAM significantly boosts bandwidth while reducing energy consumption. A memory access circuit supporting a page-hit mechanism and a prefetching strategy is designed to maximize utilization of the data locality exposed by the algorithm's partitioning and rearrangement of data. An interleaved memory address mapping scheme is designed to effectively enhance the bank-level parallelism of data accesses. Compared with a high-performance Intel Xeon-6230 CPU and the state-of-the-art commercially available UPMEM-PIM, the proposed architecture improves computational efficiency for large-scale GEMV by 3.4× and 2.2×, respectively. It also achieves a 3.07× improvement in bandwidth and a 76% reduction in energy consumption over HBM2-PIM.
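To make the bank-level parallelism claim concrete, the following is a minimal C sketch of a bank-interleaved address mapping. It is not the paper's actual scheme: the field widths (COL_BITS, BANK_BITS), the function name map_interleaved, and the DRAM geometry are all assumptions chosen for illustration. The idea it demonstrates is the standard one: placing the bank bits just above the column bits makes consecutive row-sized bursts rotate across banks, so sequential GEMV tiles can be serviced by different banks in parallel.

```c
/*
 * Illustrative bank-interleaved DRAM address mapping.
 * Field widths and names are hypothetical, not taken from the paper.
 */
#include <stdint.h>
#include <stdio.h>

#define COL_BITS   6   /* assumed: 64 column positions per row segment */
#define BANK_BITS  4   /* assumed: 16 banks per DRAM die               */

typedef struct {
    uint32_t row;
    uint32_t bank;
    uint32_t col;
} dram_addr_t;

/* Bank bits sit between the column and row fields, so addresses that
 * differ by one row segment map to different banks. */
static dram_addr_t map_interleaved(uint64_t linear)
{
    dram_addr_t a;
    a.col  = (uint32_t)(linear & ((1u << COL_BITS) - 1));
    a.bank = (uint32_t)((linear >> COL_BITS) & ((1u << BANK_BITS) - 1));
    a.row  = (uint32_t)(linear >> (COL_BITS + BANK_BITS));
    return a;
}

int main(void)
{
    /* Consecutive row segments land in distinct banks, enabling
     * bank-parallel fetches of sequential GEMV tiles. */
    for (uint64_t i = 0; i < 4; i++) {
        uint64_t linear = i << COL_BITS;
        dram_addr_t a = map_interleaved(linear);
        printf("addr %llu -> row %u bank %u col %u\n",
               (unsigned long long)linear, a.row, a.bank, a.col);
    }
    return 0;
}
```

Under this assumed layout, a streaming access pattern such as a partitioned GEMV touches all banks in round-robin order before reopening a row in any one of them, which is the access behavior the page-hit and prefetching circuitry described above is designed to exploit.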