Article ID: 22.20250590
This paper introduces a novel area-efficient asymmetric butterfly (AsymFly) network-on-chip (NoC) architecture designed for the communication-intensive traffic that large language model (LLM) workloads generate on GPGPUs. The architecture places memory and compute nodes on opposite sides of the network fabric. This physical arrangement, combined with node consolidation, reduces the router count by 17%. Furthermore, we employ pipeline-stage time-division multiplexing (TDM) to improve resource utilization and to achieve protocol-level deadlock avoidance within a single unified physical network. To offset the throughput loss introduced by TDM, we propose a local adaptive scheduling strategy that dynamically balances resource occupancy across network regions. Our evaluations show that, compared with a conventional mesh baseline, AsymFly improves instructions per cycle (IPC) by 26% while reducing both area and power consumption by 64%.
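The abstract does not specify how pipeline stages are time-multiplexed. The following is a minimal sketch under assumed parameters: two message classes (request and response) share one physical pipeline, and stage `s` is owned by class `(t + s) mod 2` in cycle `t`. All names (`slot_owner`, `can_inject`, `trace`, `slot_share`) are hypothetical. Under this assumption, a flit that enters in its own slot and advances one stage per cycle stays in slots owned by its class. Request and response traffic therefore never compete for the same stage slot, which is the kind of separation that breaks protocol-level dependency cycles. The same schedule also shows where the throughput cost comes from: each class receives only half of every stage's slots.

```python
# Hypothetical TDM sketch: two message classes share one physical pipeline.
# The slot assignment below is an assumption, not the paper's exact scheme.
REQUEST, RESPONSE = 0, 1
NUM_CLASSES = 2


def slot_owner(cycle: int, stage: int) -> int:
    """Message class allowed to occupy `stage` during `cycle` (assumed round-robin)."""
    return (cycle + stage) % NUM_CLASSES


def can_inject(msg_class: int, cycle: int) -> bool:
    """A flit may enter stage 0 only during its own class's slot."""
    return slot_owner(cycle, 0) == msg_class


def trace(msg_class: int, inject_cycle: int, num_stages: int) -> list[tuple[int, int]]:
    """Return the (cycle, stage) pairs visited by a flit advancing one stage per cycle."""
    if not can_inject(msg_class, inject_cycle):
        raise ValueError("injection outside the class's TDM slot")
    path = []
    for stage in range(num_stages):
        cycle = inject_cycle + stage
        # cycle and stage both grow by 1, so (cycle + stage) grows by 2 and its
        # parity is unchanged: the flit always sits in a slot owned by its class.
        assert slot_owner(cycle, stage) == msg_class
        path.append((cycle, stage))
    return path


def slot_share(msg_class: int, num_cycles: int, stage: int = 0) -> float:
    """Fraction of cycles in which `stage` is available to `msg_class`."""
    owned = sum(slot_owner(c, stage) == msg_class for c in range(num_cycles))
    return owned / num_cycles


if __name__ == "__main__":
    print(trace(REQUEST, 0, 4))    # [(0, 0), (1, 1), (2, 2), (3, 3)]
    print(trace(RESPONSE, 1, 3))   # [(1, 0), (2, 1), (3, 2)]
    print(slot_share(REQUEST, 10)) # 0.5
```

The fixed 50% share per class is what a static schedule gives up when one class is idle. An adaptive scheduler, such as the local strategy the abstract describes, could plausibly reclaim those idle slots, but the exact policy is not given here.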