Journal of Information Processing
Online ISSN : 1882-6652
ISSN-L : 1882-6652
 
Single-precision Matrix Multiplication Performance on Cerebras CS-2: Evaluation and Modelling of Performance, Scalability and Energy Efficiency
Takaaki Miyajima, Ryunosuke Matsuzaki, Daichi Mukunoki
Journal article, free access

2026, Volume 34, pp. 132-139

Abstract

Although the computational performance of recent supercomputers continues to improve, scaling performance with the number of nodes remains difficult because of long inter-node communication latency. Many attempts have been made to hide communication latency and maintain strong scalability, even for dense matrix multiplication. Matrix multiplication is an ideal candidate for benchmarking the performance of supercomputers. The Cerebras CS-2 system is a deep learning accelerator built around the world's largest chip, the Wafer-Scale Engine 2 (WSE-2). The WSE-2 can be regarded as a distributed-memory system with 745,500 processing elements (PEs) connected in a low-latency 2-D mesh topology. This paper reports the effective maximum performance and the weak and strong scaling performance of single-precision matrix multiplication on the CS-2, and proposes a performance model for it. We observed a maximum performance of 349.0 TFlop/s (matrix size: 33,000×33,000; PEs used: 750×750), a performance per watt of 79.66 GFlops/W, and a weak scaling efficiency of 1.00. The mean absolute percentage error between our performance model and the actual measurements was 9.2%.
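To make the reported figures easier to interpret, the following minimal Python sketch derives a few secondary quantities from the numbers in the abstract. It rests on assumptions that the abstract does not confirm: that the operation count for an n×n multiplication follows the common 2n³ convention, that the performance-per-watt figure applies to the same run as the maximum performance, and that weak scaling efficiency and mean absolute percentage error follow their usual textbook definitions. All function and variable names are illustrative and do not come from the paper.

```python
# Sketch only: derived quantities from the abstract's headline numbers.
# Assumptions (not confirmed by the source): 2*n^3 flop convention, and
# performance-per-watt measured on the same run as the peak performance.

def gemm_flop_count(n: int) -> int:
    """Flop count for multiplying two n x n matrices (2*n^3 convention)."""
    return 2 * n ** 3


def weak_scaling_efficiency(base_time: float, scaled_time: float) -> float:
    """Usual definition: time on the base configuration over time on the
    scaled configuration, with work per PE held constant."""
    return base_time / scaled_time


def mape(measured, predicted) -> float:
    """Mean absolute percentage error, in percent."""
    errors = [abs((m - p) / m) for m, p in zip(measured, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Figures quoted in the abstract.
N = 33_000                 # matrix dimension
PES = 750 * 750            # processing elements used
PEAK = 349.0e12            # flop/s
PERF_PER_WATT = 79.66e9    # flop/s per watt

elapsed = gemm_flop_count(N) / PEAK      # implied time for one multiplication
per_pe = PEAK / PES                      # implied throughput per PE
implied_power = PEAK / PERF_PER_WATT     # implied power draw (assumption above)

if __name__ == "__main__":
    print(f"implied elapsed time: {elapsed:.3f} s")
    print(f"implied per-PE rate:  {per_pe / 1e6:.1f} MFlop/s")
    print(f"implied power draw:   {implied_power:.0f} W")
```

Under these assumptions, the implied per-PE rate and power draw are rough consistency checks rather than values reported by the authors.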

© 2026 by the Information Processing Society of Japan