Abstract
Large many-core parallel applications are sensitive to communication latency, which calls for low-latency interconnection networks in high-performance computing systems. Switch delay dominates network latency, and the routing decision based on an off-chip CAM (Content Addressable Memory) table lookup accounts for a significant portion of that delay. To reduce network latency, we exploit a routing cache on the switch: a small on-chip routing cache bypasses the off-chip table lookup whenever it hits. Our simulation results show that a routing cache with only 256 entries achieves a 98% hit rate on systems with more than 4k hosts under matrix-transpose traffic, and that a 1024-entry routing cache improves packet latency by up to 16% and network throughput by up to 18%.