2025 Volume E108.D Issue 6 Pages 558-569
Graphics processing units (GPUs) have been adopted in various fields because of their high parallel computing performance. A key feature of GPUs is multi-threaded execution, in which a GPU executes many threads simultaneously to hide various latencies. However, even with multi-threaded execution, the number of threads that can be launched is limited, and long-latency instructions eventually stall the GPU core. Out-of-order execution can hide long latencies, but it requires expensive circuits such as rename logic and load-store queues, and is therefore not typically adopted in GPUs with massively multi-threaded execution. We propose the TURBULENCE architecture for very low-cost out-of-order execution on GPUs. TURBULENCE consists of a novel ISA, which references operands by inter-instruction distance instead of by register number, and a novel microarchitecture that executes this ISA. Distance-based operands have the property of not causing false dependencies. By exploiting this property, we achieve complexity-effective out-of-order execution on GPUs without introducing any expensive hardware. Simulation results show that TURBULENCE improves performance by 20.4% while reducing energy consumption compared with an existing GPU.
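
The distance-based operand idea described above can be illustrated with a small sketch. It is not the TURBULENCE ISA encoding or microarchitecture; it is a toy model under stated assumptions, and the names `RegInst`, `false_dependencies`, and `to_distance` are hypothetical. It takes a short register-based instruction sequence, lists the false (WAW and WAR) dependencies created by reusing a register name, and then rewrites each source operand as "the result produced k instructions earlier". In that form no destination name is ever reused, so only true producer-to-consumer dependencies remain.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RegInst:
    """Hypothetical register-based instruction: writes dst, reads srcs."""
    dst: str
    srcs: Tuple[str, ...] = ()


def false_dependencies(prog: List[RegInst]) -> List[Tuple[str, int, int]]:
    """List (kind, earlier_index, later_index) pairs caused by register reuse."""
    found = []
    for j, later in enumerate(prog):
        for i in range(j):
            earlier = prog[i]
            if later.dst == earlier.dst:
                found.append(("WAW", i, j))  # both write the same register
            if later.dst in earlier.srcs:
                found.append(("WAR", i, j))  # later overwrites a value read earlier
    return found


def to_distance(prog: List[RegInst]) -> List[Tuple[int, ...]]:
    """Rewrite each source as the distance back to the instruction that produced it.

    Assumption: every source register has a producer earlier in the program.
    """
    last_writer: Dict[str, int] = {}
    out = []
    for j, inst in enumerate(prog):
        out.append(tuple(j - last_writer[s] for s in inst.srcs))
        last_writer[inst.dst] = j  # this instruction is now the newest producer
    return out


# Toy program: r1 is reused at index 3, which creates false dependencies.
program = [
    RegInst("r1"),                # 0: r1 = constant
    RegInst("r2"),                # 1: r2 = constant
    RegInst("r3", ("r1", "r2")),  # 2: r3 = r1 + r2
    RegInst("r1"),                # 3: r1 = constant (reuses the name r1)
    RegInst("r4", ("r1", "r3")),  # 4: r4 = r1 + r3
]

register_false_deps = false_dependencies(program)
distance_form = to_distance(program)
```

In the register form, instruction 3 cannot safely be reordered ahead of instructions 0 and 2 without renaming, because it reuses `r1`. In the distance form, each instruction only names its producers, so the same reordering constraint does not arise from operand naming.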