2025 Volume 33 Pages 840-851
General-purpose computing on graphics processing units (GPGPU) follows an execution model in which the number and type of parallel tasks are managed by the CPU, which makes it difficult to efficiently execute fine-grained parallel programs containing nested parallel tasks of nonuniform granularity. This work addresses the problem by managing parallel tasks on the GPU itself through a fast memory allocation mechanism. As a preliminary implementation, it proposes a method that splits the computation of a fine-grained parallel fork-join program at each fork point and allocates each resulting subcomputation in GPU memory as a parallel task. In addition, kernel fusion, parallel task reuse, and parallel throttling are explored as optimizations of the proposed method. The method is implemented for a fine-grained parallel fork-join program in CUDA, and its scalability and execution speed are measured to evaluate its feasibility and performance.
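To make the idea concrete, the following is a minimal illustrative sketch (not the paper's actual implementation) of how a fork point might allocate child computations as tasks directly in GPU memory via a fast, lock-free bump allocator. All names here (`Task`, `g_pool`, `fork_task`, `worker`) are hypothetical.

```cuda
// Illustrative sketch only: a device-side bump allocator for parallel
// tasks, in the spirit of the fast memory allocation mechanism the
// abstract describes.
#include <cstdio>

struct Task { int lo, hi; };               // one fork-join subcomputation

__device__ unsigned int g_top = 0;         // bump-allocator cursor
__device__ Task        g_pool[1 << 20];    // pre-allocated task memory

// At a fork point, allocate a new task in GPU memory with a single
// atomic increment: O(1) and with no round trip to the CPU.
__device__ Task* fork_task(int lo, int hi) {
    unsigned int idx = atomicAdd(&g_top, 1u);
    Task* t = &g_pool[idx];
    t->lo = lo;
    t->hi = hi;
    return t;
}

// A worker splits its range at the midpoint (the fork point) and
// records both halves as new tasks instead of returning to the CPU.
__global__ void worker(int lo, int hi) {
    if (hi - lo > 1) {
        int mid = (lo + hi) / 2;
        fork_task(lo, mid);                // left child task
        fork_task(mid, hi);                // right child task
    }
}

int main() {
    worker<<<1, 1>>>(0, 8);
    cudaDeviceSynchronize();
    unsigned int n = 0;
    cudaMemcpyFromSymbol(&n, g_top, sizeof(n));
    printf("tasks allocated: %u\n", n);
    return 0;
}
```

A scheduler kernel would then pull tasks from `g_pool` and repeat the split until the granularity threshold is reached; optimizations such as task reuse would recycle finished `Task` slots rather than always bumping the cursor.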