Abstract
When applications exhibit complex control flow, wide single-instruction multiple-data (SIMD) architectures become inefficient, mainly for two reasons: vector conditional branches and nested loops. To address this problem, this paper proposes two independent techniques: data-aware thread-level parallelism (DATLP) and a hardware-supported software pipeline scheduling policy (HSSP). Both share the same hardware structure, the instruction buffer queue (IBQ), and improve efficiency by increasing instruction-level parallelism (ILP) and thread-level parallelism (TLP). Compared with a traditional SIMD architecture, the proposed control-enhanced power SIMD achieves an average performance improvement of 84% across a wide variety of media and 4G wireless communication applications, at an area overhead of only 2.97%.