2009 Volume E92.A Issue 9 Pages 2283-2294
To achieve an automated implementation for the application-specific heterogeneous multiprocessor systems-on-chip (MPSoC), partitioning and mapping the sequential programs onto multiple parallel processors is one of the most difficult challenges. However, the existing traditional parallelizing techniques cannot solve the MPSoC-related problems effectively, so designers are still required to manually extract the concurrency potentials in the program. To solve this bottleneck, an automated application partition technique is needed. However, completely automatic parallelism is ineffective, so it is promising to explore concurrency for certain practical special structures. To settle those issues, this paper proposes a template-based algorithm to automatically partition a special load-compute-store (LCS) loop structure. Since specific-instruction customization for the application specific instruction-set processors (ASIPs) has interactions with task partitioning, the proposed algorithm integrates the dynamic pipelining and ASIP techniques using an iterative improvement strategy: first, an initial pipelining scheme is generated to obtain the maximum parallelism; second, under the primary partition results specific instructions are customized respectively for each subprogram; third, the program is repartitioned via pipelining under the specific instruction configurations. The proposed method has been implemented in the context of a commercial extensible multiprocessor design flow, using the Xtensa-based XTMP platform from Tensilica Inc. Based on a case study of Fast Fourier Transform (FFT), the experimental results indicate that the partitioned programs by the proposed method demonstrate an average speedup of 10× compared to the original sequential programs which have not been partitioned and run on the uniprocessor system.