Abstract
An architecture of a programmable systolic array processor is proposed for the discrete wavelet transform (DWT). This transform requires a huge amount of data to be filtered. To achieve this, many processor elements (PEs) are implemented. However, the hardware of a multiplier for multiply-accumulate operations is large, and complicated connections among PEs lower flexibility and scalability. By using the time-divided multiple-operation method, the execution unit with a simple structure of shifters and a three-input adder achieved 50% of hardware size and the same performance of that achieved with a multiplier and an adder. The unique network mechanism among PEs and the systolic array architecture provided a high level of data transfer, flexibility, and scalability. Using this architecture enables a processor with ten PEs to execute DWT for 1024×1024 image pixels in 26.3 ms.