Abstract
This paper presents a novel design for large shift registers that overcomes the I/O pin bottleneck typically encountered in FPGA implementations. The proposed design recursively applies embedded logic to decompose and synthesize the shifter operations. Compared with conventional logic shifter, barrel shifter, and logarithmic shifter designs, the proposed approach reduces the number of I/O pins by at least 89% and increases the available logic slices by 215%. A hybrid clocking scheme optimizes the performance of the design, which can be readily extended to multi-chip FPGA implementations.
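
For readers unfamiliar with the baselines named above, the following is a minimal behavioral sketch of a conventional logarithmic shifter. It is not the proposed design, and it models behavior only, not pin or slice usage. The function names `log_shift_left` and `stage_count`, and the restriction to power-of-two widths, are assumptions made for illustration.

```python
def log_shift_left(value: int, shift: int, width: int) -> int:
    """Logical left shift of a width-bit word built from fixed-shift stages.

    Illustrative model of a conventional logarithmic shifter (hypothetical
    helper, not taken from the paper). Assumes width is a power of two.
    """
    if width <= 0 or width & (width - 1):
        raise ValueError("width must be a positive power of two")
    if not 0 <= shift < width:
        raise ValueError("shift must be in [0, width)")

    mask = (1 << width) - 1
    word = value & mask
    stage = 0
    while (1 << stage) < width:
        # Stage k acts like a row of 2:1 multiplexers: each bit either
        # passes through unchanged or comes from 2**k positions lower,
        # selected by bit k of the shift amount.
        if (shift >> stage) & 1:
            word = (word << (1 << stage)) & mask
        stage += 1
    return word


def stage_count(width: int) -> int:
    """Number of multiplexer stages for a power-of-two width (log2 of width)."""
    return width.bit_length() - 1
```

For example, `log_shift_left(0b1011, 2, 8)` evaluates to `0b101100`, and an 8-bit word needs `stage_count(8) == 3` stages. The stage count grows logarithmically with the word width, which is why this structure is a common reference point for large shifters.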