International Journal of Networking and Computing
Online ISSN : 2185-2847
Print ISSN : 2185-2839
ISSN-L : 2185-2839
Special Issue on Workshop on Advances in Parallel and Distributed Computational Models 2013
Implementations of the Hough Transform on the Embedded Multicore Processors
Xin Zhou, Norihiro Tomagou, Yasuaki Ito, Koji Nakano

2014 Volume 4 Issue 1 Pages 174-188

Abstract

Embedded multicore processors such as FPGAs and GPUs have lately attracted considerable attention for their computational capability and power efficiency. Recent FPGAs have hundreds of embedded DSP slices and block RAMs. For example, Xilinx Virtex-6 family FPGAs have DSP48E1 slices, which are configurable blocks equipped with fast multipliers, adders, pipeline registers, and so on. They also have dual-port 18 Kbit memories as block RAMs. Meanwhile, recent GPUs can be used for general-purpose computation. Users can develop parallel programs that run on GPUs using CUDA, the parallel computing architecture provided by NVIDIA. The main contribution of this paper is to present two implementations of the Hough transform, one on the FPGA and one on the GPU. The first idea of the implementations is to use DSP slices and block RAMs efficiently on FPGAs, and the shared memory efficiently on GPUs. The second idea is to partition the voting space of the Hough transform so that the voting operation can be performed in parallel. The implementation results show that the Hough transform for a 512×512 image with 33232 edge points can be computed in 135.75 μs on the FPGA and 637.88 μs on the GPU. By contrast, a conventional CPU implementation runs in 37.10 ms. Thus, the FPGA and GPU implementations achieve speed-ups of roughly 273 and 58 times over the CPU, respectively.
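The idea of partitioning the voting space can be illustrated with a minimal sequential sketch. It is not the authors' FPGA or GPU implementation: the abstract does not say along which axis the voting space is split, so splitting along the angle axis is an assumption made here, and the names `hough_votes`, `partitioned_hough`, `n_theta`, and `n_parts` are hypothetical. Each edge point (x, y) votes for the lines ρ = x cos θ + y sin θ passing through it; because every slice of θ values writes to its own rows of the accumulator, the slices could be processed in parallel without write conflicts.

```python
import math


def hough_votes(edge_points, width, height, n_theta=180, theta_range=None):
    """Accumulate votes for the angle indices in theta_range (all by default)."""
    if theta_range is None:
        theta_range = range(n_theta)
    # |rho| never exceeds the image diagonal; offset indices so rho may be negative.
    rho_max = int(math.ceil(math.hypot(width, height)))
    acc = {t: [0] * (2 * rho_max + 1) for t in theta_range}
    for t in theta_range:
        theta = math.pi * t / n_theta
        c, s = math.cos(theta), math.sin(theta)
        row = acc[t]
        for x, y in edge_points:
            rho = int(round(x * c + y * s))
            row[rho + rho_max] += 1
    return acc


def partitioned_hough(edge_points, width, height, n_theta=180, n_parts=4):
    """Split the theta axis into n_parts slices (assumed partitioning).

    The slices are processed one after another here, but they share no
    accumulator rows, so each could be assigned to a separate worker.
    """
    merged = {}
    step = (n_theta + n_parts - 1) // n_parts
    for start in range(0, n_theta, step):
        part = range(start, min(start + step, n_theta))
        merged.update(hough_votes(edge_points, width, height, n_theta, part))
    return merged


# Ten points on the line y = x in a 16x16 image.
points = [(i, i) for i in range(10)]
full = hough_votes(points, 16, 16)
split = partitioned_hough(points, 16, 16)
```

In this sketch the partitioned result is identical to the unpartitioned one, which is the property that makes a parallel version safe; how the slices map onto block RAMs or GPU shared memory is outside its scope.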

© 2014 International Journal of Networking and Computing