IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Online ISSN : 1745-1337
Print ISSN : 0916-8508

This article has now been updated. Please use the final version.

An FPGA-based YOLOv6 Accelerator for High-Throughput and Energy-Efficient Object Detection
Xingan SHA, Masao YANAGISAWA, Youhua SHI
JOURNAL FREE ACCESS Advance online publication

Article ID: 2024VLP0009

Abstract

Fast, accurate, and energy-efficient object detection is increasingly important for edge applications such as Internet of Things (IoT) devices. Among convolutional neural network (CNN)-based methods, the You-Only-Look-Once (YOLO) series is regarded as one of the most promising for real-time object detection because of its favorable balance between speed and accuracy. However, deploying YOLO on resource- and power-constrained devices such as field-programmable gate arrays (FPGAs) is challenging: the model requires a large number of multiply-and-accumulate (MAC) operations and, correspondingly, heavy off-chip memory access. This paper introduces an FPGA-based accelerator for the YOLOv6 algorithm, implemented on a VC707 FPGA board with a Virtex-7 VX485T chip, that achieves high throughput, accuracy, and energy efficiency. To our knowledge, this is the first FPGA implementation of YOLOv6. Unlike previous works that used early YOLO versions, our design deploys the hardware-friendly YOLOv6 and achieves a mean average precision (mAP) of 84.9% on the PASCAL VOC2007 dataset at a 352×352 resolution, significantly outperforming most existing object detection implementations. Model optimizations for FPGA deployment, such as replacing the SiLU activation with ReLU, lowering the input resolution, and applying quantization-aware training, greatly reduce computational cost with minimal accuracy loss. These optimizations also allow the entire YOLOv6 model to be stored in on-chip memory, eliminating the need for energy-intensive DRAM access. The proposed accelerator design and the convolution lowering technique further contribute to high processing speed and energy efficiency. Experimental results show that the accelerator processes 364.5 frames per second (fps) at 150 MHz on the Virtex-7 VX485T FPGA, with a power efficiency of 19.75 fps/W.
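
The abstract names convolution lowering without describing how the accelerator applies it. As a rough illustration only, the following sketch shows the generic idea, often called im2col: each sliding window of the input is flattened into a row, so that the convolution becomes a series of dot products (that is, a matrix-vector product) that maps naturally onto MAC arrays. Everything here is an assumption made for illustration and not the authors' design: the function names `im2col`, `conv2d_lowered`, and `conv2d_direct` are hypothetical, the input is a single channel with no padding, and plain Python numbers stand in for the quantized values a hardware implementation would use.

```python
def im2col(image, k, stride=1):
    """Flatten every k x k window of a 2-D image into one row (hypothetical helper)."""
    h, w = len(image), len(image[0])
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    rows = []
    for i in range(0, out_h * stride, stride):
        for j in range(0, out_w * stride, stride):
            # One row per output pixel, holding the k*k inputs it depends on.
            rows.append([image[i + di][j + dj] for di in range(k) for dj in range(k)])
    return rows, out_h, out_w


def conv2d_lowered(image, kernel, stride=1):
    """Convolution (cross-correlation, as in CNNs) computed via the lowered matrix."""
    k = len(kernel)
    flat_kernel = [v for row in kernel for v in row]
    rows, out_h, out_w = im2col(image, k, stride)
    # Each output value is a single dot product, i.e. a chain of MAC operations.
    flat_out = [sum(a * b for a, b in zip(patch, flat_kernel)) for patch in rows]
    return [flat_out[r * out_w:(r + 1) * out_w] for r in range(out_h)]


def conv2d_direct(image, kernel, stride=1):
    """Reference nested-loop convolution, used only to check the lowered version."""
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    return [
        [
            sum(image[i * stride + di][j * stride + dj] * kernel[di][dj]
                for di in range(k) for dj in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 0], [0, -1]]
lowered = conv2d_lowered(image, kernel)
```

In this toy case `lowered` matches `conv2d_direct(image, kernel)`. The trade-off of lowering in general is that overlapping windows are duplicated in the lowered matrix, which costs extra storage in exchange for a regular, easily parallelized computation; how the paper's accelerator handles this is not stated in the abstract.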

© 2024 The Institute of Electronics, Information and Communication Engineers