
Research on the Design and Optimization Method of a CNN Accelerator Based on HLS Tools

Application of Electronic Technique, 2021, Issue 3

Cheng Jiafeng, Wang Hongliang

National Key Laboratory for Electronic Measurement Technology, North University of China, Taiyuan 030051, Shanxi, China

Abstract: Following a hardware/software co-design approach, this paper uses HLS tools to design and implement a convolutional neural network (CNN) accelerator on the PYNQ-Z2 platform. The convolution operation is optimized with a matrix-cutting method that balances resource consumption against computing resources, so that the accelerator reaches its best performance. The accelerator IP core is tested on the MNIST data set. The experimental results show that, relative to the ARM platform, the accelerator achieves a speedup of 5.785 on a single image and 9.72 on 1,000 images, and that its advantage grows as the number of test images increases.

Keywords: convolutional neural network (CNN); PYNQ-Z2; HLS tool; accelerator

CLC number: TN108.1
Document code: A
DOI: 10.16157/j.issn.0258-7998.200841
Citation format: Cheng Jiafeng, Wang Hongliang. Research on the design and optimization method of CNN accelerator based on HLS tools[J]. Application of Electronic Technique, 2021, 47(3): 18-21, 26.

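0 Introduction

In recent years, convolutional neural networks (CNNs) have been applied ever more widely and in increasingly complex scenarios. Their compute-intensive and memory-intensive nature has become more prominent and now limits fast, efficient CNN implementation. Accelerator platforms based on GPUs^[1], ASICs^[2], and FPGAs^[3] have therefore been proposed in succession to improve CNN performance. GPUs consume a great deal of power and have a fixed hardware structure, which limits the use of CNNs in embedded devices. ASICs have extremely high development costs and low flexibility, which makes them ill-suited to complex and frequently changing CNNs. FPGAs offer low power consumption, high performance, and good flexibility, and are therefore better suited to research on CNN hardware acceleration. However, Verilog HDL has a high barrier to entry and a relatively long development cycle, which has hindered the adoption of FPGAs in CNN applications^[4-5].

Based on the idea of hardware/software co-design, this paper uses HLS tools to implement a CNN accelerator on the PYNQ-Z2 and optimizes the convolution kernel computation with a matrix-cutting design method.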
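This excerpt does not give the details of the matrix-cutting method, so the following is only a minimal sketch of one common reading of it: cutting the output feature map into fixed-size tiles and computing each tile from a small local buffer, which is the kind of structure that maps to on-chip memory in an HLS design. All names and sizes (`IN`, `K`, `TILE`, `conv_direct`, `conv_tiled`) are hypothetical and chosen for illustration; they are not taken from the paper.

```cpp
#include <array>
#include <cstddef>

// Hypothetical sizes for illustration (28x28 matches an MNIST image).
constexpr std::size_t IN   = 28;           // input height and width
constexpr std::size_t K    = 5;            // kernel height and width (assumed)
constexpr std::size_t OUT  = IN - K + 1;   // "valid" output size: 24
constexpr std::size_t TILE = 8;            // output tile edge (assumed)
static_assert(OUT % TILE == 0, "this sketch assumes tiles divide the output evenly");

using Image  = std::array<std::array<float, IN>, IN>;
using Kernel = std::array<std::array<float, K>, K>;
using Output = std::array<std::array<float, OUT>, OUT>;

// Reference version: one pass over the whole output, no tiling.
inline void conv_direct(const Image& in, const Kernel& w, Output& out) {
    for (std::size_t r = 0; r < OUT; ++r)
        for (std::size_t c = 0; c < OUT; ++c) {
            float acc = 0.0f;
            for (std::size_t kr = 0; kr < K; ++kr)
                for (std::size_t kc = 0; kc < K; ++kc)
                    acc += in[r + kr][c + kc] * w[kr][kc];
            out[r][c] = acc;
        }
}

// Tiled version: the output is cut into TILE x TILE blocks. Each block first
// copies the input patch it needs into a small local buffer, then computes
// only from that buffer.
inline void conv_tiled(const Image& in, const Kernel& w, Output& out) {
    constexpr std::size_t PATCH = TILE + K - 1;  // input rows/cols one tile needs
    for (std::size_t tr = 0; tr < OUT; tr += TILE)
        for (std::size_t tc = 0; tc < OUT; tc += TILE) {
            float buf[PATCH][PATCH];
            for (std::size_t i = 0; i < PATCH; ++i)
                for (std::size_t j = 0; j < PATCH; ++j)
                    buf[i][j] = in[tr + i][tc + j];

            for (std::size_t r = 0; r < TILE; ++r)
                for (std::size_t c = 0; c < TILE; ++c) {
                    float acc = 0.0f;
                    for (std::size_t kr = 0; kr < K; ++kr)
                        for (std::size_t kc = 0; kc < K; ++kc)
                            acc += buf[r + kr][c + kc] * w[kr][kc];
                    out[tr + r][tc + c] = acc;
                }
        }
}
```

Both functions accumulate in the same order, so for the same input and kernel they should produce identical outputs. The tile size is the tuning knob in this sketch: a larger `TILE` needs a larger local buffer but fewer tile iterations, which is one way the trade-off between resource use and throughput could show up. HLS directives are deliberately left out so that the code compiles with any standard C++ compiler.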

The full text of this paper can be downloaded from: http://www.ihrv.cn/resource/share/2000003402

