OpenCV acceleration battle: OpenCL on Firefly-RK3288 (Mali-T764) vs. FPGA on ZedBoard (Zynq-7020)
©SIProp Project, 2006-2008 1
OpenCV acceleration battle:
OpenCL on Firefly-RK3288 (Mali-T764) vs.
FPGA on ZedBoard (Zynq-7020)
Noritsuna Imamura
Agenda
OpenCV for OpenCL
OpenCV for FPGA
!!!!!!ATTENTION!!!!!!
This deck is NOT about
OpenCL for GPU vs. OpenCL for FPGA
OpenCL for FPGA
Today’s Agenda
OpenCV for OpenCL
OpenCV for FPGA
OpenCL SDK for FPGA
Altera SDK for OpenCL
http://www.altera.com/products/software/opencl/opencl-index.html
Xilinx SDAccel
http://www.xilinx.com/products/design-tools/sdx/sdaccel.html
Advantage of FPGA
Peripherals can be connected directly to the FPGA.
A GPGPU must go through the CPU/memory bus.
Ex. Network peripherals
About OpenCL for FPGA
Programming language for FPGAs
OpenCL compiler that generates RTL (Register Transfer Level)
OpenCL runtime library
Why use it?
For “software engineers”
Makes FPGAs easy to program
Advantage of OpenCL for FPGA
High-Level Synthesis
C/C++/Java/Python instead of HDL
FPGA features
No memory: the FPGA does have SRAM/SDRAM, but it is small and has a large access overhead.
Parallel processing: the number of inputs is the degree of parallelism.
Streaming Processing (to Cope with No Memory)
Ex. Pixel-by-pixel effects
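A pixel-by-pixel effect needs no frame buffer: each output pixel depends only on the input pixel arriving right now, so the data can simply flow through. The minimal sketch below models this in plain C++, with `std::queue` standing in for a hardware FIFO (in Vivado HLS that role is played by `hls::stream`). The function name `invert_stream` and the choice of an invert effect are illustrative assumptions, not code from the deck.

```cpp
#include <cstdint>
#include <queue>

// Hypothetical streaming effect: std::queue stands in for a hardware FIFO.
// Each pixel is read once, transformed, and written once; only one pixel is
// held at any time, so no frame buffer is needed.
void invert_stream(std::queue<uint8_t>& in, std::queue<uint8_t>& out, int num_pixels) {
    for (int i = 0; i < num_pixels && !in.empty(); ++i) {
        uint8_t p = in.front();                   // read the next pixel
        in.pop();
        out.push(static_cast<uint8_t>(255 - p));  // per-pixel effect: invert
    }
}
```

Because the loop keeps no state between pixels, the same structure maps onto hardware that accepts one new pixel per clock.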
Pipeline Processing (for Parallelism)
Ex. The OpenGL (graphics pipeline) architecture
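Pipelining is where the parallelism comes from: with S stages that each take one clock, a new item can enter every clock, so N items finish after N + S - 1 clocks instead of N × S. The sketch below simulates a hypothetical 3-stage pipeline cycle by cycle; the stage operations (+1, ×2, -1) and all names are placeholders chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// One pipeline register: a value plus a flag saying whether it holds data yet.
struct Slot {
    bool valid;
    int value;
};

struct PipelineResult {
    std::vector<int> outputs;  // results in the order they leave the pipeline
    int cycles;                // clock cycles needed to drain all inputs
};

// Cycle-level model of a 3-stage pipeline. The stage operations (+1, *2, -1)
// are arbitrary placeholders. On every clock each stage works on a different
// item, so once the pipeline is full one result leaves per clock.
PipelineResult run_pipeline(const std::vector<int>& inputs) {
    Slot s1 = {false, 0}, s2 = {false, 0}, s3 = {false, 0};
    PipelineResult r = {{}, 0};
    std::size_t next = 0;
    while (r.outputs.size() < inputs.size()) {
        ++r.cycles;
        // Update the last stage first, so each stage consumes what its
        // predecessor produced on the previous clock.
        s3 = s2.valid ? Slot{true, s2.value - 1} : Slot{false, 0};  // stage 3
        s2 = s1.valid ? Slot{true, s1.value * 2} : Slot{false, 0};  // stage 2
        s1 = next < inputs.size() ? Slot{true, inputs[next++] + 1}   // stage 1
                                  : Slot{false, 0};
        if (s3.valid) r.outputs.push_back(s3.value);
    }
    return r;
}
```

For 10 inputs this model needs 12 cycles (10 + 3 - 1), whereas running each item through all three stages before starting the next would take 30.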
OpenCV for OpenCL
About OpenCL for OpenCV 1/2
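OpenCL functions for OpenCV
cv::ocl::xxx: OpenCL versions of functions, wrapped as OpenCV functions
```cpp
// OpenCV 2.4.x ocl module (removed in OpenCV 3.x, which uses cv::UMat instead)
#include <opencv2/ocl/ocl.hpp>
#include <opencv2/imgproc/imgproc.hpp>   // cv::COLOR_BGR2GRAY
#include <opencv2/highgui/highgui.hpp>   // cv::imread

int main(int argc, char** argv) {
    cv::Mat matIn = cv::imread("hoge.png"), matDisp;
    if (matIn.empty()) return 1;                           // image could not be loaded
    cv::ocl::oclMat oclIn(matIn), oclOut;                  // upload to the OpenCL device
    cv::ocl::cvtColor(oclIn, oclOut, cv::COLOR_BGR2GRAY);  // runs as an OpenCL kernel
    oclOut.download(matDisp);                              // copy the result back to host memory
    return 0;
}
```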
About OpenCL for OpenCV 2/2
Customized OpenCL for OpenCV
Header: opencv2/ocl/ocl.hpp
modules/core/src/opencl/runtime/generator/
Source code:
modules/core/src/opencl/*.cl
OpenCL feature:
NOT binary compatible (kernels are compiled for each device at run time)
Problem on Android
The *.cl files must be placed in the same location as the app.
But an Android app is an APK file, not a single binary. -> You MUST build OpenCV.so with your .cl files.
“OpenCV with OpenCL for Android NDK”
How to build OpenCV for the Android system: https://github.com/noritsuna/OpenCVwithOpenCL4AndroidNDK
OpenCV for FPGA
Zynq-7000 Series
Dual ARM Cortex-A9 + FPGA
Zynq + FPGA (IP Core)
Zynq + PWM IP Core
Zynq Development Board
ZedBoard
USD 495, Zynq-7020
ZYBO
USD 189, Zynq-7010
Checkpoint
Does the board come with a Vivado (IDE) license? http://www.digilentinc.com/
Advantage of ARM + FPGA
All of OpenCV's functions on the FPGA
-> Would require a huge FPGA
OpenCV functions split into:
Functions that fit the CPU well
Functions that fit the FPGA well
OpenCV Sample App for Zynq
“Accelerating OpenCV Applications with Zynq-7000 All Programmable SoC using Vivado HLS Video Libraries”
http://www.xilinx.com/support/documentation/application_notes/xapp1167.pdf
Development Tools
Vivado HLS (High-Level Synthesis): development environment for HLS
Vivado: IP designer for Xilinx FPGAs
ISE: obsolete (superseded by Vivado)
OpenCV is included in Vivado HLS
Not Found…
My guess: “hls_opencv.h” is OpenCV for HLS.
But it can't be used…
Not Synthesizable…
“hls_video.h” provides the synthesizable OpenCV-style functions for HLS
OpenCV Sample App for Zynq
“Accelerating OpenCV Applications with Zynq-7000 All Programmable SoC using Vivado HLS Video Libraries”
http://www.xilinx.com/support/documentation/application_notes/xapp1167.pdf
Details
Dilate filter on FAST feature points for Full HD video (1920x1080) streaming
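```cpp
// Vivado HLS top-level function from the XAPP1167 FAST example.
// The includes, size macros and AXI_STREAM typedef are assumed to mirror
// the app note's header; they are not shown on the slide.
#include "hls_video.h"      // hls::Mat, hls::FASTX, hls::Dilate, ...
#include "ap_axi_sdata.h"   // ap_axiu<>

#define MAX_WIDTH  1920
#define MAX_HEIGHT 1080
typedef hls::stream<ap_axiu<32,1,1,1> > AXI_STREAM;

void image_filter(AXI_STREAM& input, AXI_STREAM& output, int rows, int cols)
{
    // Create AXI streaming interfaces for the core
    // (older Vivado HLS syntax; newer versions use #pragma HLS INTERFACE axis)
#pragma HLS RESOURCE variable=input core=AXIS metadata="-bus_bundle INPUT_STREAM"
#pragma HLS RESOURCE variable=output core=AXIS metadata="-bus_bundle OUTPUT_STREAM"
    // rows/cols do not change while the core is running
#pragma HLS interface ap_stable port=rows
#pragma HLS interface ap_stable port=cols
    hls::Mat<MAX_HEIGHT,MAX_WIDTH,HLS_8UC3> _src(rows,cols);
    hls::Mat<MAX_HEIGHT,MAX_WIDTH,HLS_8UC3> _dst(rows,cols);
    // Run all of the stages below concurrently as a streaming pipeline
#pragma HLS dataflow
    hls::AXIvideo2Mat(input, _src);            // AXI4-Stream video -> hls::Mat
    hls::Mat<MAX_HEIGHT,MAX_WIDTH,HLS_8UC3> src0(rows,cols);
    hls::Mat<MAX_HEIGHT,MAX_WIDTH,HLS_8UC3> src1(rows,cols);
    // src1 must buffer pixels while the FAST/Dilate branch (with its line
    // buffers) catches up, hence the deep FIFO
#pragma HLS stream depth=20000 variable=src1.data_stream
    hls::Mat<MAX_HEIGHT,MAX_WIDTH,HLS_8UC1> mask(rows,cols);
    hls::Mat<MAX_HEIGHT,MAX_WIDTH,HLS_8UC1> dmask(rows,cols);
    hls::Scalar<3,unsigned char> color(255,0,0);  // paint color for feature points
    hls::Duplicate(_src,src0,src1);            // split the stream into two branches
    hls::Mat<MAX_HEIGHT,MAX_WIDTH,HLS_8UC1> gray(rows,cols);
    hls::CvtColor<HLS_BGR2GRAY>(src0,gray);    // color -> grayscale
    hls::FASTX(gray,mask,20,true);             // FAST corners: threshold 20, non-max suppression
    hls::Dilate(mask,dmask);                   // enlarge each detected point
    hls::PaintMask(src1,dmask,_dst,color);     // paint the mask over the original frame
    hls::Mat2AXIvideo(_dst, output);           // hls::Mat -> AXI4-Stream video
}
```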
FAST() function
FAST (Features from Accelerated Segment Test) algorithm
One of the feature detection algorithms
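The core of FAST is the segment test: look at the 16 pixels on a radius-3 circle around a candidate pixel and call it a corner if enough contiguous circle pixels are all clearly brighter, or all clearly darker, than the center. The sketch below is a plain C++ illustration, not the `hls::FASTX` implementation; it assumes the common FAST-9 variant (9 contiguous pixels, OpenCV's default) and omits the non-maximum suppression that the `true` argument requests in the HLS code. The threshold `t` plays the role of the `20` passed to `hls::FASTX`. The names `Gray` and `is_fast_corner` are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Grayscale image stored row-major.
struct Gray {
    int width, height;
    std::vector<uint8_t> px;
    uint8_t at(int x, int y) const { return px[y * width + x]; }
};

// The 16 pixels of a radius-3 Bresenham circle around the candidate, in order.
static const int kCircle[16][2] = {
    {0,-3},{1,-3},{2,-2},{3,-1},{3,0},{3,1},{2,2},{1,3},
    {0,3},{-1,3},{-2,2},{-3,1},{-3,0},{-3,-1},{-2,-2},{-1,-3}};

// Segment test: (x, y) is a corner if at least `n` contiguous circle pixels are
// all brighter than center + t, or all darker than center - t.
// The caller must keep (x, y) at least 3 pixels away from the image border.
bool is_fast_corner(const Gray& img, int x, int y, int t, int n = 9) {
    const int c = img.at(x, y);
    for (int sign = -1; sign <= 1; sign += 2) {   // -1: darker, +1: brighter
        int run = 0;
        // Walk the circle twice so runs that wrap past index 15 are counted.
        for (int i = 0; i < 32; ++i) {
            const int p = img.at(x + kCircle[i % 16][0], y + kCircle[i % 16][1]);
            const bool pass = sign > 0 ? p > c + t : p < c - t;
            run = pass ? run + 1 : 0;
            if (run >= n) return true;
        }
    }
    return false;
}
```

A flat region fails both tests, and so does a straight edge (only 7 of the 16 circle pixels lie on the dark side), while a bright isolated point or a true corner passes.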
Dilate() function
Dilates pixels: each output pixel becomes the maximum of its neighborhood
One of the filter functions
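In the FAST example, Dilate grows each single-pixel mark in the FAST mask into a small blob, so the painted feature points are visible. The minimal sketch below assumes a 3x3 rectangular neighborhood (the default kernel of OpenCV's `cv::dilate`) and ignores out-of-image neighbors; both the border handling and the name `dilate3x3` are illustrative choices, not taken from `hls::Dilate`.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// 3x3 grayscale dilation on a row-major image: each output pixel is the
// maximum of its 3x3 neighborhood. Neighbors outside the image are ignored.
std::vector<uint8_t> dilate3x3(const std::vector<uint8_t>& src, int width, int height) {
    std::vector<uint8_t> dst(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            uint8_t m = 0;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int nx = x + dx, ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                    m = std::max(m, src[ny * width + nx]);
                }
            }
            dst[y * width + x] = m;
        }
    }
    return dst;
}
```

This version reads the whole frame at random. A streaming FPGA version typically keeps only the last two image lines in line buffers and slides a 3x3 window over them, which is how a neighborhood filter fits the "no memory" constraint.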
How to Implement ARM + FPGA?
All of OpenCV's functions on the FPGA
-> Would require a huge FPGA
OpenCV functions split into:
Functions that fit the CPU well
Functions that fit the FPGA well
Vivado HLS generates “IP core + header” = “driver”
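What such a generated driver boils down to on the ARM side is writing the core's arguments into its AXI4-Lite registers, setting `ap_start`, and polling `ap_done`. The sketch below assumes the core also exposes an AXI4-Lite control bus for `rows`/`cols` (the excerpt above does not show that binding). The control-register bits follow Vivado HLS's `ap_ctrl_hs` block protocol, while `ROWS_OFFSET`/`COLS_OFFSET` and the function names are hypothetical placeholders; the real offsets come from the generated header. On the board, `regs` would point at the core's address window (for example via `mmap` on `/dev/mem` or a UIO device); a plain array stands in here so the sketch runs anywhere.

```cpp
#include <cstdint>

// Hypothetical register map for the image_filter core. Control-register bits
// follow Vivado HLS's ap_ctrl_hs protocol; the argument offsets are
// placeholders, so take the real ones from the generated driver header.
constexpr uint32_t CTRL_OFFSET = 0x00;  // bit 0: ap_start, bit 1: ap_done, bit 2: ap_idle
constexpr uint32_t ROWS_OFFSET = 0x10;  // hypothetical
constexpr uint32_t COLS_OFFSET = 0x18;  // hypothetical
constexpr uint32_t AP_START = 1u << 0;
constexpr uint32_t AP_DONE  = 1u << 1;

// volatile keeps the compiler from caching or reordering register accesses.
inline void reg_write(volatile uint32_t* regs, uint32_t offset, uint32_t value) {
    regs[offset / 4] = value;
}
inline uint32_t reg_read(volatile uint32_t* regs, uint32_t offset) {
    return regs[offset / 4];
}

// Pass the frame size, then start one run of the core.
void filter_start(volatile uint32_t* regs, uint32_t rows, uint32_t cols) {
    reg_write(regs, ROWS_OFFSET, rows);
    reg_write(regs, COLS_OFFSET, cols);
    reg_write(regs, CTRL_OFFSET, AP_START);
}

// True once the core reports that the run has finished.
bool filter_is_done(volatile uint32_t* regs) {
    return (reg_read(regs, CTRL_OFFSET) & AP_DONE) != 0;
}
```

Only the control values pass through these registers; the pixel data itself travels over the core's AXI4-Stream ports.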
Zynq + FPGA (IP Core)
Zynq + PWM IP core on Vivado (≠ Vivado HLS)
How to Use the “IP Core & Headers”
Build the binaries:
Boot loaders: FSBL (First Stage Boot Loader, the counterpart of x-loader) and u-boot
Linux kernel for your system
Ubuntu/Android as the OS (optional)
How to build Android 5.0 for ZedBoard:
http://www.slideshare.net/noritsuna/zedroid-android-50-and-later-on-zedboard
Not all standard C/C++ libraries can be used in HLS code
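Synthesizable code has to describe fixed hardware, so anything that needs an operating system or run-time memory management (for example `new`/`malloc` and the containers built on them such as `std::vector`, recursion, or file I/O) is off the table. The usual rewrite replaces containers with fixed-size arrays whose sizes are known at compile time. The sketch below shows that style for a 256-bin histogram; it is compiled here as ordinary C++, HLS pragmas are omitted, and `histogram` and `MAX_PIXELS` are hypothetical names.

```cpp
#include <cstdint>

// Compile-time upper bound: every buffer must have a size known at synthesis
// time, because it becomes fixed hardware.
constexpr int MAX_PIXELS = 1920 * 1080;

// HLS-friendly style: plain fixed-size arrays instead of std::vector,
// std::map or new/malloc, and no file I/O or recursion.
void histogram(const uint8_t* pixels, int num_pixels, uint32_t hist[256]) {
    for (int i = 0; i < 256; ++i) {
        hist[i] = 0;                        // clear the 256 bins
    }
    for (int i = 0; i < num_pixels && i < MAX_PIXELS; ++i) {
        ++hist[pixels[i]];                  // one bin per 8-bit gray level
    }
}
```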
OpenCV acceleration battle:
OpenCL on Firefly-RK3288 (Mali-T764) vs.
FPGA on ZedBoard (Zynq-7020)
Which way is faster?
A. OpenCV for FPGA >>>> OpenCV for OpenCL
According to Xilinx, “over 1000 times faster”.
Clock speed: FPGA = 150 MHz, Mali-T764 = 650 MHz (the FPGA wins despite the lower clock)
Peripherals can be connected directly to the FPGA.
A GPGPU must go through the CPU/memory bus.
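A rough worked estimate of why the lower FPGA clock need not matter, assuming the image_filter pipeline accepts one pixel per clock cycle once it is full and ignoring video blanking intervals (both are assumptions, not measurements from this deck):

$$
1920 \times 1080 = 2\,073\,600\ \text{pixels/frame}
$$

$$
\frac{150 \times 10^{6}\ \text{pixels/s}}{2\,073\,600\ \text{pixels/frame}} \approx 72.3\ \text{frames/s}
$$

Because all stages (CvtColor, FASTX, Dilate, PaintMask) run concurrently under `#pragma HLS dataflow`, adding stages in this model adds latency but not per-pixel cost.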