Data Science Popup Austin: Making Data Science Fast: Survey of GPU Accelerated Tools

DATA SCIENCE POP UP AUSTIN

Making Data Science FAST: Survey of GPU Accelerated Tools

Mazhar Memon, Co-Founder and CEO, Bitfusion.io


#datapopupaustin

April 13, 2016, Galvanize, Austin Campus


MAKING DATA SCIENCE FAST: SURVEY OF GPU ACCELERATED TOOLS


MAZHAR MEMON

CTO, BITFUSION.IO


Overview

• Overview of GPUs

• Drop-in Libraries

• Programming Frameworks

• Deep Learning

• Graph Databases

• Visualization


The problem in computing

[Figure: diagram of hardware and software over time (Time →), with the labels "abstract and slow →" and "← complex and fast".]

The big gap: making your data science fast!


Integrated GPUs

• Architecture: SIMD, shared resource architecture

• Targeted workloads: Medium-sized offloads, latency-sensitive, cost-sensitive, media

• Programming models: OpenCL, DirectCompute, C++ AMP, SPIR, HSAIL

• Ecosystem maturity: High

• Links: https://software.intel.com/en-us/articles/intel-graphics-developers-guides


Discrete GPUs

• Architecture: SIMD, discrete coprocessor configuration

• Targeted workloads: Large-sized offloads, throughput-sensitive, parallel structured

• Programming models: CUDA, OpenCL, DirectCompute, C++ AMP, SYCL, SPIR, HSA

• Ecosystem maturity: High

• Links: http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux
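
One way to read the "medium-sized" versus "large-sized offloads" distinction is through a simple cost model: a discrete GPU pays a fixed launch latency plus a copy over the bus before its higher throughput helps, while an integrated GPU shares memory with the CPU. The sketch below is a rough illustration only. Every constant in it (latency, bus bandwidth, throughput) and every function name is a hypothetical assumption, not a measurement from the talk.

```python
# Rough offload cost model. Every constant below is a made-up, illustrative value.

def cpu_time(n_bytes, cpu_bytes_per_s):
    """Seconds to process n_bytes on the CPU alone."""
    return n_bytes / cpu_bytes_per_s


def offload_time(n_bytes, latency_s, bus_bytes_per_s, gpu_bytes_per_s):
    """Fixed launch latency + copy over the bus + processing on the device."""
    return latency_s + n_bytes / bus_bytes_per_s + n_bytes / gpu_bytes_per_s


def break_even_bytes(cpu_bytes_per_s, latency_s, bus_bytes_per_s, gpu_bytes_per_s):
    """Payload size where offloading matches the CPU, or None if it never does."""
    per_byte_gain = 1 / cpu_bytes_per_s - 1 / bus_bytes_per_s - 1 / gpu_bytes_per_s
    if per_byte_gain <= 0:
        return None
    return latency_s / per_byte_gain


CPU = 1e9  # assumed CPU throughput, bytes per second
# Assumed discrete GPU: higher latency, bus copy, high throughput.
DISCRETE = dict(latency_s=1e-4, bus_bytes_per_s=10e9, gpu_bytes_per_s=20e9)
# Assumed integrated GPU: shared memory (no bus copy), lower throughput.
INTEGRATED = dict(latency_s=1e-5, bus_bytes_per_s=float("inf"), gpu_bytes_per_s=4e9)

for name, dev in (("integrated", INTEGRATED), ("discrete", DISCRETE)):
    print(name, "break-even bytes:", round(break_even_bytes(CPU, **dev)))
    for n in (1e3, 1e6, 1e9):
        print("  ", int(n), "bytes: cpu", cpu_time(n, CPU), "s, offload", offload_time(n, **dev), "s")
```

Under these assumed numbers, small payloads are dominated by the fixed latency, which is why the size of the offload matters when choosing between the two kinds of device.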


Drop-in Libraries

• Designed to be easy to use, no programming required

• Others: AmgX, cuDNN, cuFFT, IndeX, nvGRAPH, GIE, NPP, FFMPEG, …

• https://developer.nvidia.com/gpu-accelerated-libraries


Programming for GPUs

C/C++: CUDA, OpenCL, DirectCompute, C++ AMP, SYCL

Python: PyCUDA

MATLAB
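
The frameworks listed above broadly share a data-parallel model: a small kernel function is written for one element, and the runtime launches it across many indices at once. The following is a plain-Python emulation of that idea, not GPU code. The names `saxpy_kernel` and `launch` are hypothetical, and the loop runs sequentially where a GPU would run the indices in parallel.

```python
def saxpy_kernel(i, a, x, y, out):
    """Work for a single index i: out[i] = a * x[i] + y[i]."""
    out[i] = a * x[i] + y[i]


def launch(kernel, n, *args):
    """Stand-in for a GPU launch: call the kernel once per index.

    A real device would execute these calls concurrently; here they run in a loop.
    """
    for i in range(n):
        kernel(i, *args)


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)
launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)
```

Because each index writes only its own output slot, the calls are independent of one another, which is the property that lets this style of code map onto SIMD hardware.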


Machine Learning

Caffe: http://caffe.berkeleyvision.org

Torch7: http://torch.ch

Cxxnet: https://github.com/dmlc/cxxnet

MXNet: https://github.com/dmlc/mxnet

MATLAB: http://www.mathworks.com/products/matlab/

TensorFlow: https://www.tensorflow.org

Mocha: https://github.com/pluskid/Mocha.jl


Graph Databases: Pros and Cons

Column store

+ Fast statistics

+ Best compression

+ Easy to add a new column

- Not good for fast inserts or streaming

- ETL required on import

Graph/NoSQL

+ No schema lock-in

+ Relationship queries fast

+ Rapid development

- Bad performance at scale historically, or limited query support

- No standard query language (with the exception of SPARQL)

Row store

+ Rapid transactions

+ Very robust, mature

- Not easy to add or remove columns after creation

- Every database has its own interpretation
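
The row-store and column-store trade-offs above follow largely from memory layout. As a minimal sketch in plain Python (the table contents and all names are invented for illustration, and real engines are far more elaborate), the same records can be held row-wise or column-wise:

```python
# Row store: one record per entry; appending a record touches a single place.
rows = [
    {"id": 1, "city": "Austin", "sales": 120.0},
    {"id": 2, "city": "Dallas", "sales": 80.0},
    {"id": 3, "city": "Austin", "sales": 100.0},
]
rows.append({"id": 4, "city": "Houston", "sales": 60.0})  # fast insert

# Column store: one list per column; an insert would have to touch every list.
columns = {name: [r[name] for r in rows] for name in rows[0]}

# Statistics scan a single column instead of visiting every full record.
mean_sales = sum(columns["sales"]) / len(columns["sales"])

# Adding a column is one new list; the existing columns are untouched.
columns["region"] = ["TX"] * len(columns["id"])


def run_length_encode(values):
    """Collapse runs of equal neighbours into [value, count] pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return encoded


# Values within one column tend to repeat, so they compress well once sorted.
print(mean_sales)
print(run_length_encode(sorted(columns["city"])))
```

Run-length encoding is used here only as one simple example of column compression; the slide does not say which scheme any particular database uses.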


GPU experience so far

Success in four different areas

Sqream: 100x faster on SQL queries

BlazeGraph: 1000x faster on graph queries

IBM DB2 BLU: 2-100x faster on business analytics

GPUDB: Real-time queries on streaming data, natural English queries


BlazingDB

BlazingDB is a high performance SQL database on video graphics cards (GPUs). We leverage processors from the video game industry to power Big Data Analytics. Faster, able to handle greater scale, all in simple SQL.


Visualization


Graphistry: Demo


Questions?


Backup


@datapopup #datapopupaustin