map-D: data refined
www.map-d.com | @datarefined | Todd Mostak | [email protected]
1830 Sansome St. San Francisco, CA 94104
map-D? A super-fast database built into GPU memory
What does it do? The world’s fastest real-time big data analytics and interactive visualization
Demo? A Twitter analytics platform: 1 billion+ tweets, queried in milliseconds
The importance of interactivity
People have struggled for a long time to build interactive visualizations of big data that can deliver insight
Interactivity means:
• Hypothesis testing can occur at “speed of thought”
How Interactive is interactive enough?
• According to a study by Jeffrey Heer and Zhicheng Liu, “an injected delay of half a second per operation adversely affects user performance in exploratory data analysis.”
• Some types of latency are more detrimental than others:
• For example, linking and brushing are more sensitive to latency than zooming
Strategies for interactivity
• Sampling:
• Ex. BlinkDB
• Issues:
• Need statistically robust method for sampling
• Sampling can miss “long-tail” phenomena
• Pre-computation
• Ex. imMens (data cubing)
• Issues:
• Only can show what curator thought was relevant
• Can only store a certain number of binned attributes
• Must be curated!
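The sampling caveat above is easy to demonstrate: a uniform sample estimates head frequencies well but misses most of the long tail. A minimal Python sketch on invented synthetic data (not Map-D or BlinkDB code):

```python
import random

random.seed(42)

# Synthetic skewed dataset: one "head" term dominates, while many
# "long-tail" terms each occur exactly once.
head = ["rain"] * 9000
tail = [f"rare_{i}" for i in range(1000)]
data = head + tail

# A 1% uniform sample, as an approximate-query engine might draw.
sample = random.sample(data, k=len(data) // 100)

# The head frequency is estimated reasonably from the sample...
est_head = sample.count("rain") * 100
print(f"true 'rain' count: 9000, estimated: {est_head}")

# ...but almost all long-tail terms are absent from the sample.
tail_seen = len({x for x in sample if x.startswith("rare_")})
print(f"distinct tail terms: 1000, seen in sample: {tail_seen}")
```

Any individual rare term appears in a 1% sample with probability about 1%, so a query touching the long tail gets essentially no support from the sample.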
• At the same time, Map-D also rendered HD data visualizations and sent them to Tweetmap’s interactive analytics GUI
Live demo: www.mapd.csail.mit.edu
SC13 video and write-up:
The Arrival of In-Memory Systems
• Traditional RDBMS used to be too slow to serve as a back-end for interactive visualizations.
• Queries of over a billion records could take minutes if not hours
• But in-memory systems can execute such queries in a fraction of the time.
• Both full DBMS and “pseudo”-DBMS solutions
• But still often too slow
Enter Map-D
the technology
Core Innovation
SQL-enabled column store database built into the memory architecture on GPUs and CPUs
Code developed from scratch to take advantage of:
• Memory and computational bandwidth of multiple GPUs
• Heterogeneous architectures (CPUs and GPUs)
• Fast RDMA between GPUs on different nodes
• GPU graphics pipeline
• Two-level buffer pool across GPU and CPU memory
• Shared scans – multiple queries of the same data can share memory bandwidth
System can scan data at > 2TB/sec per node, with > 10TB/sec per node logical throughput with shared scans
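The shared-scan idea can be sketched in a few lines: a single pass over a column evaluates the predicates of several concurrent queries at once, so memory bandwidth is paid once rather than once per query. Illustrative Python only; the query names, predicates, and data are invented:

```python
# Illustrative shared scan: one pass over a column answers several
# concurrent queries, so the column is read from memory only once.
column = [3, 17, 42, 8, 99, 42, 5, 61]

# Each concurrent query contributes one predicate (invented examples).
queries = {
    "q1": lambda v: v > 40,       # e.g. SELECT COUNT(*) WHERE x > 40
    "q2": lambda v: v % 2 == 0,   # e.g. ... WHERE x is even
    "q3": lambda v: v == 42,      # e.g. ... WHERE x = 42
}

counts = {name: 0 for name in queries}
for value in column:              # single scan of the data
    for name, pred in queries.items():
        if pred(value):
            counts[name] += 1

print(counts)  # prints {'q1': 4, 'q2': 3, 'q3': 2}
```

With N concurrent queries, the scan cost is paid once instead of N times; only the (cheap) predicate evaluations multiply.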
The Hardware
[Diagram: two identical nodes connected through an InfiniBand (IB) switch. Each node: CPU 0 and CPU 1 linked by QPI, GPUs 0–3 attached over PCI, a RAID controller with SSDs S1–S4, and IB adapters.]
The Two-Level Buffer Pool
GPU Memory ↔ CPU Memory ↔ SSD
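The hierarchy above can be sketched as a toy two-level buffer pool: hot pages live in GPU memory, evicted pages demote to CPU memory, and misses fall back to SSD. The capacities and LRU policy below are illustrative assumptions, not Map-D's actual implementation:

```python
from collections import OrderedDict

class TwoLevelBufferPool:
    """Toy buffer pool: hot pages in 'GPU' memory, warm pages in
    'CPU' memory, everything else read from 'SSD' (backing store).
    Capacities and LRU eviction are illustrative assumptions."""

    def __init__(self, gpu_capacity, cpu_capacity, ssd):
        self.gpu = OrderedDict()          # hottest tier (LRU order)
        self.cpu = OrderedDict()          # warm tier (LRU order)
        self.gpu_capacity = gpu_capacity
        self.cpu_capacity = cpu_capacity
        self.ssd = ssd                    # dict: page id -> data

    def get(self, page):
        if page in self.gpu:              # GPU hit: fastest path
            self.gpu.move_to_end(page)
            return self.gpu[page]
        if page in self.cpu:              # CPU hit: promote to GPU
            data = self.cpu.pop(page)
        else:                             # miss: read from SSD
            data = self.ssd[page]
        self._put_gpu(page, data)
        return data

    def _put_gpu(self, page, data):
        if len(self.gpu) >= self.gpu_capacity:
            victim, vdata = self.gpu.popitem(last=False)  # evict LRU
            self._put_cpu(victim, vdata)                  # demote to CPU
        self.gpu[page] = data

    def _put_cpu(self, page, data):
        if len(self.cpu) >= self.cpu_capacity:
            self.cpu.popitem(last=False)  # drop; SSD copy still exists
        self.cpu[page] = data

pool = TwoLevelBufferPool(gpu_capacity=2, cpu_capacity=4,
                          ssd={f"col{i}": f"data{i}" for i in range(8)})
for p in ["col0", "col1", "col2", "col0"]:
    pool.get(p)
print(list(pool.gpu))  # prints ['col2', 'col0']
```

Recently touched columns stay GPU-resident, so hot queries run at GPU memory bandwidth while cold data waits in the cheaper, larger tiers.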
Shared-Nothing Processing
• Multiple GPUs, with data partitioned between them
• Each node (Node 1, Node 2, Node 3) runs the same operator on its own partition, e.g. Filter text ILIKE ‘rain’
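The shared-nothing filter above can be sketched in plain Python: rows are partitioned across "nodes", each node filters only its own partition (sequentially here; in Map-D each partition lives on a different GPU/node and runs in parallel), and per-node results are merged. The data and node count are invented:

```python
# Illustrative shared-nothing filter over partitioned tweet text.
tweets = [
    "heavy rain in SF", "sunny day", "rain expected tonight",
    "no clouds", "rainbow after the rain", "windy",
]

def partition(rows, n_nodes):
    """Round-robin partitioning across nodes (one common scheme)."""
    parts = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        parts[i % n_nodes].append(row)
    return parts

def node_filter(rows, needle):
    """Each node runs the same operator, e.g. text ILIKE '%rain%'."""
    return [r for r in rows if needle in r.lower()]

parts = partition(tweets, n_nodes=3)
results = [node_filter(p, "rain") for p in parts]   # one scan per node
merged = [row for part in results for row in part]  # cheap merge step
print(merged)
```

Because no state is shared between partitions, the filter needs no coordination until the final merge, which is what lets the scan scale across GPUs and nodes.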
the product
GPU in-memory SQL database
• Scale to cluster of GPU nodes
• SQL compiler
• Shared scans
• User-defined functions
• Hybrid GPU/CPU execution
• OpenCL and CUDA
Complex Analytics
• Machine learning
• Graph analytics
Visualization
• Image processing (OpenGL)
• H.264/VP8 streaming (GPU pipeline)
License
• Simple: by # of GPUs
• Mobile/server versions
Product: a GPU-powered end-to-end big data analytics and visualization platform
Map-D hardware architecture
• NVIDIA Tegra mobile chip, 4GB memory – Map-D code integrated into chip memory; mobile Map-D running small datasets
• Single GPU, 12GB memory – Map-D code integrated into GPU memory
• Single CPU, 768GB memory – Map-D code integrated into CPU memory
• 8 cards = 4U box; 4 sockets = 4U box – Map-D code runs on GPU + CPU memory
• 36U rack: ~400GB GPU memory, ~12TB CPU memory
• Next-gen flash: 40TB at 100GB/s
Spans small data (mobile) through large data to big data; delivered as a native app or a web-based service