B17 Eliminating the database bottleneck

Eliminating the Database Bottleneck What makes Vectorwise so fast

Mark Van de Wiel

Thursday, November 01, 2012

Director Product Management, Vectorwise

Agenda

Why traditional RDBMSs are slow for analytics

Why Vectorwise is fast

The I/O challenge

Efficient updates

100x (+) Performance Difference – 2003 Custom C versus Relational Database

26.2 28.1

0.2 0.6 0

MySQL DBMS 'X' C program Vectorwise

TPC-H 1 GB query 1 (runtime in s)

MySQL DBMS 'X' C program Vectorwise

Traditional Relational Database for Analytics Inefficiencies

Inefficient storage

Inefficient processing

Inefficient Storage for Analytics

Row-based storage model Predominant in 2003, still very common today

Works well for OLTP

Joe 101 27 Black

Edward 103 21 Scissorhand

Inefficient Storage – Row-based

Pages on disk – example

27 Black Joe

21 Scissorhand Edward

pointers to tuples

Var-width attribute pointers

Issues with Row-based Storage

Always read all attributes Poor bandwidth

Poor use of memory buffer

Complex row structure and navigation E.g. compressing out null fields

E.g. row chaining

Efficient Storage for Analytics

Columnar storage: store attributes separtely

Retrieve only attributes required by the query

Used by “traditional” column stores, e.g. Sybase IQ, Vertica

Inefficient Processing

How a traditional database runs a query

Query: SELECT name, salary*.19 AS tax FROM employee WHERE age > 25

Tuple-at-a-time iterator interface: - open() - next(): tuple - close() next() is called: - for each operator - for each tuple Complex code repeated over and over

Data-specific computational functionality Called once for every operation on every tuple Worse for complex tuple representations

Inefficient Processing (Part 1 of 2)

Lots of repeated, unnecessary code Operator logic

Function calls

Attribute access

Most instructions interpreting a query

Very few instructions processing actual data!

Many instructions per tuple

CPU Features – Inefficient Processing Part 2

In the last 20 years… Chip cache because RAM access is too slow and congested

Branch-sensitive CPU pipelines

Superscalar features

SIMD instructions (SSE and AVX)

Great for multimedia processing, scientific computing…

… but NOT for traditional relational databases Complex code: function calls, branches

Poor use of CPU cache (both data and instructions)

Processing one value at a time

Traditional RDBMS Many instructions per tuple

Many cycles per instruction

Very many cycles per tuple

Vectorwise – Vector-based Processing

Query: SELECT name, salary*.19 AS tax FROM employee WHERE age > 25

Vectorwise – Vector-based Processing

Vector contains data of multiple tuples (1024) All operations consume and produce entire vectors Effect: much less operator.next() and primitive calls. AND: pipelined query evaluation

Why is Vectorwise so Fast?

Reduced interpretation overhead 100+ times fewer function calls

Good CPU cache use High locality in primitives

Cache-conscious algorithms

No tuple navigation Primitives only see arrays

Vectorization allows algorithmic optimization CPU and compiler-friendly function bodies

Multiple work units, loop-pipelining, SIMD…

BONUS: PARALLEL QUERY

Some Numbers

Traditional RDBMS: <200 MB/s per core

Vectorwise (lab environment): >1.5 GB/s per core

Addressing the I/O Challenge

Columnar storage

Smart column buffer (memory)

Data compression On disk: less I/O

In memory: best use of column buffer

Ultra-efficient decompression algorithms to get sufficient throughput

Large contiguous data blocks for optimum disk I/O

In-memory min-max indexes per block (i.e. per column) Eliminate data blocks based on implicit/explicit filter criteria

Efficient Updates in a Column Store

Positional Delta Trees (PDTs) In-memory representation of small data changes

Efficiently merged with on-disk data

Periodically propagated to disk

Provide snapshot read consistency

ACID compliant

Agenda

Why traditional RDBMSs are slow for analytics

Why Vectorwise is fast

The I/O challenge

Efficient updates

B17 Eliminating the database bottleneck

Technology

Transcript of B17 Eliminating the database bottleneck

p2 b17 Kel2 Pleno

Phy b17 2-2

Vodič za vitamin B17

Kuhalna plošča Zona e zierjes PKE6..B17., PKF6..B17., PKM6 ...

B17 soi mat

Find the bottleneck of your system

B17 PENCAHAYAAN

P4 B17 KEL 3

wrapup b17 sken3

Laporan m2 b17 Klp4

B17 zipaquira nemocon

Eliminating malaria in NICARAGUA

Innovation - das wahre Bottleneck?

B17 exercícios

034 Waffen Arsenal B17

La facilitacion Grupo B17

b17 Sentra

Datagov Bottleneck

USING EQUIST FOR BOTTLENECK ANALYSIS

MySQL Tuning For CPU Bottleneck