B17 Eliminating the database bottleneck

22
Eliminating the Database Bottleneck What makes Vectorwise so fast Mark Van de Wiel Thursday, November 01, 2012 Director Product Management, Vectorwise 1 of 9 1 of 9 1 of 9 1 of 9 1 of 9 Confidential © 2012 Actian Corporation

Transcript of B17 Eliminating the database bottleneck

Page 1: B17 Eliminating the database bottleneck

Eliminating the Database Bottleneck What makes Vectorwise so fast

Mark Van de Wiel

Thursday, November 01, 2012

Director Product Management, Vectorwise

1 of 9 1 of 9 1 of 9 1 of 9 1 of 9 Confidential © 2012 Actian Corporation

プレゼンター
プレゼンテーションのノート
Title Slide
Page 2: B17 Eliminating the database bottleneck

Agenda

2

Why traditional RDBMSs are slow for analytics

Why Vectorwise is fast

The I/O challenge

Efficient updates

Confidential © 2012 Actian Corporation

Page 3: B17 Eliminating the database bottleneck

100x (+) Performance Difference – 2003 Custom C versus Relational Database

Confidential © 2012 Actian Corporation 3

26.2 28.1

0.2 0.6 0

5

10

15

20

25

30

MySQL DBMS 'X' C program Vectorwise

TPC-H 1 GB query 1 (runtime in s)

MySQL DBMS 'X' C program Vectorwise

Page 4: B17 Eliminating the database bottleneck

Traditional Relational Database for Analytics Inefficiencies

Confidential © 2012 Actian Corporation 4

Inefficient storage

Inefficient processing

Page 5: B17 Eliminating the database bottleneck

Inefficient Storage for Analytics

Confidential © 2012 Actian Corporation 5

Row-based storage model Predominant in 2003, still very common today

Works well for OLTP

Joe 101 27 Black

Edward 103 21 Scissorhand

Page 6: B17 Eliminating the database bottleneck

Inefficient Storage – Row-based

Confidential © 2012 Actian Corporation 6

Pages on disk – example

27 Black Joe

21 Scissorhand Edward

101

103

pointers to tuples

Var-width attribute pointers

Page 7: B17 Eliminating the database bottleneck

Issues with Row-based Storage

Confidential © 2012 Actian Corporation 7

Always read all attributes Poor bandwidth

Poor use of memory buffer

Complex row structure and navigation E.g. compressing out null fields

E.g. row chaining

Page 8: B17 Eliminating the database bottleneck

Efficient Storage for Analytics

Confidential © 2012 Actian Corporation 8

Columnar storage: store attributes separtely

Retrieve only attributes required by the query

Used by “traditional” column stores, e.g. Sybase IQ, Vertica

Page 9: B17 Eliminating the database bottleneck

Inefficient Processing

Confidential © 2012 Actian Corporation 9

How a traditional database runs a query

Query: SELECT name, salary*.19 AS tax FROM employee WHERE age > 25

Page 10: B17 Eliminating the database bottleneck

Inefficient Processing

Confidential © 2012 Actian Corporation 10

How a traditional database runs a query

Tuple-at-a-time iterator interface: - open() - next(): tuple - close() next() is called: - for each operator - for each tuple Complex code repeated over and over

Page 11: B17 Eliminating the database bottleneck

Inefficient Processing

Confidential © 2012 Actian Corporation 11

How a traditional database runs a query

Data-specific computational functionality Called once for every operation on every tuple Worse for complex tuple representations

Page 12: B17 Eliminating the database bottleneck

Inefficient Processing (Part 1 of 2)

Confidential © 2012 Actian Corporation 12

Lots of repeated, unnecessary code Operator logic

Function calls

Attribute access

Most instructions interpreting a query

Very few instructions processing actual data!

Many instructions per tuple

Page 13: B17 Eliminating the database bottleneck

CPU Features – Inefficient Processing Part 2

Confidential © 2012 Actian Corporation 13

In the last 20 years… Chip cache because RAM access is too slow and congested

Branch-sensitive CPU pipelines

Superscalar features

SIMD instructions (SSE and AVX)

Great for multimedia processing, scientific computing…

… but NOT for traditional relational databases Complex code: function calls, branches

Poor use of CPU cache (both data and instructions)

Processing one value at a time

Page 14: B17 Eliminating the database bottleneck

Inefficient Processing

Confidential © 2012 Actian Corporation 14

Traditional RDBMS Many instructions per tuple

Many cycles per instruction

Very many cycles per tuple

Page 15: B17 Eliminating the database bottleneck

Vectorwise – Vector-based Processing

Confidential © 2012 Actian Corporation 15

Query: SELECT name, salary*.19 AS tax FROM employee WHERE age > 25

Page 16: B17 Eliminating the database bottleneck

Vectorwise – Vector-based Processing

Confidential © 2012 Actian Corporation 16

Vector contains data of multiple tuples (1024) All operations consume and produce entire vectors Effect: much less operator.next() and primitive calls. AND: pipelined query evaluation

Page 17: B17 Eliminating the database bottleneck

Why is Vectorwise so Fast?

Confidential © 2012 Actian Corporation 17

Reduced interpretation overhead 100+ times fewer function calls

Good CPU cache use High locality in primitives

Cache-conscious algorithms

No tuple navigation Primitives only see arrays

Vectorization allows algorithmic optimization CPU and compiler-friendly function bodies

Multiple work units, loop-pipelining, SIMD…

BONUS: PARALLEL QUERY

Page 18: B17 Eliminating the database bottleneck

Some Numbers

Confidential © 2012 Actian Corporation 18

Traditional RDBMS: <200 MB/s per core

Vectorwise (lab environment): >1.5 GB/s per core

Page 19: B17 Eliminating the database bottleneck

Addressing the I/O Challenge

Confidential © 2012 Actian Corporation 19

Columnar storage

Smart column buffer (memory)

Data compression On disk: less I/O

In memory: best use of column buffer

Ultra-efficient decompression algorithms to get sufficient throughput

Large contiguous data blocks for optimum disk I/O

In-memory min-max indexes per block (i.e. per column) Eliminate data blocks based on implicit/explicit filter criteria

Page 20: B17 Eliminating the database bottleneck

Efficient Updates in a Column Store

Confidential © 2012 Actian Corporation 20

Positional Delta Trees (PDTs) In-memory representation of small data changes

Efficiently merged with on-disk data

Periodically propagated to disk

Provide snapshot read consistency

ACID compliant

Page 21: B17 Eliminating the database bottleneck

Agenda

21

Why traditional RDBMSs are slow for analytics

Why Vectorwise is fast

The I/O challenge

Efficient updates

Confidential © 2012 Actian Corporation

Page 22: B17 Eliminating the database bottleneck

Confidential © 2012 Actian Corporation