Basics of Database Tuning

32
OLAP Indexes

Transcript of Basics of Database Tuning

Page 1: Basics of Database Tuning

OLAP Indexes

Page 2: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 2

What Is a Data Warehouse?

• “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.”

– W. H. Inmon• Data warehousing: the process of

constructing and using data warehouses

Page 3: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 3

Index Requirements in OLAP

• Data is read only– No insertion or deletion

• Query types– Point query: look up one specific tuple (rare)– Range query: return the aggregate of a (large)

set of tuples, with group by– Complex queries: need specific algorithms and

index structures, will be discussed later

Page 4: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 4

OLAP Query Example

• In table (cust, gender, …), find the total number of male customers

• Method 1: scan the table once• Method 2: build a B+ tree index on attribute

gender, still need to access all tuples of male customers

• Can we get the count without scanning many tuples?

Page 5: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 5

Bitmap Index

• For n tuples, a bitmap index has n bits and can be packed into ⎡n /8⎤ bytes and ⎡n /32⎤words

• From a bit to the row-id: the j-th bit of the p-th byte row-id = p*8 +j cust gender …

Jack M …Cathy F …

… … …Nancy F …

1 0 … 0

Page 6: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 6

Using Bitmap to Count

• Shcount[] contains the number of bits in the entry subscript– shcount[01100101]=4count = 0;for (i = 0; i < SHNUM; i++)

count += shcount[B[i]];

Page 7: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 7

Advantages of Bitmap Index

• Efficient in space• Ready for logic composition

– C = C1 AND C2– Bitmap operations can be used

• Bitmap index only works for categorical data with low cardinality– Naively, we need 50 bits per entry to represent

the state of a customer in US– How to represent a sale in dollars?

Page 8: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 8

Bit-Sliced Index

• A sale amount can be written as an integer number of pennies, and then represented as a binary number of N bits– 24 bits is good for up to $167,772.15,

appropriate for many stores• A bit-sliced index is N bitmaps

– Tuple j sets in bitmap k if the k-th bit in its binary representation is on

– The space costs of bit-sliced index is the same as storing the data directly

Page 9: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 9

Using Indexes

SELECT SUM(sales) FROM Sales WHERE C;– Tuples satisfying C is identified by a bitmap B

• Direct access to rows to calculate SUM: scan the whole table once

• B+ tree: find the tuples from the tree• Projection index: only scan attribute sales• Bit-sliced index: get the sum from ∑(B AND

Bk)*2k

Page 10: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 10

Cost Comparison

• Traditional value-list index (B+ tree) is costly in both I/O and CPU time– Not good for OLAP

• Bit-sliced index is efficient in I/O• Other case studies in [O’Neil and Quass,

SIGMOD’97]

Page 11: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 11

Horizontal or Vertical Storage

• A fact table for data warehousing is often fat– Tens of even hundreds of dimensions/attributes

• A query is often about only a few attributes• Horizontal storage: tuples are stored one by one• Vertical storage: tuples are stored by attributes

A1 A2 … A100

x1 x2 … x100

… … … …z1 z2 … z100

A1 A2 … A100

x1 x2 … x100

… … … …z1 z2 … z100

Page 12: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 12

Horizontal Versus Vertical

• Find the information of tuple t– Typical in OLTP– Horizontal storage: get the whole tuple in one search– Vertical storage: search 100 lists

• Find SUM(a100) GROUP BY {a22, a83}– Typical in OLAP– Horizontal storage (no index): search all tuples O(100n),

where n is the number of tuples– Vertical storage: search 3 lists O(3n), 3% of the

horizontal storage method• Projection index: vertical storage

Page 13: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 13

Outline

• OLAP and data warehousing: concepts• Indexing for OLAP operations• Data cube computation

– Computing complete data cubes– Compression

Page 14: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 14

MOLAPDate

Prod

uct

Cou

ntrysum

sumTV

VCRPC

1Qtr 2Qtr 3Qtr 4QtrU.S.A

Canada

Mexico

sum

Page 15: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 15

Pros and Cons

• Easy to implement• Fast retrieval• Many entries may be empty if data is sparse• Space costly

Page 16: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 16

ROLAP – Data Cube in Table

• A multi-dimensional databaseBase table

Dimensions Measure

S1 P1 Spring 6S1 P2 Spring 12S2 P1 Fall 9

Store Product Season AVG(Sales)

S1 * Spring 9… … … …* * * 9

9FallP1S212SpringP2S16SpringP1S1

MeasureDimensionsSalesSeasonProductStore

Cubing

Page 17: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 17

Pros and Cons

• Compact in space• Using mature relational technology• Overhead in query answering

Page 18: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 18

HOLAP

• Hybrid OLAP• Store detail data in ROLAP

– Save space• Store aggregates in MOLAP

– Fast query answering

Page 19: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 19

Cube Computation

• Given a base table B(D1, …, Dn, M), compute the set of all aggregates in the data cube using aggregate function f()

• D1, …, Dn are n dimensions and M is a measure

Page 20: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 20

Aggregate Cells

• A cell c=(c1, …, cn) is called an aggregate cell, provided each ci is either in Di or ALL– Generally, we assume that each tuple in B has

distinct dimension values, i.e., (D1, …, Dn) is a key in B

– Value ALL is also conventionally written as *• Cell c is a base cell if (c, m) is a tuple in the

base table• A cell c is not empty if there are some base

cells match all non-ALL attribute values in c

Page 21: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 21

Example

• (S1, ALL, Spring), (S2, P1, Fall) and (S1, ALL, Winter) are aggregate cells

• (S2, P1, Fall) is a base cell• (S1, ALL, Winter) is empty• (S1, ALL, Spring) = (S1, *, Spring)

Base tableDimensions Measure

S1 P1 Spring 6S1 P2 Spring 12S2 P1 Fall 9

Store Product Season Sales

Page 22: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 22

Cuboid Lattice

• All aggregate cells belong to the same group by form a cuboid

(*,*,*)

(Store,*,*) (*,Product,*) (*,*,Season)

(Store,Product,*) (Store,*,Season) (*,Product,Season)

(Store,Product,Season)

Drill down

Roll up

Page 23: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 23

Tuples in the base table

(S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9

(S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*) (*,P1,f):9

(S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9

(*,*,*):9

Cube (Cell) Lattice

Page 24: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 24

A Naïve Approach

• Set one counter for every aggregate cell• Scan the base table once, compute all

aggregate cells• Time complexity O(n), where n is the

number of tuples in the base table• Space overhead O(π(|Di|+1))

– 10 dimensions, each dimension has cardinality 10, (1110 – 1010) = 15,937,424,601 counters are needed!

Page 25: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 25

Real Data Sets Are Often Sparse

• Many aggregate cells are empty• Example: 10 dimensions, each dimension

has cardinality 10– 1010 possible base cells– Even a base table has 1 billion tuples, still at

least 90% of the possible base cells are empty!

Page 26: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 26

Multiway Aggregate Computation

• Compute ABC• Compute AB, AC, and

BC from ABC• Compute A, B, and C

from AB, AC, and BC• Compute ALL from A,

B, C

ABC

A BC

AB AC BC

ALL

Page 27: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 27

Main Memory Allocation

• Do we really need to allocate main memory for AB, AC, and BC, respectively?– The cuboids still can be very large

• Observation: A, B, and C can be computed from any two of AB, AC, BC

ABC

A B

AC BC

C

AB

ALL

Page 28: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 28

Multi-way Array Aggregation (1)

A

B

29 30 31 32

1 2 3 4

5

9

13 14 15 16

6463626148474645

a1a0

c3c2

c1c 0

b3

b2

b1

b0a2 a3

C

4428 56

4024 52

3620

60

B

Page 29: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 29

Multi-way Array Aggregation (2)

A

B

29 30 31 32

1 2 3 4

5

9

13 14 15 16

6463626148474645

a1a0

c3c2

c1c 0

b3

b2

b1

b0a2 a3

C

4428 56

4024 52

3620

60

B

Page 30: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 30

Ideas

• Consider a base table (A, B, C), where |A|=10, |B|=100, |C|=1000 Order A-B-C C-B-A

(A,B,C) 1 1(A,B,*) 1 10*100(A,*,C) 1000 10(*,B,C) 100*1000 1(A,*,*) 1 10(*,B,*) 100 100(*,*,C) 1000 1(*,*,*) 1 1Total 102,104 1,124

ABC

A B C

AB AC BC

ALL

Page 31: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 31

The Array-based Approach

• Sort dimensions smartly– Keep the smaller aggregate planes in main

memory– Output aggregates in larger planes once they

are computed, and then reuse the counters• How to make it work when the main memory

usage is still too large?– Details in [Zhao, Deshpande and Naughton,

SIGMOD 97]

Page 32: Basics of Database Tuning

Jian Pei: Data Mining -- OLAP Indexes 32

Summary

• Indexes for OLAP– Bitmap index– Vertical storage

• Data cube computation – the multiway array approach

• Can you find some situations where bitmap index and vertical storage are useful?