Cassandra and materialized views

Cassandra 2.2 and 3.0 new features

DuyHai DOAN Apache Cassandra Technical Evangelist

#VoxxedBerlin @doanduyhai

Datastax

•  Founded in April 2010

•  We contribute a lot to Apache Cassandra™

•  400+ customers (25 of the Fortune 100), 450+ employees

•  Headquarter in San Francisco Bay area

•  EU headquarter in London, offices in France and Germany

•  Datastax Enterprise = OSS Cassandra + extra features

Materialized Views (MV)•  Why ? •  Detailed Impl •  Gotchas

Why Materialized Views ?•  Relieve the pain of manual denormalization

CREATE TABLE user( id int PRIMARY KEY, country text, …);CREATE TABLE user_by_country( country text, id int, …, PRIMARY KEY(country, id));

CREATE TABLE user_by_country ( country text, id int, firstname text, lastname text, PRIMARY KEY(country, id));

Materialzed View In ActionCREATE MATERIALIZED VIEW user_by_country AS SELECT country, id, firstname, lastnameFROM user WHERE country IS NOT NULL AND id IS NOT NULLPRIMARY KEY(country, id)

Materialzed View SyntaxCREATE MATERIALIZED VIEW [IF NOT EXISTS] keyspace_name.view_name

AS SELECT column1, column2, ...

FROM keyspace_name.table_name

WHERE column1 IS NOT NULL AND column2 IS NOT NULL ...

PRIMARY KEY(column1, column2, ...)

Must select all primary key columns of base table

•  IS NOT NULL condition for now •  more complex conditions in future

•  at least all primary key columns of base table (ordering can be different)

•  maximum 1 column NOT pk from base table

Materialized Views Demo 7

Materialized View Impl

UPDATE user SET country=‘FR’

WHERE id=1

① •  send mutation to all replicas•  waiting for ack(s) with CL

WHERE id=1

② Acquire local lock on base table partition

WHERE id=1

③ Local read to fetch current values

SELECT * FROM user WHERE id=1

WHERE id=1

④ Create BatchLog with

•  DELETE FROM user_by_country WHERE country = ‘old_value’

•  INSERT INTO user_by_country(country, id, …) VALUES(‘FR’, 1, ...)

WHERE id=1

⑤ Execute async BatchLog

to paired view replica with CL = ONE

WHERE id=1

⑥ Apply base table updade locallySET COUNTRY=‘FR’

WHERE id=1

⑦ Release local lock

WHERE id=1

⑧ Return ack to coordinator

WHERE id=1

⑨ If CL ack(s)

received, ack client

MV Failure Cases: concurrent updates

Read base row (country=‘UK’)•  DELETE FROM mv WHERE

country=‘UK’•  INSERT INTO mv …(country)

VALUES(‘US’)•  Send async BatchLog •  Apply update country=‘US’

1) UPDATE … SET country=‘US’ 2) UPDATE … SET country=‘FR’

VALUES(‘FR’)•  Send async BatchLog •  Apply update country=‘FR’

Without local

INSERT INTO mv …(country) VALUES(‘US’)INSERT INTO mv …(country) VALUES(‘FR’)

Read base row (country=‘US’)•  DELETE FROM mv WHERE

country=‘US’•  INSERT INTO mv …(country)

With local lo

🔓 🔒

🔓 19

MV Failure Cases: failed updates to MV

WHERE id=1

⑤ Execute async BatchLog

to paired view replica with CL = ONE

MV replica d

MV Failure Cases: failed updates to MV

WHERE id=1

BatchLog replay

MV replica u

Materialized View Performance•  Write performance

•  local lock•  local read-before-write for MV à update contention on partition (most of perf hits) •  local batchlog for MV•  ☞ you only pay this price once whatever number of MV •  for each base table update: mv_count x 2 (DELETE + INSERT) extra mutations

Materialized View Performance•  Write performance vs manual denormalization

•  MV better because no client-server network traffic for read-before-write •  MV better because less network traffic for multiple views (client-side BATCH)

•  Makes developer life easier à priceless

Materialized View Performance•  Read performance vs secondary index

•  MV better because single node read (secondary index can hit many nodes)•  MV better because single read path (secondary index = read index + read data)

Materialized Views Consistency•  Consistency level

•  CL honoured for base table, ONE for MV + local batchlog

•  Weaker consistency guarantees for MV than for base table.

•  Exemple, write at QUORUM •  guarantee that QUORUM replicas of base tables have received write•  guarantee that QUORUM of MV replicas will eventually receive DELETE + INSERT

Materialized Views Gotchas•  Beware of hot spots !!!

•  MV user_by_gender 😱

User Define Functions (UDF)•  Why ? •  Detailed Impl •  UDAs •  Gotchas

Rationale•  Push computation server-side

•  save network bandwidth (1000 nodes!)•  simplify client-side code•  provide standard & useful function (sum, avg …)•  accelerate analytics use-case (pre-aggregation for Spark)

How to create an UDF ?

CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS][keyspace.]functionName (param1 type1, param2 type2, …)CALL ON NULL INPUT | RETURNS NULL ON NULL INPUTRETURN returnTypeLANGUAGE language AS $$ // source code here$$;

CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS][keyspace.]functionName (param1 type1, param2 type2, …)CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUTRETURN returnTypeLANGUAGE language AS $$ // source code here$$;

An UDF is keyspace-wide

CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS][keyspace.]functionName (param1 type1, param2 type2, …)CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUTRETURN returnTypeLANGUAGE languageAS $$ // source code here$$;

Param name to refer to in the code Type = CQL3 type

CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS][keyspace.]functionName (param1 type1, param2 type2, …)CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUTRETURN returnTypeLANGUAGE language // jAS $$ // source code here$$;

Always called Null-check mandatory in code

CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS][keyspace.]functionName (param1 type1, param2 type2, …)CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT RETURN returnTypeLANGUAGE language // javAS $$ // source code here$$;

If any input is null, code block is skipped and return null

CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS][keyspace.]functionName (param1 type1, param2 type2, …)CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUTRETURN returnType LANGUAGE languageAS $$ // source code here$$;

CQL types •  primitives (boolean, int, …) •  collections (list, set, map) •  tuples •  UDT

CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS][keyspace.]functionName (param1 type1, param2 type2, …)CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUTRETURN returnTypeLANGUAGE languageAS $$ // source code here$$; JVM supported languages

•  Java, Scala •  Javascript (slow) •  Groovy, Jython, JRuby •  Clojure ( JSR 223 impl issue)

CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS][keyspace.]functionName (param1 type1, param2 type2, …)CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUTRETURN returnTypeLANGUAGE languageAS $$ // source code here$$;

UDF Demo

UDA•  Real use-case for UDF

•  Aggregation server-side à huge network bandwidth saving

•  Provide similar behavior for Group By, Sum, Avg etc …

How to create an UDA ?

CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS][keyspace.]aggregateName(type1, type2, …)SFUNC accumulatorFunction STYPE stateType [FINALFUNC finalFunction]INITCOND initCond;

Only type, no param name

State type

Initial state type

Accumulator function. Signature: accumulatorFunction(stateType, type1, type2, …)

RETURNS stateType

Optional final function. Signature: finalFunction(stateType)

UDA return type ? If finalFunction •  return type of finalFunction Else •  return stateType

UDA Demo

Gotchas

UDA ①

② & ③

⑤ ② & ③

② & ③

Gotchas

UDA ①

② & ③

⑤ ② & ③

② & ③

Why do not apply UDF/UDA on replica node ?

Gotchas

UDA ①

② & ③

④ •  apply accumulatorFunction•  apply finalFunction

⑤ ② & ③

② & ③

1.  Because of eventual consistency

2.  UDF/UDA applied AFTER last-write-win logic

Gotchas

•  UDA in Cassandra is not distributed !

•  Execute UDA on a large number of rows (106 for ex.) •  single fat partition•  multiple partitions•  full table scan

•  à Increase client-side timeout•  default Java driver timeout = 12 secs•  JAVA-1033 JIRA for per-request timeout setting

Cassandra UDA or Apache Spark ?

Consistency Level

Single/Multiple Partition(s)

Recommended Approach

ONE Single partition UDA with token-aware driver because node local

ONE Multiple partitions Apache Spark because distributed reads

> ONE Single partition UDA because data-locality lost with Spark

> ONE Multiple partitions Apache Spark definitely

Cassandra UDA or Apache Spark ?

Consistency Level

Single/Multiple Partition(s)

Recommended Approach

ONE Single partition UDA with token-aware driver because node local

ONE Multiple partitions Apache Spark because distributed reads

> ONE Single partition UDA because data-locality lost with Spark

> ONE Multiple partitions Apache Spark definitely

New Storage Engine•  Data structure •  Disk space usage

Pre 3.0 data structure

Map<byte[ ], SortedMap<byte[ ], Cell>>

CREATE TABLE sensor_data( sensor_id uuid, date timestamp, sensor_type text, sensor_value double, PRIMARY KEY(sensor_id, date));

Pre 3.0 on disk layout

RowKey: de305d54-75b4-431b-adb2-eb6b9e546014 => (column=2015-04-27 10:00:00+0100:, value=, timestamp=1430128800) => (column=2015-04-27 10:00:00+0100:sensor_type, value=‘Temperature’, timestamp=1430128800) => (column=2015-04-27 10:00:00+0100:sensor_value, value=23.48, timestamp=1430128800) => (column=2015-04-27 10:01:00+0100:, value=, timestamp=1430128860) => (column=2015-04-27 10:01:00+0100:sensor_type, value=‘Temperature’, timestamp=1430128860) => (column=2015-04-27 10:01:00+0100:sensor_value, value=24.08, timestamp=1430128860)

Clustering values are repeated for each normal column

Full timestamp storage

Cassandra 3.0 data structure

Map<byte[ ], SortedMap<ClusteringColumn, Row>>

CREATE TABLE sensor_data( sensor_id uuid, date timestamp, sensor_type text, sensor_value double, PRIMARY KEY(sensor_id, date));

Cassandra 3.0 on disk layout

PartitionKey: de305d54-75b4-431b-adb2-eb6b9e546014 => clusteringColumn:2015-04-27 10:00:00+0100 => row_timestamp=1430128800 => (column_value=‘Temperature’, delta_encoded_timestamp=+0) => (column_value=23.48, delta_encoded_timestamp=+0) => clusteringColumn:2015-04-27 10:01:00+0100 => row_timestamp=1430128860 => (column_value=‘Temperature’, delta_encoded_timestamp=+0) => (column_value=24.08, delta_encoded_timestamp=+0)

Delta-encoded timestamp vs row timestamp

•  No clustering value repetition

•  Column labels are stored only once in meta data

•  Delta encoding of timestamp, 8 bytes saved each time

•  Less disk space used

Benchmarks

CREATE TABLE events ( id uuid, date timeuuid, prop1 int, prop2 text, prop3 float, PRIMARY KEY(id, date));

106 rows

Small string

Benchmarks

CREATE TABLE largetext( key int, prop1 int, prop2 text,PRIMARY KEY(id));

106 rows

Large string (1000)

Benchmarks

CREATE TABLE largeclustering( key int, clust text, prop1 int, prop2 set<float>,PRIMARY KEY(id, clust));

106 rows Medium string (100)

50 items

Benchmarks

CREATE TABLE events ( id uuid, date timeuuid, prop1 int, prop2 text, prop3 float, PRIMARY KEY(id, date)) WITH COMPACT STORAGE ;

@doanduyhai

duy_hai.doan@datastax.com

https://academy.datastax.com/

Thank You

Cassandra and materialized views

Software

Transcript of Cassandra and materialized views

Conhecendo Apache Cassandra Meetup - Conhecendo … · Eiti Kimura Coordenador de TI na Movile - Apache Cassandra MVP 2015 - Apache Cassandra MVP 2014 - Contribuidor Apache Cassandra

Cassandra meetup 20150331

Cassandra - Ottobre 2013

TH CHIFFRES-CLÉS / KEY FIGURES PRESENTS 10 000 000 · INDONESIA 230 K < 100K VIEWS ≥ 100-200K VIEWS ≥ 200-500K VIEWS > 500K VIEWS GEOGRAPHICAL BREAKDOWN OF FILM VIEWS.

Cassandra 简介

Cassandra Java Driver : vers Cassandra 1.2 et au-delà

Web Analytics BI Portal - Oracle에게있어서는매우효과적인자료 ... -요약테이블을위한상세보기(materialized view) ... y별도의Customizing 없이150여개의기본분석보고서활용

NoSQL Apache Cassandra

Cassandra cql

Apresentacao Cassandra

Lazy Maintenance of Materialized Views

Apache cassandra cosnola

Real-time Cassandra

PDF Cassandra

Cassandra - Marzo 2012

Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay

Particionamento cassandra

INSTALACION CASSANDRA

Cassandra education material

Cassandra rep snbu_2006