Cassandra 2.2 & 3.0
Victor Coustenoble, @vizanalytics
2.2 & 3.0
http://www.datastax.com/dev/blog/cassandra-2-2
Where did 2.2 come from?
Don't start Thrift RPC by default (CASSANDRA-9319)
New features
• 2.2
- JSON
- User defined functions
- User defined aggregates
- Other useful features
- http://docs.datastax.com/en/cassandra/2.2/cassandra/features.html
- http://www.datastax.com/dev/blog/cassandra-2-2
• 3.0
- New storage engine (CASSANDRA-8099)
- A new way to denormalise/duplicate: materialized views
So who’s taken some data out of C* and serialised it as JSON?
Hello JSON
• CREATE TABLE user (username text PRIMARY KEY, first_name text, last_name text, emails set<text>, country text);
• INSERT INTO user JSON '{"username": "chbatey", "first_name": "Christopher", "last_name": "Batey", "emails": ["[email protected]"]}';
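Because the JSON document travels inside a single-quoted CQL string literal, any apostrophe in the data has to be doubled before the statement is sent. A minimal Java sketch, assuming a hypothetical helper named `insertJson` (not part of Cassandra or any driver) and a made-up `O'Brien` row; in application code, a prepared statement with a bind marker (`INSERT INTO user JSON ?`) is usually the safer route because it avoids manual escaping.

```java
public class InsertJsonSketch {
    // Hypothetical helper (not a driver API): builds an INSERT ... JSON statement from a JSON string.
    // CQL string literals are single-quoted; a quote inside one is escaped by doubling it.
    static String insertJson(String table, String json) {
        return "INSERT INTO " + table + " JSON '" + json.replace("'", "''") + "';";
    }

    public static void main(String[] args) {
        // Hypothetical row: the apostrophe in last_name would otherwise end the CQL literal early
        String json = "{\"username\": \"jdoe\", \"last_name\": \"O'Brien\"}";
        System.out.println(insertJson("user", json));
        // INSERT INTO user JSON '{"username": "jdoe", "last_name": "O''Brien"}';
    }
}
```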
Goodbye Serialisation!
JSON + User Defined Types
• CREATE TYPE movie (title text, time timestamp, description text);
• ALTER TABLE user ADD movies set<frozen<movie>>;
• UPDATE user SET movies = {{title: 'Batman', time: '2011-02-03T04:05:00+0000', description: 'This film rocks'}} WHERE username = 'chbatey';
Out it comes
User Defined Functions
• Run code on the server! Dangerous!
- Disabled by default (enable_user_defined_functions in cassandra.yaml)
• Java and JavaScript supported out of the box
• Any language that supports the Java Scripting API (JavaScript, Ruby, Python …)
UDF example
CREATE TABLE user (username text PRIMARY KEY, first_name text, last_name text, emails set<text>, country text);
Concat function
CREATE FUNCTION name (first_name text, last_name text) CALLED ON NULL INPUT RETURNS text LANGUAGE java AS 'return first_name + " " + last_name;';
cqlsh:twotwo> select name(first_name, last_name) FROM user;

| twotwo.name(first_name, last_name) |
|---|
| Victor Coustenoble |
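CALLED ON NULL INPUT means the body runs even when an argument is null, so Java string concatenation turns a missing first name into the literal text "null". The plain-Java sketch below mirrors the function body outside Cassandra; `nameReturnsNullOnNull` is a hypothetical helper imitating what declaring RETURNS NULL ON NULL INPUT instead would do (skip the body and return null).

```java
public class ConcatUdfSketch {
    // Mirrors the body of the CQL function `name`: return first_name + " " + last_name;
    static String name(String firstName, String lastName) {
        return firstName + " " + lastName;
    }

    // Hypothetical helper imitating RETURNS NULL ON NULL INPUT: any null argument short-circuits to null
    static String nameReturnsNullOnNull(String firstName, String lastName) {
        if (firstName == null || lastName == null) return null;
        return name(firstName, lastName);
    }

    public static void main(String[] args) {
        System.out.println(name("Victor", "Coustenoble"));       // Victor Coustenoble
        System.out.println(name(null, "Batey"));                  // null Batey
        System.out.println(nameReturnsNullOnNull(null, "Batey")); // null
    }
}
```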
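User Defined Aggregates
CREATE AGGREGATE average (int) SFUNC averageState STYPE tuple<int,bigint> FINALFUNC averageFinal INITCOND (0, 0);
• SFUNC averageState: state function, called for every row; the state is passed between calls
• STYPE tuple<int,bigint>: state type, i.e. the state function's return type (a CQL type)
• INITCOND (0, 0): initial state
• FINALFUNC averageFinal: optional function called on the final state
• averageState and averageFinal must already exist when the aggregate is created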
State function (like a UDF)
CREATE FUNCTION averageState (state tuple<int,bigint>, value int) CALLED ON NULL INPUT RETURNS tuple<int,bigint> LANGUAGE java AS ' if (value != null) { state.setInt(0, state.getInt(0) + 1); state.setLong(1, state.getLong(1) + value.intValue()); } return state; ';
The first argument has the state type; the remaining arguments are the aggregated column(s).
Final function
CREATE FUNCTION averageFinal (state tuple<int,bigint>) CALLED ON NULL INPUT RETURNS double LANGUAGE java AS ' if (state.getInt(0) == 0) return null; double r = state.getLong(1); r /= state.getInt(0); return Double.valueOf(r); ';
The argument has the state type; RETURNS double is the overall return type of the aggregate.
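Putting the three pieces together: the aggregate starts from INITCOND, calls the state function once per row and hands the final state to FINALFUNC. The sketch below replays that loop in plain Java, with a small `State` class standing in for the tuple<int,bigint> (an assumption for illustration, not the driver's tuple type). It uses the corrected final function, since dividing a long by an int in Java truncates before the result is widened to double.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class AverageUdaSketch {
    // Stand-in for the CQL tuple<int,bigint> state: (count, sum)
    static final class State {
        int count;
        long sum;
    }

    // Mirrors averageState: ignore nulls, otherwise bump the count and add the value to the sum
    static State averageState(State state, Integer value) {
        if (value != null) {
            state.count = state.count + 1;
            state.sum = state.sum + value.intValue();
        }
        return state;
    }

    // Mirrors the corrected averageFinal: widen the sum to double before dividing
    static Double averageFinal(State state) {
        if (state.count == 0) return null;
        double r = state.sum;
        r /= state.count;
        return Double.valueOf(r);
    }

    // Rough stand-in for the server: INITCOND (0, 0), SFUNC once per row, FINALFUNC once at the end
    static Double average(List<Integer> column) {
        State state = new State(); // count = 0, sum = 0
        for (Integer value : column) {
            state = averageState(state, value);
        }
        return averageFinal(state);
    }

    public static void main(String[] args) {
        System.out.println(average(Arrays.asList(1, 2, null, 4))); // 2.3333333333333335
        System.out.println(average(Collections.emptyList()));      // null
        System.out.println((double) (7L / 3));                      // 2.0, what long / int division yields
    }
}
```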
Putting it all together
Customer events
CREATE FUNCTION countEventTypes (state map<text, int>, type text) CALLED ON NULL INPUT RETURNS map<text, int> LANGUAGE java AS ' Integer count = (Integer) state.get(type); if (count == null) count = 1; else count = count + 1; state.put(type, count); return state; ';
CREATE AGGREGATE count_by_type (text) SFUNC countEventTypes STYPE map<text, int> INITCOND {};
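The same fold with a map as the state: INITCOND {} starts from an empty map and countEventTypes increments one counter per row. A plain-Java sketch, assuming made-up event types; the typed map makes the `(Integer)` cast from the UDF body unnecessary, and a TreeMap is used only so the printed order is stable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CountByTypeSketch {
    // Mirrors countEventTypes: bump the counter for this event type in the map state
    static Map<String, Integer> countEventTypes(Map<String, Integer> state, String type) {
        Integer count = state.get(type);
        if (count == null) count = 1; else count = count + 1;
        state.put(type, count);
        return state;
    }

    // Mimics count_by_type: INITCOND {} is an empty map, SFUNC runs once per row
    static Map<String, Integer> countByType(List<String> eventTypes) {
        Map<String, Integer> state = new TreeMap<>();
        for (String type : eventTypes) {
            state = countEventTypes(state, type);
        }
        return state;
    }

    public static void main(String[] args) {
        // Hypothetical event_type values for one customer partition
        List<String> types = Arrays.asList("login", "purchase", "login", "logout");
        System.out.println(countByType(types)); // {login=2, logout=1, purchase=1}
    }
}
```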
Customer events
Built-in aggregates
• count
• max
• min
• avg
• sum
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/AggregateFcts.java
Built-in time functions
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/TimeFcts.java
Built-in aggregates in action
1/ “Materialised views” with Spark
2/ Pure C*
JSON, UDF and UDA available in DevCenter
Role-based access control
Other bits and pieces…
• Compressed commit log
• Resumable bootstrapping
• New types
- smallint (short)
- tinyint (byte)
- date
- time
• Warnings now sent back to the client
- e.g. batch too large
http://www.datastax.com/dev/blog/new-in-cassandra-3-0-materialized-views
New Storage Engine
• CASSANDRA-8099
• More efficient storage
• Aware of CQL structure
• Reduced SSTable size
• Reduced memory usage
• …
Customer events table
CREATE TABLE IF NOT EXISTS customer_events (customer_id text, staff_id text, store_type text, time timeuuid, event_type text, PRIMARY KEY (customer_id, time));
CREATE INDEX ON customer_events (staff_id);
Indexes to the rescue?

customer_events table:

| customer_id | time | staff_id |
|---|---|---|
| chbatey | 2015-03-03 08:52:45 | trevor |
| chbatey | 2015-03-03 08:52:54 | trevor |
| chbatey | 2015-03-03 08:53:11 | bill |
| chbatey | 2015-03-03 08:53:18 | bill |
| rusty | 2015-03-03 08:56:57 | bill |
| rusty | 2015-03-03 08:57:02 | bill |
| rusty | 2015-03-03 08:57:20 | trevor |

staff_id index:

| staff_id | customer_id |
|---|---|
| trevor | chbatey |
| trevor | chbatey |
| bill | chbatey |
| bill | chbatey |
| bill | rusty |
| bill | rusty |
| trevor | rusty |
Secondary indexes are local
• The staff_id partition in the secondary index is not distributed like a normal table
• The secondary index entries are only stored on the node that contains the customer_id partition
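A toy model of why this matters, assuming a fixed two-node placement (chbatey on node A, rusty on node B, as in the next diagram). Each simulated node indexes only the rows it owns, so a query on the partition key touches one node while a query on staff_id has to ask every node. Real Cassandra routes such queries over token ranges and replicas, which this sketch ignores; `LocalIndexSketch` and its placement map are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LocalIndexSketch {
    // One customer_events row; time is a plain string here instead of a timeuuid
    static final class Event {
        final String customerId, time, staffId;
        Event(String customerId, String time, String staffId) {
            this.customerId = customerId;
            this.time = time;
            this.staffId = staffId;
        }
    }

    // A simulated node: whole customer_id partitions, plus a staff_id index over those rows only
    static final class Node {
        final Map<String, List<Event>> partitions = new HashMap<>();
        final Map<String, List<Event>> localStaffIndex = new HashMap<>();
    }

    final Map<String, Node> nodes = new LinkedHashMap<>();
    final Map<String, String> owner; // hypothetical placement: customer_id -> node name
    int nodesContacted = 0;          // how many nodes the last query touched

    LocalIndexSketch(Map<String, String> owner) {
        this.owner = owner;
        for (String nodeName : owner.values()) {
            nodes.putIfAbsent(nodeName, new Node());
        }
    }

    // The row and its index entry both land on the node that owns the customer_id partition
    void insert(Event e) {
        Node node = nodes.get(owner.get(e.customerId));
        node.partitions.computeIfAbsent(e.customerId, k -> new ArrayList<>()).add(e);
        node.localStaffIndex.computeIfAbsent(e.staffId, k -> new ArrayList<>()).add(e);
    }

    // Query by partition key: only the owning node needs to answer
    List<Event> byCustomer(String customerId) {
        nodesContacted = 1;
        Node node = nodes.get(owner.get(customerId));
        return node.partitions.getOrDefault(customerId, new ArrayList<>());
    }

    // Query by the indexed column: any node may hold matches, so every node is asked
    List<Event> byStaff(String staffId) {
        nodesContacted = 0;
        List<Event> result = new ArrayList<>();
        for (Node node : nodes.values()) {
            nodesContacted++;
            result.addAll(node.localStaffIndex.getOrDefault(staffId, new ArrayList<>()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> owner = new HashMap<>();
        owner.put("chbatey", "A");
        owner.put("rusty", "B");
        LocalIndexSketch cluster = new LocalIndexSketch(owner);
        String[][] rows = {
            {"chbatey", "2015-03-03 08:52:45", "trevor"},
            {"chbatey", "2015-03-03 08:52:54", "trevor"},
            {"chbatey", "2015-03-03 08:53:11", "bill"},
            {"chbatey", "2015-03-03 08:53:18", "bill"},
            {"rusty", "2015-03-03 08:56:57", "bill"},
            {"rusty", "2015-03-03 08:57:02", "bill"},
            {"rusty", "2015-03-03 08:57:20", "trevor"},
        };
        for (String[] r : rows) {
            cluster.insert(new Event(r[0], r[1], r[2]));
        }
        System.out.println(cluster.byCustomer("rusty").size() + " rows from " + cluster.nodesContacted + " node"); // 3 rows from 1 node
        System.out.println(cluster.byStaff("bill").size() + " rows from " + cluster.nodesContacted + " nodes");    // 4 rows from 2 nodes
    }
}
```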
Indexes to the rescue?
The customer_events table and its staff_id index, split across two nodes (all times on 2015-03-03):

| Node | customer_events partition | staff_id index entries (staff_id → customer_id) |
|---|---|---|
| A | chbatey (08:52:45 trevor, 08:52:54 trevor, 08:53:11 bill, 08:53:18 bill) | trevor → chbatey, trevor → chbatey, bill → chbatey, bill → chbatey |
| B | rusty (08:56:57 bill, 08:57:02 bill, 08:57:20 trevor) | bill → rusty, bill → rusty, trevor → rusty |
Do it yourself index?
CREATE TABLE IF NOT EXISTS customer_events (customer_id text, staff_id text, store_type text, time timeuuid, event_type text, PRIMARY KEY (customer_id, time));
CREATE TABLE IF NOT EXISTS customer_events_by_staff (customer_id text, staff_id text, store_type text, time timeuuid, event_type text, PRIMARY KEY (staff_id, time));
1.2 Logged batches
Diagram: client → coordinator (C) → batch log, stored on batch log replicas (BL-R: batch log replica)
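A single-process sketch of the logged-batch idea: the whole batch is persisted to the batch log first, the mutations are applied, and the log entry is removed only once they all succeed; anything left behind is replayed later. The `crashAfter` parameter and the in-memory maps are illustrative assumptions; in Cassandra the batch log lives on other nodes (the BL-R replicas) and replay happens automatically.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchlogSketch {
    // One write: table name, primary key (flattened to a string) and a value
    static final class Mutation {
        final String table, key, value;
        Mutation(String table, String key, String value) {
            this.table = table;
            this.key = key;
            this.value = value;
        }
    }

    final Map<String, Map<String, String>> tables = new HashMap<>();
    // Stands in for the batch log held on the batch log replicas (BL-R)
    final Map<Integer, List<Mutation>> batchlog = new HashMap<>();
    int nextBatchId = 0;

    void apply(Mutation m) {
        tables.computeIfAbsent(m.table, t -> new HashMap<>()).put(m.key, m.value);
    }

    // Logged batch: persist the whole batch first, then apply it, then drop the log entry.
    // crashAfter simulates the coordinator dying after that many mutations (-1 = no crash).
    void loggedBatch(List<Mutation> batch, int crashAfter) {
        int id = nextBatchId++;
        batchlog.put(id, new ArrayList<>(batch));
        for (int i = 0; i < batch.size(); i++) {
            if (i == crashAfter) return; // the log entry is left behind for replay
            apply(batch.get(i));
        }
        batchlog.remove(id);
    }

    // Replay of leftover entries; safe to repeat because each mutation is an idempotent overwrite
    void replay() {
        for (List<Mutation> batch : batchlog.values()) {
            for (Mutation m : batch) apply(m);
        }
        batchlog.clear();
    }

    String get(String table, String key) {
        Map<String, String> t = tables.get(table);
        return t == null ? null : t.get(key);
    }

    public static void main(String[] args) {
        BatchlogSketch c = new BatchlogSketch();
        List<Mutation> batch = Arrays.asList(
            new Mutation("customer_events", "chbatey|08:52:45", "trevor"),
            new Mutation("customer_events_by_staff", "trevor|08:52:45", "chbatey"));
        c.loggedBatch(batch, 1); // crash after the first write
        System.out.println(c.get("customer_events_by_staff", "trevor|08:52:45")); // null
        c.replay();
        System.out.println(c.get("customer_events_by_staff", "trevor|08:52:45")); // chbatey
    }
}
```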
Pattern
• Write only:
- Duplicate with a different primary key
- (Optional) Logged batch for eventual consistency
• Full updates:
- No real difference
• Partial updates:
- No staff_id in the update?
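The "write only" versus "partial update" distinction, as a small Java sketch with two in-memory maps standing in for customer_events and customer_events_by_staff (an assumption for illustration, not a driver API). An insert knows every column and can write both tables blindly; an update that only carries event_type must first read the base row to learn which staff_id partition to touch.

```java
import java.util.HashMap;
import java.util.Map;

public class DiyIndexSketch {
    // One customer event; time would be a timeuuid in CQL, a plain string here
    static final class Event {
        final String customerId, time, staffId, eventType;
        Event(String customerId, String time, String staffId, String eventType) {
            this.customerId = customerId;
            this.time = time;
            this.staffId = staffId;
            this.eventType = eventType;
        }
    }

    // customer_events: PRIMARY KEY (customer_id, time)
    final Map<String, Event> customerEvents = new HashMap<>();
    // customer_events_by_staff: PRIMARY KEY (staff_id, time)
    final Map<String, Event> customerEventsByStaff = new HashMap<>();

    static String key(String partition, String time) {
        return partition + "|" + time;
    }

    // Write only: every column is known, so both tables are written blindly (ideally in one logged batch)
    void insert(Event e) {
        customerEvents.put(key(e.customerId, e.time), e);
        customerEventsByStaff.put(key(e.staffId, e.time), e);
    }

    // Partial update without staff_id: the by-staff key is unknown, so read the base row first
    boolean updateEventType(String customerId, String time, String newType) {
        Event old = customerEvents.get(key(customerId, time)); // read-before-write
        if (old == null) return false;
        Event updated = new Event(customerId, time, old.staffId, newType);
        customerEvents.put(key(customerId, time), updated);
        customerEventsByStaff.put(key(old.staffId, time), updated);
        return true;
    }

    public static void main(String[] args) {
        DiyIndexSketch db = new DiyIndexSketch();
        db.insert(new Event("chbatey", "08:52:45", "trevor", "login")); // hypothetical event_type values
        db.updateEventType("chbatey", "08:52:45", "purchase");
        System.out.println(db.customerEventsByStaff.get(key("trevor", "08:52:45")).eventType); // purchase
    }
}
```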
Score Data Model
CREATE TABLE scores (user TEXT, game TEXT, year INT, month INT, day INT, score INT, PRIMARY KEY (user, game, year, month, day));
Materialized Views
CREATE MATERIALIZED VIEW alltimehigh AS
SELECT user FROM scores
WHERE game IS NOT NULL AND score IS NOT NULL AND user IS NOT NULL AND year IS NOT NULL AND month IS NOT NULL AND day IS NOT NULL
PRIMARY KEY (game, score, user, year, month, day)
WITH CLUSTERING ORDER BY (score DESC);
Materialized Views
INSERT INTO scores (user, game, year, month, day, score) VALUES ('pcmanus', 'Coup', 2015, 05, 01, 4000);
INSERT INTO scores (user, game, year, month, day, score) VALUES ('jbellis', 'Coup', 2015, 05, 03, 1750);
INSERT INTO scores (user, game, year, month, day, score) VALUES ('yukim', 'Coup', 2015, 05, 03, 2250);
INSERT INTO scores (user, game, year, month, day, score) VALUES ('tjake', 'Coup', 2015, 05, 03, 500);
INSERT INTO scores (user, game, year, month, day, score) VALUES ('jmckenzie', 'Coup', 2015, 06, 01, 2000);
INSERT INTO scores (user, game, year, month, day, score) VALUES ('iamaleksey', 'Coup', 2015, 06, 01, 2500);
INSERT INTO scores (user, game, year, month, day, score) VALUES ('tjake', 'Coup', 2015, 06, 02, 1000);
INSERT INTO scores (user, game, year, month, day, score) VALUES ('pcmanus', 'Coup', 2015, 06, 02, 2000);
SELECT user, score FROM alltimehigh WHERE game = 'Coup';

| user | score |
|---|---|
| pcmanus | 4000 |
| iamaleksey | 2500 |
| yukim | 2250 |
| jmckenzie | 2000 |
| pcmanus | 2000 |
| jbellis | 1750 |
| tjake | 1000 |
| tjake | 500 |
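The view holds the same rows re-keyed: partitioned by game, clustered by score descending and then by the remaining key columns. This Java sketch, assuming a TreeSet per game is a fair stand-in for a view partition, reproduces the query result above from the eight inserts; it handles new rows only, not updates to an existing base row.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class AllTimeHighSketch {
    // One row of scores
    static final class Score {
        final String user, game;
        final int year, month, day, score;
        Score(String user, String game, int year, int month, int day, int score) {
            this.user = user;
            this.game = game;
            this.year = year;
            this.month = month;
            this.day = day;
            this.score = score;
        }
    }

    // View clustering: score DESC (from the view definition), then user, year, month, day ascending
    static final Comparator<Score> VIEW_ORDER =
        Comparator.comparingInt((Score s) -> s.score).reversed()
            .thenComparing((Score s) -> s.user)
            .thenComparingInt(s -> s.year)
            .thenComparingInt(s -> s.month)
            .thenComparingInt(s -> s.day);

    // View partitions keyed by game, the view's partition key
    final Map<String, TreeSet<Score>> alltimehigh = new HashMap<>();

    // Only fresh inserts are modelled here
    void insert(Score s) {
        alltimehigh.computeIfAbsent(s.game, g -> new TreeSet<>(VIEW_ORDER)).add(s);
    }

    // Equivalent of SELECT user, score FROM alltimehigh WHERE game = ?
    List<String> selectByGame(String game) {
        List<String> out = new ArrayList<>();
        for (Score s : alltimehigh.getOrDefault(game, new TreeSet<>(VIEW_ORDER))) {
            out.add(s.user + " " + s.score);
        }
        return out;
    }

    public static void main(String[] args) {
        AllTimeHighSketch mv = new AllTimeHighSketch();
        mv.insert(new Score("pcmanus", "Coup", 2015, 5, 1, 4000));
        mv.insert(new Score("jbellis", "Coup", 2015, 5, 3, 1750));
        mv.insert(new Score("yukim", "Coup", 2015, 5, 3, 2250));
        mv.insert(new Score("tjake", "Coup", 2015, 5, 3, 500));
        mv.insert(new Score("jmckenzie", "Coup", 2015, 6, 1, 2000));
        mv.insert(new Score("iamaleksey", "Coup", 2015, 6, 1, 2500));
        mv.insert(new Score("tjake", "Coup", 2015, 6, 2, 1000));
        mv.insert(new Score("pcmanus", "Coup", 2015, 6, 2, 2000));
        System.out.println(mv.selectByGame("Coup"));
        // [pcmanus 4000, iamaleksey 2500, yukim 2250, jmckenzie 2000, pcmanus 2000, jbellis 1750, tjake 1000, tjake 500]
    }
}
```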
KillrWeather data model
Combining aggregates + MVs
How it works…
For more details:
- http://www.datastax.com/dev/blog/new-in-cassandra-3-0-materialized-views
- https://issues.apache.org/jira/browse/CASSANDRA-6477
Fine print
• All base-table primary key columns must be part of the view's primary key
• If part of the view's primary key is NULL, the row won't appear in the materialised view
• Performance will be a factor!
- More operations to complete (read-before-write, consistency check …)
- Batch writes for MV
• Bad for low-cardinality data (hot spots)
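Why MVs cost more on writes: to keep the view correct, the base replica has to read the current row before applying an update, delete the now-stale view row, and leave out view rows whose key would contain a null. A toy Java model of that bookkeeping, with the base table reduced to (user, game) → score and view rows encoded as strings; both simplifications are assumptions for illustration, not how Cassandra stores anything.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ViewMaintenanceSketch {
    // Base table simplified to PRIMARY KEY (user, game) -> score; a null score models an unset column
    final Map<String, Integer> base = new HashMap<>();
    // View rows encoded as "game|score|user"; clustering order is ignored here
    final Set<String> view = new HashSet<>();
    int baseReads = 0; // the extra read every write costs

    static String viewRow(String game, int score, String user) {
        return game + "|" + score + "|" + user;
    }

    void upsert(String user, String game, Integer score) {
        String key = user + "|" + game;
        baseReads++;
        Integer old = base.get(key);                              // read-before-write
        if (old != null) view.remove(viewRow(game, old, user));   // drop the stale view row
        base.put(key, score);
        if (score != null) view.add(viewRow(game, score, user));  // a NULL view key column keeps the row out
    }

    public static void main(String[] args) {
        ViewMaintenanceSketch t = new ViewMaintenanceSketch();
        t.upsert("pcmanus", "Coup", 4000);
        t.upsert("pcmanus", "Coup", 4500); // hypothetical new high score replaces the old view row
        t.upsert("tjake", "Coup", null);   // hypothetical row with no score: absent from the view
        System.out.println(t.view);        // [Coup|4500|pcmanus]
        System.out.println(t.baseReads);   // 3
    }
}
```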
Conclusions
• We still denormalise and duplicate to achieve scalability and performance
• We just let C* do it for us :)
Find Out More
• Documentation: http://www.datastax.com/docs
• Developer Blog: http://www.datastax.com/dev/blog
• Academy: https://academy.datastax.com
• Community Site: http://planetcassandra.org