Cassandra - Beyond Read-Modify-Write by Al Tobey

33
©2014 DataStax @AlTobey Open Source Mechanic | Datastax Apache Cassandra のオープンソースエバンジェリスト Beyond Read-Modify-Write 1 Obsessed with infrastructure my whole life. See distributed systems everywhere.

description

YouTube Video: TBA As we move into the world of Big Data and the Internet of Things, the systems architectures and data models we've relied on for decades are becoming a hindrance. At the core of the problem is the read-modify-write cycle. In this session, Al will talk about how to build systems that don't rely on RMW, with a focus on Cassandra. Finally, for those times when RMW is unavoidable, he will cover how and when to use Cassandra's lightweight transactions and collections.

Transcript of Cassandra - Beyond Read-Modify-Write by Al Tobey

Page 1: Cassandra - Beyond Read-Modify-Write by Al Tobey

©2014 DataStax

@AlTobey Open Source Mechanic | Datastax Apache Cassandra のオープンソースエバンジェリスト

Beyond Read-Modify-Write

!1

Obsessed with infrastructure my whole life.See distributed systems everywhere.

Page 2: Cassandra - Beyond Read-Modify-Write by Al Tobey

The Problem

Skype video call with grandma at 3 hrs.Using Netflix solo at 18 months.Plays many games with online metrics at 4 years old.

Page 3: Cassandra - Beyond Read-Modify-Write by Al Tobey

The Problem!Users expect their infrastructure to Just Work.

Explain to my 4 year old why he can’t watch cartoons on Netflix.

Page 4: Cassandra - Beyond Read-Modify-Write by Al Tobey

The Problem

Lag kills.

Page 5: Cassandra - Beyond Read-Modify-Write by Al Tobey

The Problem

Even when everything is working, or even especially then, usage can explode, often unexpectedly.

Page 6: Cassandra - Beyond Read-Modify-Write by Al Tobey

Evolution

Client/ServerClassic 3-tier

3-tier + read scaled DB + cache

A high-level overview of internet architectures.

Page 7: Cassandra - Beyond Read-Modify-Write by Al Tobey

Client-server

Client - server database architecture.Obsolete.The original “funnel-shaped” architecture

Page 8: Cassandra - Beyond Read-Modify-Write by Al Tobey

3-tier

Client - client - server database architecture.Still suitable for small applications.e.g. LAMP, RoR

Page 9: Cassandra - Beyond Read-Modify-Write by Al Tobey

3-tier master/slave

master slaveslave

Client - client - server database architecture.Still suitable for small applications.

Page 10: Cassandra - Beyond Read-Modify-Write by Al Tobey

3-tier + caching

master slaveslave

cache

more complexcache coherency is a hard problemcascading failures are commonNext: Out: the funnel. In: The ring.

Page 11: Cassandra - Beyond Read-Modify-Write by Al Tobey

Webscale

outer ring: clients (cell phones, etc.)middle ring: application serversinside ring: Cassandra servers!Serving millions of clients with mere hundreds or thousands of nodes requires a different approach to applications!

Page 12: Cassandra - Beyond Read-Modify-Write by Al Tobey

When it Rains

scale out … but at a cost

Page 13: Cassandra - Beyond Read-Modify-Write by Al Tobey

Beyond Read-Modify-Write•Practical Safety •Eventual Consistency •Overwrites •Key / Value •Journal / Logging / Time-series •Content-addressable-storage •Cassandra Collection Types •Cassandra Lightweight Transactions

Page 14: Cassandra - Beyond Read-Modify-Write by Al Tobey

Theory & Practice

In theory there is no difference between theory and practice. In practice there is. !

-Yogi Berra

I could talk about CAP theorem, but there’s plenty of that going around. And then there’s this quote.

Page 15: Cassandra - Beyond Read-Modify-Write by Al Tobey

Safety

Fun happens when you turn the safety off.!But you can’t sell it.

Page 16: Cassandra - Beyond Read-Modify-Write by Al Tobey

Safety

But you don’t have to throw it out entirely. - roll cages - kill switches - rev limiters - protective clothing

Page 17: Cassandra - Beyond Read-Modify-Write by Al Tobey

Safety

max speed is 320 km/h (200 mph)Tested to 443 km/h (275 mph) & 581 km/h (361 mph) (world record)Safety is ultimately a property of the system.But it can be expensive. - lots of maintenance - inspections - reputation

Page 18: Cassandra - Beyond Read-Modify-Write by Al Tobey

Read-Modify-WriteUPDATE  Employees  SET  Rank=4,  Promoted=2014-­‐01-­‐24  WHERE  EmployeeID=1337;

EmployeeID**1337Name********アルトビーStartDate***2013510501Rank********3Promoted****null

EmployeeID**1337Name********アルトビーStartDate***2013510501Rank********4Promoted****2014501524

This might be what it looks like from SQL / CQL, but …!

Page 19: Cassandra - Beyond Read-Modify-Write by Al Tobey

Read-Modify-WriteUPDATE  Employees  SET  Rank=4,  Promoted=2014-­‐01-­‐24  WHERE  EmployeeID=1337;

TNSTAAFL 無償の昼食なんてものはありません

EmployeeID**1337Name********アルトビーStartDate***2013510501Rank********4Promoted****2014501524

EmployeeID**1337Name********アルトビーStartDate***2013510501Rank********3Promoted****null

RDBMS

TNSTAAFL …If you’re lucky, the cell is in cache.Otherwise, it’s a disk access to read, another to write.

Page 20: Cassandra - Beyond Read-Modify-Write by Al Tobey

Eventual ConsistencyUPDATE  Employees  SET  Rank=4,  Promoted=2014-­‐01-­‐24  WHERE  EmployeeID=1337;

EmployeeID**1337Name********アルトビーStartDate***2013510501Rank********4Promoted****2014501524

EmployeeID**1337Name********アルトビーStartDate***2013510501Rank********3Promoted****null

Coordinator

Explain distributed RMWMore complicated.Will talk about how it’s abstracted in CQL later.

Page 21: Cassandra - Beyond Read-Modify-Write by Al Tobey

Eventual ConsistencyUPDATE  Employees  SET  Rank=4,  Promoted=2014-­‐01-­‐24  WHERE  EmployeeID=1337;

EmployeeID**1337Name********アルトビーStartDate***2013510501Rank********4Promoted****2014501524

EmployeeID**1337Name********アルトビーStartDate***2013510501Rank********3Promoted****null

Coordinator

read

write

Memory replication on write, depending on RF, usually RF=3.Reads AND writes remain available through partitions.Hinted handoff.

Page 22: Cassandra - Beyond Read-Modify-Write by Al Tobey

OverwritingCREATE TABLE host_lookup ( name varchar, id uuid, PRIMARY KEY(name) ); !INSERT INTO host_uuid (name,id) VALUES (“www.tobert.org”, “463b03ec-fcc1-4428-bac8-80ccee1c2f77”); !INSERT INTO host_uuid (name,id) VALUES (“tobert.org”, “463b03ec-fcc1-4428-bac8-80ccee1c2f77”); !INSERT INTO host_uuid (name,id) VALUES (“www.tobert.org”, “463b03ec-fcc1-4428-bac8-80ccee1c2f77”); !SELECT id FROM host_lookup WHERE name=“tobert.org”;

Beware of expensive compactionBest for: small indexes, lookup tablesCompaction handles RMW at storage level in the background.Under heavy writes, clock synchronization is very important to avoid timestamp collisions. In practice, this isn’t a problem very often and even when it goes wrong, not much harm done.

Page 23: Cassandra - Beyond Read-Modify-Write by Al Tobey

Key/ValueCREATE TABLE keyval ( key VARCHAR, value blob, PRIMARY KEY(key) ); !INSERT INTO keyval (key,value) VALUES (?, ?); !SELECT value FROM keyval WHERE key=?;

e.g. memcachedDon’t do this.But it works when you really need it.

Page 24: Cassandra - Beyond Read-Modify-Write by Al Tobey

Journaling / Logging / Time-seriesCREATE TABLE tsdb ( time_bucket timestamp, time timestamp, value blob, PRIMARY KEY(time_bucket, time) ); !INSERT INTO tsdb (time_bucket, time, value) VALUES ( “2014-10-24”, -- 1-day bucket (UTC) “2014-10-24T12:12:12Z”, -- ALWAYS USE UTC ‘{“foo”: “bar”}’ );

Oversimplified, use normalization over blobs whenever possible.ALWAYS USE UTC :)

Page 25: Cassandra - Beyond Read-Modify-Write by Al Tobey

Journaling / Logging / Time-series

{"“2014(01(24”"=>"{""""“2014(01(24T12:12:12Z”"=>"{""""""""‘{“foo”:"“bar”}’""""}}

2014(01(24 2014(01(24T12:12:12Z{“key”:"“value”}

2014(01(25 2014(01(25T13:13:13Z{“key”:"“value”}

2014(01(24T21:21:21Z{“key”:"“value”}

Oversimplified, use normalization over blobs whenever possible.ALWAYS USE UTC :)

Page 26: Cassandra - Beyond Read-Modify-Write by Al Tobey

Content Addressable StorageCREATE TABLE objects ( cid varchar, content blob, PRIMARY KEY(cid) ); !INSERT INTO objects (cid,content) VALUES (?, ?); !SELECT content FROM objects WHERE cid=?;

The address of the data can be created by using the data itself.e.g. SHA1 (160 bits), MD5, Whirpool, etc.

Page 27: Cassandra - Beyond Read-Modify-Write by Al Tobey

Content Addressable Storagerequire  'cql'  require  ‘digest/sha1'  !dbh  =  Cql::Client.connect(hosts:  ['127.0.0.1'])  dbh.use('cas')  !data  =  {  :timestamp  =>  1390436043,  :value  =>  1234  }  !cid  =  Digest::SHA1.new.digest(data.to_s).unpack(‘H*’)  !sth  =  dbh.prepare(     'SELECT  content  FROM  objects  WHERE  cid=?')  !sth.execute(root_cid).first[‘content’]

Oversimplified! e.g. data.to_s is a BAD ideaALWAYS USE UTC :)

Page 28: Cassandra - Beyond Read-Modify-Write by Al Tobey

In Practice• In practice, RMW is sometimes unavoidable

• Recent versions of Cassandra support RMW

• Use them only when necessary

• Or when performance hit is mitigated elsewhere or irrelevant

Page 29: Cassandra - Beyond Read-Modify-Write by Al Tobey

Cassandra CollectionsCREATE TABLE posts ( id uuid, body varchar, created timestamp, authors set<varchar>, tags set<varchar>, PRIMARY KEY(id) ); !INSERT INTO posts (id,body,created,authors,tags) VALUES ( ea4aba7d-9344-4d08-8ca5-873aa1214068, ‘アルトビーの犬はばかね’, ‘now', [‘アルトビー’, ’ィオートビー’], [‘dog’, ‘silly’, ’犬’, ‘ばか’] );

quick story about 犬ばかねsets & maps are CRDTs, safe to modify

Page 30: Cassandra - Beyond Read-Modify-Write by Al Tobey

Cassandra CollectionsCREATE TABLE metrics ( bucket timestamp, time timestamp, value blob, labels map<varchar,varchar>, PRIMARY KEY(bucket) );

sets & maps are CRDTs, safe to modify

Page 31: Cassandra - Beyond Read-Modify-Write by Al Tobey

Lightweight Transactions• Cassandra 2.0 and on support LWT based on PAXOS

• PAXOS is a distributed consensus protocol

• Given a constraint, Cassandra ensures correct ordering

Page 32: Cassandra - Beyond Read-Modify-Write by Al Tobey

Lightweight TransactionsUPDATE  users          SET  username=‘tobert’    WHERE  id=68021e8a-­‐9eb0-­‐436c-­‐8cdd-­‐aac629788383          IF  username=‘renice’;  !INSERT  INTO  users  (id,  username)  VALUES  (68021e8a-­‐9eb0-­‐436c-­‐8cdd-­‐aac629788383,  ‘renice’)  IF  NOT  EXISTS;  !!

Client error on conflict.

Page 33: Cassandra - Beyond Read-Modify-Write by Al Tobey

Conclusion• Businesses are scaling further and faster than ever

• Assume you have to provide utility-grade service

• Data models and application architectures need to change to keep up

• Avoiding Read/Modify/Write makes high-performance easier

• Cassandra provides tools for safe RMW when you need it

!

• Questions?