Cassandra - Beyond Read-Modify-Write by Al Tobey
-
Upload
planet-cassandra -
Category
Technology
-
view
3.456 -
download
0
description
Transcript of Cassandra - Beyond Read-Modify-Write by Al Tobey
©2014 DataStax
@AlTobey Open Source Mechanic | Datastax Apache Cassandra のオープンソースエバンジェリスト
Beyond Read-Modify-Write
!1
Obsessed with infrastructure my whole life.See distributed systems everywhere.
The Problem
Skype video call with grandma at 3 hrs.Using Netflix solo at 18 months.Plays many games with online metrics at 4 years old.
The Problem!Users expect their infrastructure to Just Work.
Explain to my 4 year old why he can’t watch cartoons on Netflix.
The Problem
Lag kills.
The Problem
Even when everything is working, or even especially then, usage can explode, often unexpectedly.
Evolution
Client/ServerClassic 3-tier
3-tier + read scaled DB + cache
A high-level overview of internet architectures.
Client-server
Client - server database architecture.Obsolete.The original “funnel-shaped” architecture
3-tier
Client - client - server database architecture.Still suitable for small applications.e.g. LAMP, RoR
3-tier master/slave
master slaveslave
Client - client - server database architecture.Still suitable for small applications.
3-tier + caching
master slaveslave
cache
more complexcache coherency is a hard problemcascading failures are commonNext: Out: the funnel. In: The ring.
Webscale
outer ring: clients (cell phones, etc.)middle ring: application serversinside ring: Cassandra servers!Serving millions of clients with mere hundreds or thousands of nodes requires a different approach to applications!
When it Rains
scale out … but at a cost
Beyond Read-Modify-Write•Practical Safety •Eventual Consistency •Overwrites •Key / Value •Journal / Logging / Time-series •Content-addressable-storage •Cassandra Collection Types •Cassandra Lightweight Transactions
Theory & Practice
In theory there is no difference between theory and practice. In practice there is. !
-Yogi Berra
I could talk about CAP theorem, but there’s plenty of that going around. And then there’s this quote.
Safety
Fun happens when you turn the safety off.!But you can’t sell it.
Safety
But you don’t have to throw it out entirely. - roll cages - kill switches - rev limiters - protective clothing
Safety
max speed is 320 km/h (200 mph)Tested to 443 km/h (275 mph) & 581 km/h (361 mph) (world record)Safety is ultimately a property of the system.But it can be expensive. - lots of maintenance - inspections - reputation
Read-Modify-WriteUPDATE Employees SET Rank=4, Promoted=2014-‐01-‐24 WHERE EmployeeID=1337;
EmployeeID**1337Name********アルトビーStartDate***2013510501Rank********3Promoted****null
EmployeeID**1337Name********アルトビーStartDate***2013510501Rank********4Promoted****2014501524
This might be what it looks like from SQL / CQL, but …!
Read-Modify-WriteUPDATE Employees SET Rank=4, Promoted=2014-‐01-‐24 WHERE EmployeeID=1337;
TNSTAAFL 無償の昼食なんてものはありません
EmployeeID**1337Name********アルトビーStartDate***2013510501Rank********4Promoted****2014501524
EmployeeID**1337Name********アルトビーStartDate***2013510501Rank********3Promoted****null
RDBMS
TNSTAAFL …If you’re lucky, the cell is in cache.Otherwise, it’s a disk access to read, another to write.
Eventual ConsistencyUPDATE Employees SET Rank=4, Promoted=2014-‐01-‐24 WHERE EmployeeID=1337;
EmployeeID**1337Name********アルトビーStartDate***2013510501Rank********4Promoted****2014501524
EmployeeID**1337Name********アルトビーStartDate***2013510501Rank********3Promoted****null
Coordinator
Explain distributed RMWMore complicated.Will talk about how it’s abstracted in CQL later.
Eventual ConsistencyUPDATE Employees SET Rank=4, Promoted=2014-‐01-‐24 WHERE EmployeeID=1337;
EmployeeID**1337Name********アルトビーStartDate***2013510501Rank********4Promoted****2014501524
EmployeeID**1337Name********アルトビーStartDate***2013510501Rank********3Promoted****null
Coordinator
read
write
Memory replication on write, depending on RF, usually RF=3.Reads AND writes remain available through partitions.Hinted handoff.
OverwritingCREATE TABLE host_lookup ( name varchar, id uuid, PRIMARY KEY(name) ); !INSERT INTO host_uuid (name,id) VALUES (“www.tobert.org”, “463b03ec-fcc1-4428-bac8-80ccee1c2f77”); !INSERT INTO host_uuid (name,id) VALUES (“tobert.org”, “463b03ec-fcc1-4428-bac8-80ccee1c2f77”); !INSERT INTO host_uuid (name,id) VALUES (“www.tobert.org”, “463b03ec-fcc1-4428-bac8-80ccee1c2f77”); !SELECT id FROM host_lookup WHERE name=“tobert.org”;
Beware of expensive compactionBest for: small indexes, lookup tablesCompaction handles RMW at storage level in the background.Under heavy writes, clock synchronization is very important to avoid timestamp collisions. In practice, this isn’t a problem very often and even when it goes wrong, not much harm done.
Key/ValueCREATE TABLE keyval ( key VARCHAR, value blob, PRIMARY KEY(key) ); !INSERT INTO keyval (key,value) VALUES (?, ?); !SELECT value FROM keyval WHERE key=?;
e.g. memcachedDon’t do this.But it works when you really need it.
Journaling / Logging / Time-seriesCREATE TABLE tsdb ( time_bucket timestamp, time timestamp, value blob, PRIMARY KEY(time_bucket, time) ); !INSERT INTO tsdb (time_bucket, time, value) VALUES ( “2014-10-24”, -- 1-day bucket (UTC) “2014-10-24T12:12:12Z”, -- ALWAYS USE UTC ‘{“foo”: “bar”}’ );
Oversimplified, use normalization over blobs whenever possible.ALWAYS USE UTC :)
Journaling / Logging / Time-series
{"“2014(01(24”"=>"{""""“2014(01(24T12:12:12Z”"=>"{""""""""‘{“foo”:"“bar”}’""""}}
2014(01(24 2014(01(24T12:12:12Z{“key”:"“value”}
2014(01(25 2014(01(25T13:13:13Z{“key”:"“value”}
2014(01(24T21:21:21Z{“key”:"“value”}
Oversimplified, use normalization over blobs whenever possible.ALWAYS USE UTC :)
Content Addressable StorageCREATE TABLE objects ( cid varchar, content blob, PRIMARY KEY(cid) ); !INSERT INTO objects (cid,content) VALUES (?, ?); !SELECT content FROM objects WHERE cid=?;
The address of the data can be created by using the data itself.e.g. SHA1 (160 bits), MD5, Whirpool, etc.
Content Addressable Storagerequire 'cql' require ‘digest/sha1' !dbh = Cql::Client.connect(hosts: ['127.0.0.1']) dbh.use('cas') !data = { :timestamp => 1390436043, :value => 1234 } !cid = Digest::SHA1.new.digest(data.to_s).unpack(‘H*’) !sth = dbh.prepare( 'SELECT content FROM objects WHERE cid=?') !sth.execute(root_cid).first[‘content’]
Oversimplified! e.g. data.to_s is a BAD ideaALWAYS USE UTC :)
In Practice• In practice, RMW is sometimes unavoidable
• Recent versions of Cassandra support RMW
• Use them only when necessary
• Or when performance hit is mitigated elsewhere or irrelevant
Cassandra CollectionsCREATE TABLE posts ( id uuid, body varchar, created timestamp, authors set<varchar>, tags set<varchar>, PRIMARY KEY(id) ); !INSERT INTO posts (id,body,created,authors,tags) VALUES ( ea4aba7d-9344-4d08-8ca5-873aa1214068, ‘アルトビーの犬はばかね’, ‘now', [‘アルトビー’, ’ィオートビー’], [‘dog’, ‘silly’, ’犬’, ‘ばか’] );
quick story about 犬ばかねsets & maps are CRDTs, safe to modify
Cassandra CollectionsCREATE TABLE metrics ( bucket timestamp, time timestamp, value blob, labels map<varchar,varchar>, PRIMARY KEY(bucket) );
sets & maps are CRDTs, safe to modify
Lightweight Transactions• Cassandra 2.0 and on support LWT based on PAXOS
• PAXOS is a distributed consensus protocol
• Given a constraint, Cassandra ensures correct ordering
Lightweight TransactionsUPDATE users SET username=‘tobert’ WHERE id=68021e8a-‐9eb0-‐436c-‐8cdd-‐aac629788383 IF username=‘renice’; !INSERT INTO users (id, username) VALUES (68021e8a-‐9eb0-‐436c-‐8cdd-‐aac629788383, ‘renice’) IF NOT EXISTS; !!
Client error on conflict.
Conclusion• Businesses are scaling further and faster than ever
• Assume you have to provide utility-grade service
• Data models and application architectures need to change to keep up
• Avoiding Read/Modify/Write makes high-performance easier
• Cassandra provides tools for safe RMW when you need it
!
• Questions?