Breadth or Depth
What's in a column-store?
Jeff Smith, February 23, 2013
This presentation
Is not: marketing, technical, arbitrary, polite, training
Is: persuasive, for the technical, precise, opinionated, educational
Srsouly
Bio
{ past: [startups, biotech, data_management], school: [research, HKU, uncertain_data], work: [AI, finance, prediction] }
This guy
Daniel Abadi
Back to the future
● 1 database to rule them all
● A scrappy band of rebels
● A brave new idea
The big question
Why grab this?
When all you want is this?
id thing attr1 attr2 attr3 attr4 attr5 attr6 attr7 attr8
123 doodad abc def ghi jkl mno pqr stu vwx
id thing
123 doodad
You're chopping it wrong.
Relations in pieces
id pet weight poops_per_day
1 dog 40 3
2 cat 15 2
3 bird 5 4
4 snake 78 0.25
Horizontal Partitions
id pet weight poops_per_day
1 dog 40 3
2 cat 15 2
3 bird 5 4
4 snake 78 0.25
You gotta get yourself some marble columns.
Vertical Partitions
id
1
2
3
4
pet
dog
cat
bird
snake
weight
40
15
5
78
poops_per_day
3
2
4
0.25
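To make the two ways of chopping concrete, here is a minimal Python sketch using the pet table from the slides. The helper names `to_columns` and `project` are made up for illustration; real column-stores add compression, on-disk layout and much more on top of this idea.

```python
# Rows as the slides show them: one tuple per pet (the horizontal layout).
rows = [
    (1, "dog", 40, 3),
    (2, "cat", 15, 2),
    (3, "bird", 5, 4),
    (4, "snake", 78, 0.25),
]
column_names = ["id", "pet", "weight", "poops_per_day"]


def to_columns(rows, column_names):
    """Vertically partition rows into one list per attribute."""
    return {name: list(values) for name, values in zip(column_names, zip(*rows))}


def project(columns, wanted):
    """Read only the requested attributes; the other columns are never touched."""
    return list(zip(*(columns[name] for name in wanted)))


columns = to_columns(rows, column_names)
print(columns["weight"])
print(project(columns, ["id", "pet"]))
```

With the row layout, answering "id and pet only" still walks every attribute of every row. With the column layout, the same question reads two lists and ignores the rest.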
We're gonna need a bigger table.
NoSQL starts
Empire crumbles
Nomenclature obfuscates
BigTable
I know that song!
Column...families?!
Pets
row_id  best_pet  worst_pet  illegal_pet
123     bulldog   turtle     rhino
Cars
row_id  make   model
123     Smart  Fortwo
Modest Map
Year of the snake => Year of Python
4G => LTE
NoSQL => Non-relational
Beard => Face-mane
Column-stores => {column-store | column-family-store}
Does it smell as sweet?
...at column-oriented tasks.
C-Store rocks*
* Contrary to popular belief, after years of effort, Cleveland still does not rock.
Move, b*tch.
Get out the vote.
age
23
32
45
67
56
49
43
50
63
34
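A rough sketch of the kind of column-oriented task this slide points at: aggregating over a single attribute. It uses the age values above, with Python's `array` module standing in for a contiguous, typed column. The cutoff of 50 is an arbitrary value chosen for illustration and is not from the talk.

```python
from array import array

# The age column from the slide, stored as one contiguous run of integers.
ages = array("i", [23, 32, 45, 67, 56, 49, 43, 50, 63, 34])

# Both aggregates scan only this column; no other attribute is read.
mean_age = sum(ages) / len(ages)
at_least_50 = sum(1 for age in ages if age >= 50)  # cutoff is hypothetical

print(mean_age, at_least_50)
```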
The catch
Attack of the clones
The contenders
HBase*
Cassandra*
Hypertable
Accumulo
* The ones that matter
HBase
Hadoop stack
Java everywhere
Components, extensions, variables, headaches...
Tastes like SQL
SELECT sensorid, (20-down)/(up-down) AS probability
FROM hive_sensors WHERE down>=10 AND up>=20 AND down<=20
UNION ALL
SELECT sensorid, (up-10)/(up-down) AS probability
FROM hive_sensors WHERE up>=10 AND up<=20 AND down<=10
UNION ALL
SELECT sensorid, 1 AS probability
FROM hive_sensors WHERE up<=20 AND down>=10
UNION ALL
SELECT sensorid, (20-10)/(up-down) AS probability
FROM hive_sensors WHERE down<=10 AND up>=20;
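Reading the four branches, the query appears to compute the fraction of each sensor's [down, up] interval that falls inside the range [10, 20]. That is an interpretation, not something the slide states. Under that assumption, the same arithmetic can be sketched as one clamped expression in Python. The function name `overlap_fraction` and its `lo`/`hi` parameters are made up for illustration.

```python
def overlap_fraction(down, up, lo=10, hi=20):
    """Fraction of the interval [down, up] that lies inside [lo, hi].

    Returns None for an empty or inverted interval, where the SQL
    would divide by zero or produce a meaningless value.
    """
    if up <= down:
        return None
    overlap = min(up, hi) - max(down, lo)
    return max(0.0, overlap) / (up - down)


print(overlap_fraction(15, 25))  # starts inside, ends above: (20-15)/(25-15)
print(overlap_fraction(5, 15))   # starts below, ends inside: (15-10)/(15-5)
print(overlap_fraction(12, 18))  # fully inside
print(overlap_fraction(0, 40))   # covers the whole range: (20-10)/(40-0)
```

Two differences from the query are worth noting: intervals that miss [10, 20] entirely get 0.0 here but produce no row in the SQL, and an interval sitting exactly on the boundaries matches more than one UNION ALL branch in the SQL but yields a single value here.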
Cassandra
CQL interface
Peer to peer
Better, but...
Anything you can do, I can do better.
Sparseness
id attr1 attr2 attr3 attr4
1 1
2 1
3 1
4 1
5
6 1
7
8 1
9 1
10
11
Dynamic Schemas
Pets
row_id  best_pet  worst_pet  illegal_pet  robot_pet
123     bulldog   turtle     rhino        aibo
456     shih tzu  gecko      koala
Cars
row_id  make   model
123     Smart  Fortwo
456     VW     Golf
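Sparseness and dynamic schemas fall out of the same storage idea: keep only the cells that exist, keyed by row, column family and column. Below is a toy Python model of that idea. The class `ColumnFamilyTable` and its methods are hypothetical and are not the HBase or Cassandra API. It assumes, as HBase does, that families are declared up front while columns inside a family can appear per row.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Toy model: row key -> column family -> column name -> value."""

    def __init__(self, families):
        self.families = set(families)
        # Nothing is allocated for a cell until it is written.
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_id, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_id][family][column] = value

    def get(self, row_id, family, column, default=None):
        return self.rows.get(row_id, {}).get(family, {}).get(column, default)


table = ColumnFamilyTable(["Pets", "Cars"])
table.put(123, "Pets", "best_pet", "bulldog")
table.put(123, "Pets", "robot_pet", "aibo")   # new column, no schema change
table.put(456, "Pets", "worst_pet", "gecko")
table.put(456, "Cars", "make", "VW")

print(table.get(123, "Pets", "robot_pet"))
print(table.get(456, "Pets", "robot_pet"))    # absent cell: stored nowhere
```

Row 456 never pays for a `robot_pet` cell, and adding that column for row 123 needed no table-wide change.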
Stronger in the broken places
Innovation
Truly distributed systems
Columns as metadata
Arbitrarily deep column hierarchies*
Community database development
* Someday soon, I hope
Pig & friends
data = load 'hbase://table_name'
    using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf1:*', '-loadKey true')
    AS (id:chararray, stats:map[int]);
@outputSchema("values:bag{t:tuple(key, value)}")
def bag_of_tuples(map_dict):
    return map_dict.items()
register 'udfs.py' using jython as py;
data = load 'hbase://table_name'
    using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf1:*', '-loadKey true')
    AS (id:chararray, stats:map[int]);
databag = foreach data generate id, FLATTEN(py.bag_of_tuples(stats));
from Chase Seibert
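For anyone who does not read Pig, here is a plain-Python sketch of what the script above does to each record: the UDF turns the `stats` map into a bag of (key, value) tuples, and FLATTEN emits one output row per tuple, repeating the row id. The function name `flatten_map` and the sample data are invented for illustration.

```python
def flatten_map(records):
    """Yield one (row_id, key, value) row per entry in each record's map."""
    for row_id, stats in records:
        for key, value in stats.items():
            yield (row_id, key, value)


# Hypothetical input shaped like (id:chararray, stats:map[int]).
records = [
    ("row1", {"clicks": 3, "views": 10}),
    ("row2", {"clicks": 1}),
]

flat = list(flatten_map(records))
for row in flat:
    print(row)
```

The wide, variable set of columns in a family becomes a tall, fixed three-field relation, which is far easier to group and join downstream.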
No dog in this fight
Hey I just met you
And this is crazy
But here's my email
Mail me maybe
Work Play
Disclaimer
All images used in this presentation were stolen from the internet in a daring midnight raid that left 3 dead and 8 wounded. No license was obtained for their use and no license is implied by their misappropriation.
Yarrr. BarrrCamp.
Please don't sue me. I have nothing. Just a dog. Don't take my dog.