NoSQL, No sweat with JBoss Data Grid
Shane K Johnson / Tristan Tarrant1
NoSQL: No sweat with JBoss Data Grid
Shane Johnson, Technical Marketing Manager
Tristan Tarrant, Principal Software Engineer
10/08/2012
NoSQL NOSQL
Agenda
● Data Stores
● Data Grid
  ● NOSQL
  ● Cache
● Big Data
● Use Cases
● Q & A
Data Stores
● Key / Value
● Document
● Graph
● Column Family
● And more...
Data Grid?
NOSQL
● Elasticity
● Distributed Data
● Concurrency
● CAP Theorem
● Flexibility
Elasticity
● Node Discovery
● Failure Detection
How?
JBoss Data Grid is built on a reliable group membership protocol: JGroups.
Distributed Data
Replicated
Distributed
How?
Consistent Hashing
JBoss Data Grid Implementation: MurmurHash3
Hash Wheel
Virtual Nodes
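The hash wheel and virtual nodes from the last few slides can be sketched in plain Java. This is only an illustration under stated assumptions, not the JBoss Data Grid implementation: `HashWheel`, `addNode`, `removeNode` and `ownerOf` are hypothetical names, and CRC32 stands in for MurmurHash3 so that the sketch needs nothing beyond the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical sketch of a hash wheel with virtual nodes.
class HashWheel {
    // Position on the wheel -> physical node that owns that position.
    private final TreeMap<Long, String> wheel = new TreeMap<>();
    private final int virtualNodes;

    HashWheel(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // Assumption: CRC32 replaces MurmurHash3 to stay dependency-free.
    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Each physical node is placed on the wheel several times.
    // (A real implementation would also handle position collisions.)
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            wheel.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        wheel.values().removeIf(node::equals);
    }

    // Walk clockwise from the key's position to the next virtual node,
    // wrapping around to the start of the wheel if necessary.
    String ownerOf(String key) {
        if (wheel.isEmpty()) {
            throw new IllegalStateException("no nodes on the wheel");
        }
        Map.Entry<Long, String> e = wheel.ceilingEntry(hash(key));
        return (e != null ? e : wheel.firstEntry()).getValue();
    }
}
```

In this sketch, removing a node only moves the keys that node owned; every other key keeps its owner, which is the property that makes adding and removing nodes cheap.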
Linear Scaling
Concurrency
How?
Multi Version Concurrency Control
Internals
● Transactions
  ● 2PC
● Isolation Level
  ● Read Committed
  ● Repeatable Read
● Locking
  ● Optimistic
  ● Pessimistic
● Write Skew
  ● Version – Vector Clocks
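A write skew check under optimistic locking can be illustrated with a versioned store: a writer remembers the version it read, and the commit fails if the entry has changed in the meantime. This is a minimal sketch under stated assumptions. `VersionedStore`, `read` and `commit` are hypothetical names, and a single counter per key replaces the vector clocks mentioned on the slide.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a version-based write skew check.
class VersionedStore {
    static final class Versioned {
        final String value;
        final long version;

        Versioned(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<String, Versioned> data = new HashMap<>();

    // Returns the current value and version, or null if the key is absent.
    synchronized Versioned read(String key) {
        return data.get(key);
    }

    // Commits only if nobody has written the key since versionRead was
    // observed (use 0 for a key that did not exist when it was read).
    // Returns false when a write skew is detected.
    synchronized boolean commit(String key, String newValue, long versionRead) {
        Versioned current = data.get(key);
        long currentVersion = (current == null) ? 0 : current.version;
        if (currentVersion != versionRead) {
            return false; // someone else committed first
        }
        data.put(key, new Versioned(newValue, currentVersion + 1));
        return true;
    }
}
```

A caller whose commit is rejected would typically re-read the entry and retry, or roll the transaction back.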
Consistency
CAP Theorem
Eric Brewer
CAP Theorem
● Consistency
● Availability
● Partition Tolerance
JBoss Data Grid + CAP Theorem
● No Physical Partition
  ● Consistent and Available (C + A)
● Physical Partition
  ● Available (A + P)
● Pseudo Partition (e.g. Unresponsive Node)
  ● Consistent or Available (C + P / A + P)
Flexibility
Flexibility
● Replicated Data
  ● Replication Queue
  ● State Transfer – Enable / Disable
● Distributed Data
  ● Number of Owners
  ● Rehash – Enable / Disable
● Communication – Synchronous / Asynchronous
● Isolation – Read Committed / Repeatable Read
● Locking – Optimistic / Pessimistic
Caching and Data Grids for JEE
Caching: JSR-107
Data Grids: JSR-347
Caching in Java
● Developers have been doing it forever
  ● To increase performance
  ● To offload unnecessary requests from legacy data stores
  ● Home-brew approach based on Hashtables and Maps
● Many free and commercial libraries but...
● ... no standard!
JSR-107: Caching for JEE
● Local (single JVM) and Distributed (multiple JVMs) caches
● CacheManager: a way to obtain caches
● Cache, “inspired” by the Map API with extensions for entry expiration and additional atomic operations
● A Cache Lifecycle (starting, stopping)
● Entry Listeners for specific events
● Optional features: JTA support and annotations
● One of the oldest JSRs, dormant for a long time, recently revived by JSR-347
And now?
● Now that I've put a lot of data in my distributed cache, what can I do with it?
● And most importantly...
● HOW?
Multiple clustering options
● Replication
  ● All nodes have all of the data.
  ● Grid Size == size of the smallest node
● Distribution
  ● The Grid maintains n copies of each item of data on different nodes
  ● Grid Size == total size / n
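The two sizing rules above come down to simple arithmetic. The sketch below assumes a hypothetical helper class `GridCapacity` and ignores per-entry overhead, so treat the numbers as upper bounds rather than sizing guidance.

```java
import java.util.stream.LongStream;

// Hypothetical helper illustrating the two capacity rules.
class GridCapacity {
    // Replication: every node holds everything, so the smallest node is the limit.
    static long replicated(long... nodeMemoryGb) {
        return LongStream.of(nodeMemoryGb).min().orElse(0);
    }

    // Distribution: total memory divided by the number of copies kept per entry.
    static long distributed(int copies, long... nodeMemoryGb) {
        return LongStream.of(nodeMemoryGb).sum() / copies;
    }
}
```

For example, with four nodes of 8 GB each, `GridCapacity.replicated(8, 8, 8, 8)` gives 8 GB, while `GridCapacity.distributed(2, 8, 8, 8, 8)` gives 16 GB.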
We like asynchronous
● So much that we want it in the API:
● Future<V> getAsync(K);
● Future<V> getAndPut(K, V);
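To show what a `Future`-returning cache API feels like to the caller, here is a small single-JVM sketch. `AsyncCache` is a hypothetical class backed by a `ConcurrentHashMap` and a thread pool; it mirrors the two method shapes on the slide but is not the JBoss Data Grid or JSR-347 API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: operations return immediately with a Future.
class AsyncCache<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    // The lookup runs on the pool; the caller decides when to block on get().
    Future<V> getAsync(K key) {
        return pool.submit(() -> store.get(key));
    }

    // Stores the value and completes with the previous value (null if none).
    Future<V> getAndPut(K key, V value) {
        return pool.submit(() -> store.put(key, value));
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

The caller can issue several operations, do other work, and only call `Future.get()` when the result is actually needed.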
Keeping things close together
● If I need to access semantically-close data quickly, why not keep it on the same node?
● Grouping API
  ● Distribution per-group and not per-key
  ● Via annotations
  ● Via a Grouper class
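The idea of distributing per group rather than per key can be sketched as follows. `GroupedPlacement` is a hypothetical class, and a plain `Function` stands in for the Grouper class mentioned above: the owner is chosen by hashing the group, so all keys in a group land on the same node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: place entries by group instead of by key.
class GroupedPlacement<K> {
    private final List<String> nodes;
    private final Function<K, String> grouper; // stand-in for a Grouper class

    GroupedPlacement(List<String> nodes, Function<K, String> grouper) {
        this.nodes = new ArrayList<>(nodes);
        this.grouper = grouper;
    }

    // Hash the group, not the key, so related keys share an owner.
    String ownerOf(K key) {
        String group = grouper.apply(key);
        return nodes.get(Math.floorMod(group.hashCode(), nodes.size()));
    }
}
```

With a grouper that strips the last segment of a key such as `order:42:line1`, every line of order 42 is stored on the same node.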
Eventual consistency
● One step further than asynchronous clustering for higher performance
● Entries are tagged with a version (e.g. a timestamp or a time-based UUID): newer versions will eventually replace all older versions in the cluster
● Applications retrieving data may get an older entry, which may be “good enough”
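Version-tagged entries can be illustrated with a "newest version wins" rule. In this sketch, `Replica` is a hypothetical class and a plain `long` timestamp plays the role of the version. Each replica keeps an update only if it is newer than what it already has, so replicas that receive the same updates in different orders still converge.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of version-based convergence between replicas.
class Replica {
    static final class Entry {
        final String value;
        final long timestamp;

        Entry(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, Entry> data = new HashMap<>();

    // Keep the incoming entry only if its version is newer than the stored one.
    void apply(String key, String value, long timestamp) {
        data.merge(key, new Entry(value, timestamp),
                (old, incoming) -> incoming.timestamp > old.timestamp ? incoming : old);
    }

    // May return an older value if newer updates have not arrived yet.
    String get(String key) {
        Entry e = data.get(key);
        return e == null ? null : e.value;
    }
}
```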
Big Data
Remote Query
Distributed Query
Performing parallel computation
● Distributed Executors
● Run on all nodes where a cache exists
● Each executor works on the slice of data local to itself
● Fastest access
● Parallelization of operations
● Usually returns
Map / Reduce
● A mapper function iterates through a set of key/values transforming them and sending them to a collector
void map(KIn, VIn, Collector<KOut, VOut>)
● A reducer works through the collected values for each key, returning a single value
VOut reduce(KOut, Iterator<VOut>)
● Finally a collator processes the reduced key/values and returns a result to the invoker
R collate(Map<KOut, VOut> reducedResults)
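The three steps can be tried out with a word count that runs in a single JVM. The interfaces below are local, hypothetical copies of the signatures on this slide (with an `emit` method assumed for the collector), and `LocalMapReduce.run` is a toy driver; a data grid would run the map phase on each node against its local slice of data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-JVM sketch of the mapper / reducer / collator flow.
class LocalMapReduce {
    interface Collector<K, V> { void emit(K key, V value); }
    interface Mapper<KIn, VIn, KOut, VOut> { void map(KIn key, VIn value, Collector<KOut, VOut> c); }
    interface Reducer<KOut, VOut> { VOut reduce(KOut key, Iterator<VOut> values); }
    interface Collator<KOut, VOut, R> { R collate(Map<KOut, VOut> reducedResults); }

    static <KIn, VIn, KOut, VOut, R> R run(Map<KIn, VIn> data,
                                           Mapper<KIn, VIn, KOut, VOut> mapper,
                                           Reducer<KOut, VOut> reducer,
                                           Collator<KOut, VOut, R> collator) {
        // Map phase: every entry is passed to the mapper, which emits key/values.
        Map<KOut, List<VOut>> collected = new HashMap<>();
        Collector<KOut, VOut> collector =
                (k, v) -> collected.computeIfAbsent(k, x -> new ArrayList<>()).add(v);
        for (Map.Entry<KIn, VIn> e : data.entrySet()) {
            mapper.map(e.getKey(), e.getValue(), collector);
        }
        // Reduce phase: one value per emitted key.
        Map<KOut, VOut> reduced = new HashMap<>();
        for (Map.Entry<KOut, List<VOut>> e : collected.entrySet()) {
            reduced.put(e.getKey(), reducer.reduce(e.getKey(), e.getValue().iterator()));
        }
        // Collate phase: turn the reduced map into the final result.
        return collator.collate(reduced);
    }

    // Word count: document id -> text in, word -> occurrences out.
    static Map<String, Integer> wordCounts(Map<String, String> docs) {
        return LocalMapReduce.<String, String, String, Integer, Map<String, Integer>>run(
                docs,
                (id, text, c) -> {
                    for (String w : text.toLowerCase().split("\\s+")) {
                        if (!w.isEmpty()) c.emit(w, 1);
                    }
                },
                (word, counts) -> {
                    int sum = 0;
                    while (counts.hasNext()) sum += counts.next();
                    return sum;
                },
                reduced -> new TreeMap<>(reduced));
    }
}
```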
Use Cases
Replicated Use Case
● Finance
● Master / Slave
● High Availability
● Failover
● Performance + Consistency
● Data – Lifespan
● Servers – Few
● Memory – Medium
Distributed Use Case #1
● Telecom / Media
● Performance > Consistency
● Data
  ● Infinite
  ● Calculated
● Servers – Few
● Memory – Large
Distributed Use Case #2
● Telecom
● Consistency > Performance
● Data
  ● Continuous
  ● Limited Lifespan
● Servers – Many
● Memory – Normal
Q & A
Look for a follow up on the howtojboss.com blog.
Thanks for joining us.