Prof. Dr. Stefan Edlich … · Oracle NoSQL Database ConsHash config ACID no single PF DataC...

Post on 04-Apr-2018

Prof. Dr. Stefan Edlich http://nosql-database.org

2011: The NoSQL Year!

1. HTML5

2. MongoDB

3. iOS

4. Android

5. Mobile app

6. Puppet

7. Hadoop

8. jQuery

9. PaaS

10. Social Media

CouchDB and Membase merger! 1 year ago!

CouchDB + Membase = ??

Roadmap?

No Apache → git trouble and more

Less Erlang → Code more C / C++

No CouchDB → CouchBase Server

Damien is leaving CouchDB

Community has to take care of it

no upward compatibility

“And I'm dead serious about making it the easiest, fastest and most reliable NoSQL database. Easy for developers to use, easy to deploy, reliable on single machines or large clusters, and fast as hell.”

UnQL

a successful standard?

unstructured

Damien Katz & Richard Hipp (SQLite)

Richard Hipp (SQLite): “Damien Katz intends to provide an UnQL interface to CouchDB in the near future, yes.”

jaql!

Query Language for JavaScript Object Notation

[Diagram: source → query → dest]

Oracle ?

MS-SQL ?

IBM DB2 ?

Sybase ?

Ted Neward: “Well, the buzz certainly grew, and it surprised me that the big storage guys (Microsoft, IBM, Oracle) didn't do more to address it; I was expecting features to emerge in their database products to address some of the features present in MongoDB or CouchDB or some of the others, such as "schemaless" or map/reduce-style queries. Even just incorporating JavaScript into the engine somewhere would've generated a reaction.”

“The NoSQL databases are beginning to feel like an ice cream store that entices you with a new flavor of the month,” the white paper read. “[But] you shouldn’t get too attached to any of the flavors because it may not be around for too long.”

White paper: “Debunking the (NoSQL) Hype”, summer 2011

Oracle NoSQL Database

consistent hashing (ConsHash)

configurable ACID

no single point of failure

data center replication

Top Admin

“BerkeleyDB reloaded”
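Consistent hashing is the first keyword on the Oracle NoSQL slide above. The following is a minimal sketch of the general technique, not Oracle's implementation: the class name `ConsistentHashRing`, the use of MD5, and the default of 64 virtual nodes per server are all assumptions chosen for illustration.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes      # virtual points per physical node (assumed value)
        self._hashes = []         # sorted positions on the ring
        self._owners = {}         # ring position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        # Any well-mixing hash works; MD5 is used here only for its spread.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Place several virtual points per node to even out the load.
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            self._owners[h] = node
            bisect.insort(self._hashes, h)

    def remove_node(self, node):
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            del self._owners[h]
            self._hashes.remove(h)

    def node_for(self, key):
        # A key belongs to the first ring point clockwise from its hash.
        if not self._hashes:
            raise ValueError("ring is empty")
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._owners[self._hashes[idx]]
```

The point of the ring is that adding a server only takes keys away from existing servers and hands them to the new one; keys never shuffle between the old servers, which is what keeps rebalancing cheap.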

Hadoop + Manager

+=

user defined functions in C++ & Java

→ 10x faster than SQL or Stored Procs

UDF connector for Hadoop → ☺

C++ APIs for Map Reduce → ☺

Greenplum, Pervasive

and 100 others too…

Storage configurable

• round-robin automatic load balancing

• replicas

• gateway

- performance

SSD? RAM+DataCenter

+ scale + configure

Attacking:

Mongo & Riak & Cassandra

bad things too?

NoSQL = No Security?

less sensitive info?

Key Bruteforce

Array injection: /login.php?username=admin&password[$ne]=1

View injection

REST injection

JSON injection: db.foo.find({$or: [{a:1}, {b:2}, {c:/.*/}]})

http attacks for listeners

wrong cache proxy configs

thrift avro security
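The array-injection URL in the list above works because some web stacks turn `password[$ne]=1` into a nested structure that a MongoDB-style filter then reads as an operator. The sketch below imitates that parsing in plain Python and shows one possible defence (accepting only plain strings). All function names are hypothetical, and no real database driver is involved.

```python
import re
from urllib.parse import parse_qsl

_BRACKET = re.compile(r"^(\w+)\[([^\]]+)\]$")


def php_style_params(query_string):
    """Imitate PHP-style parsing: 'password[$ne]=1' becomes a nested dict."""
    params = {}
    for name, value in parse_qsl(query_string, keep_blank_values=True):
        match = _BRACKET.match(name)
        if match:
            base, inner_key = match.group(1), match.group(2)
            inner = params.get(base)
            if not isinstance(inner, dict):
                inner = params[base] = {}
            inner[inner_key] = value
        else:
            params[name] = value
    return params


def build_login_filter_naive(params):
    """Vulnerable: user input is copied unchecked into the query filter."""
    return {"username": params.get("username"),
            "password": params.get("password")}


def build_login_filter_safe(params):
    """Reject anything that is not a plain string before building the filter."""
    username = params.get("username")
    password = params.get("password")
    if not isinstance(username, str) or not isinstance(password, str):
        raise ValueError("credentials must be plain strings")
    return {"username": username, "password": password}
```

With the naive builder the attacker's request yields a filter whose password clause is `{"$ne": "1"}`, which a document store would read as "password is not 1" and therefore match the admin account; the safe builder refuses the request instead.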

2007 SIGOPS

• 15 years of experience from Dynamo, SimpleDB and S3

• ultra scalable and reliable

• uses SSD (!)

• fully managed & no maintenance window!

• multiple synchronous availability zone replication = durability

• provisioned throughput configurable per table

• no fixed schema, any number of attributes & multi value attributes

• consistency and performance tradeoffs possible

• conditional writes & atomic counters

• index: simple hash or composite hash + key/range

• define a table => make a rw capacity reservation

• backup & restore (tables) into S3

• Cloud Watch & Alarms

• 40 million requests per month free
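Two bullets in the list above, conditional writes and atomic counters, are easier to picture with a toy model. The following in-memory sketch only illustrates the semantics (write only if the stored value matches an expectation; increment without a read-modify-write race). `TinyTable`, `ABSENT` and `ConditionalCheckFailed` are hypothetical names, not part of any AWS SDK.

```python
import threading

ABSENT = object()  # marker meaning "this attribute must not exist yet"


class ConditionalCheckFailed(Exception):
    """Raised when an expectation about the stored item does not hold."""


class TinyTable:
    """In-memory stand-in for a key-value table (illustrative only)."""

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()

    def put(self, key, item, expected=None):
        # Replace the whole item, but only if every expected attribute matches.
        with self._lock:
            current = self._items.get(key, {})
            for attr, want in (expected or {}).items():
                have = current.get(attr, ABSENT)
                if have != want:
                    raise ConditionalCheckFailed(attr)
            self._items[key] = dict(item)

    def add(self, key, attr, delta):
        # Atomic counter: the read and the write happen under one lock.
        with self._lock:
            item = self._items.setdefault(key, {})
            item[attr] = item.get(attr, 0) + delta
            return item[attr]

    def get(self, key):
        with self._lock:
            return dict(self._items.get(key, {}))
```

A caller that wants "create only if new" passes `expected={"Price": ABSENT}`; a caller that wants optimistic concurrency passes the value it last read and retries on `ConditionalCheckFailed`.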

2ms read 6-8ms

$1 per GB-month

$0.01 per hour per 10 writes/sec

$0.01 per hour per 50 reads/sec (items up to 1 KB)

Eventually consistent reads = double the read throughput
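The price figures above can be turned into a rough estimate. The sketch below reads the slide as: one cent per hour for each block of 10 writes/sec, one cent per hour for each block of 50 reads/sec, and one dollar per GB-month of storage. Rounding up to whole blocks, the 730-hour month, and halving the read units for eventually consistent reads are assumptions made here, and the function names are hypothetical.

```python
import math

WRITE_BLOCK = 10                   # writes/sec covered by 1 cent per hour (slide figure)
READ_BLOCK = 50                    # reads/sec covered by 1 cent per hour (slide figure)
STORAGE_CENTS_PER_GB_MONTH = 100   # "1 $ per GB-month" (slide figure)
HOURS_PER_MONTH = 730              # assumption: average month length


def hourly_throughput_cents(writes_per_sec, reads_per_sec,
                            eventually_consistent=False):
    """Provisioned-throughput cost per hour, in whole cents."""
    if eventually_consistent:
        # Assumption: an eventually consistent read costs half a read unit.
        reads_per_sec = math.ceil(reads_per_sec / 2)
    write_blocks = math.ceil(writes_per_sec / WRITE_BLOCK)
    read_blocks = math.ceil(reads_per_sec / READ_BLOCK)
    return write_blocks + read_blocks


def monthly_cents(gb_stored, writes_per_sec, reads_per_sec,
                  eventually_consistent=False):
    """Storage plus throughput for one month, in whole cents."""
    throughput = hourly_throughput_cents(
        writes_per_sec, reads_per_sec, eventually_consistent)
    return gb_stored * STORAGE_CENTS_PER_GB_MONTH + HOURS_PER_MONTH * throughput
```

Working in integer cents avoids floating-point rounding; divide by 100 only when displaying the result.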

{
    Id = 101
    ProductName = "NoSQL Book"
    ISBN = "978-3446427532"
    Authors = [ "Author 1", "Author 2" ]
    Price = -42
    Dimensions = "8.5 x 11.0 x 0.5"
    PageCount = 500
    InPublication = 1
    ProductCategory = "Book"
}

db x tables x items x attributes

uses JSON as serialized transport format!

REST API

Table → create, describe, list, update

Data → put (create/update), get, update, delete, query, scan, batch

// This header is abbreviated.
// For a sample of a complete header, see link.
POST / HTTP/1.1
x-amz-target: DynamoDB_20111205.PutItem
content-type: application/x-amz-json-1.0

{"TableName":"Table1",
 "Item":{
    "AttributeName1":{"S":"AttributeValue1"},
    "AttributeName2":{"N":"AttributeValue2"}
 },
 "Expected":{"AttributeName3":{"Value":{"S":"AttributeValue"},"Exists":Boolean}},
 "ReturnValues":"ReturnValuesConstant"}

HTTP/1.1 200
x-amzn-RequestId: 8966d095-71e9-11e0-a498-71d736f27375
content-type: application/x-amz-json-1.0
content-length: 85

{"Attributes":{
    "AttributeName3":{"S":"AttributeValue3"},
    "AttributeName2":{"SS":"AttributeValue2"},
    "AttributeName1":{"SS":"AttributeValue1"}
 },
 "ConsumedCapacityUnits":1 }
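The PutItem exchange above carries type-tagged values such as `{"S": ...}` in the response. As a rough sketch of how a client might assemble such a request body, the helper below maps Python values onto the `S`, `N`, `SS` and `NS` tags. The function names are hypothetical, the mapping rules are an assumption, and request signing and the HTTP call itself are left out.

```python
import json


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def to_attribute_value(value):
    """Wrap a Python value in a type tag: S, N, SS or NS."""
    if isinstance(value, str):
        return {"S": value}
    if _is_number(value):
        return {"N": str(value)}          # numbers travel as strings
    if isinstance(value, (list, tuple, set, frozenset)) and value:
        items = list(value)
        if all(isinstance(v, str) for v in items):
            return {"SS": sorted(items)}
        if all(_is_number(v) for v in items):
            return {"NS": [str(v) for v in sorted(items)]}
    raise TypeError(f"unsupported attribute value: {value!r}")


def put_item_body(table, item, expected=None, return_values="NONE"):
    """Build the JSON body of a PutItem request (no signing, no HTTP)."""
    body = {
        "TableName": table,
        "Item": {name: to_attribute_value(v) for name, v in item.items()},
        "ReturnValues": return_values,
    }
    if expected:
        body["Expected"] = {
            name: {"Value": to_attribute_value(v)}
            for name, v in expected.items()
        }
    return json.dumps(body)
```

For example, `put_item_body("Table1", {"Id": 101, "Authors": ["Author 1", "Author 2"]})` produces a body in which `Id` is tagged `N` and `Authors` is tagged `SS`.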

AWS SDK for Java, .NET, PHP

// Java get
private static void getBook(String id, String tableName) {

    // Request selected attributes of the item whose numeric hash key equals id.
    GetItemRequest getItemRequest = new GetItemRequest()
        .withTableName(tableName)
        .withKey(new Key().withHashKeyElement(new AttributeValue().withN(id)))
        .withAttributesToGet(Arrays.asList("Id", "ISBN", "Title", "Authors"));

    // client: a DynamoDB client object created elsewhere (not shown on the slide)
    GetItemResult result = client.getItem(getItemRequest);

    System.out.println("Printing item after retrieving it....");
    printItem(result.getItem());
}

64 KB Data Limit

string + int

multi value string-ints

multiKV, references,

schemachecks, …

API → DSLs ☺

Here are the six urban myths that Mr. Stonebraker says NoSQL advocates incorrectly perpetuate:

• Myth #1: SQL is too slow, so use a lower level interface

• Myth #2: I like a K-V interface, so SQL is a non-starter

• Myth #3: SQL systems don’t scale

• Myth #4: There are no open source, scalable SQL engines

• Myth #5: ACID is too slow, so avoid using it

• Myth #6: in CAP, choose AP over CA

strikes back

© 451 Group Report / 5.4.2011

Overview

Java Stored Procedures!

RAM with 100,000 ops/s per node

“VoltDB claims to be 100 times faster than MySQL, up to 13 times faster than Cassandra, and 45 times faster than Oracle, with near-linear scaling.” (highscalability blog)

ACID with partitioned tables

Nearly SQL 99 and ALTER & DROP

schema changes require shutdown

static query parametrization

Source: Percona MySQL Performance Blog

SSD optimized and disk

C’t: 10-100 TB ok then weaker

10x faster

scaling across cores

random access read pattern

QPS on SSD: 84,426 vs. 14,763

5.5x faster

− memcached API more soon

− no structured data

− horizontal scaling for nodes

• "terabytes of data, billions of objects, and 200K plus transactions per second per node, with sub-millisecond latency."

• e.g. real-time bidding

• transactions / ACID

• linear & elastic horizontal scaling

• flash/SSD support

RTA™

• data expiration

• append list

• API: C, C#, Java, Ruby, Python & PHP

• no master node

• 200k ops/sec per node read, 50k ops/sec per node write

Check hybrid solutions!

easier & better than memcache + RDBMS

Problem: privilege checks, cache queries, connection pooling / thread creation, parsing SQL, open, lock, exec plans, concurrency control, unlock, close, …

© fromdual.com

Source: Yoshinori Matsunobu

keep tables open & simple protocol

Performance

Transactions

Concurrent Access

No Cache / Crash-Safe

no SQL but more than K/V: ranges, LIMIT, CRUD, multi_get, …

no Security

new API

© percona.com

Conclusion #1: There is no “one perfect solution”

Conclusion #2: Check hybrid solutions and NewSQL DBs too!

© geekandpoke.com