MongoDB - Chewbii.com · MongoDB ESILV [email protected] Applications withMongoDB Metlife:...

15
MongoDB [email protected] ESILV 1 MongoDB [email protected] ESILV Introduction Distributed NoSQL database Large data management (Humongous) Document-oriented NoSQL JSON documents (BSON Objects serialization) Key points: Simple to integrate for developers Rich query language MQL: Mongo Query Language Several optimization features Attributes indexing (Shard/Btree/RTree) Smart data allocation Sharding (GridFS) [+ tagging / clustering] Several deployement Private/Public Cloud, Local (On Premise) 2

Transcript of MongoDB - Chewbii.com · MongoDB ESILV [email protected] Applications withMongoDB Metlife:...

Page 1: MongoDB - Chewbii.com · MongoDB ESILV nicolas.travers@devinci.fr Applications withMongoDB Metlife: unifiedview Cisco: e-commerce Bosch: IoT HSBC: digital transformation The WeatherChannel

MongoDB

[email protected] 1

MongoDB

[email protected]

Introduction

• Distributed NoSQL database▫ Large data management

(Humongous)▫ Document-oriented NoSQL

� JSON documents� (BSON Objects serialization)

• Key points:▫ Simple to integrate for developers▫ Rich query language

� MQL: Mongo Query Language▫ Several optimization features

� Attributes indexing (Shard/Btree/RTree)

▫ Smart data allocation� Sharding (GridFS) [+ tagging /

clustering]▫ Several deployement

� Private/Public Cloud, Local (On Premise)

2

Page 2: MongoDB - Chewbii.com · MongoDB ESILV nicolas.travers@devinci.fr Applications withMongoDB Metlife: unifiedview Cisco: e-commerce Bosch: IoT HSBC: digital transformation The WeatherChannel

MongoDB

[email protected]

Applications with MongoDBMetlife : unified viewCisco : e-commerceBosch : IoTHSBC : digital transformationThe Weather Channel : mobilityExpedia : trip recommendationsebay : productsAstraZeneca : medics loggingComcast : Database-as-a-serviceKPMG : AnalyticsX.ai : AIEA : videos games

3

MongoDB

[email protected]

Evolutions• V3.0▫ Compression, OpsManager

• V3.2▫ Schema validation, $lookup, BI/Spark

connectors

• V3.4▫ Load balancing++, $facet, zones++

• V3.6▫ Causal Consistency, sessions

• V4.0▫ ACID Transactions (local), Types

conversion▫ MongoDB Stitch (lightweight local /

mobile)▫ Security SHA2 (cryptage TLS1.1)

• V4.2▫ Transactions ACID in sharding !!!

4

Page 3: MongoDB - Chewbii.com · MongoDB ESILV nicolas.travers@devinci.fr Applications withMongoDB Metlife: unifiedview Cisco: e-commerce Bosch: IoT HSBC: digital transformation The WeatherChannel

MongoDB

[email protected]

Interactions: JavaScript

• JS objects▫ Attributes + functions▫ db: database▫ sh: sharding▫ rs: replica set• MQL▫ "JSON" = pattern• MapReduce▫ Functions (untyped)

5

Studio3t

Robo3t

MongoDBCompass

MongoDB

[email protected]

Starting Commands• Select database (shell)

▫ Command: > use myBD ;

• Collections of documents▫ Create: > db.createCollection(‘users’);

▫ Manipulate: > db.users. <command> ;

� Commands : find(), save(), delete(), update(), aggregate(), distinct(), mapReduce()…

� Meaning in SQL: FROM

• Documents

▫ JSON:

▫ Insert: > db.users.save ( ); //no quotes!

{

"_id": ObjectId("4efa8d2b7d284dad101e4bc7"),

"name": "James Bond", "login": "james", "age": 50,

"address”: {"street": "30 Wellington Square", "city": "London”},

"job" : ["agent", "MI6"]

}

6

Page 4: MongoDB - Chewbii.com · MongoDB ESILV nicolas.travers@devinci.fr Applications withMongoDB Metlife: unifiedview Cisco: e-commerce Bosch: IoT HSBC: digital transformation The WeatherChannel

MongoDB

[email protected]

MQL: find queries1 (1/2)• Document oriented queries• Command: > db.users.find( <filter> , <projection> );

• Filter:• “JSON” pattern that must be matched on the document• “Key/Value” format• Can embed exact matching, operations, nesting, arrays

• Operations: $op (no quotes)• Meaning in SQL: WHERE

• Projection:• “Key/Value” pairs that are given in output• Meaning in SQL: SELECT (no aggregates)

• Example:> db.users.find( {“login” : "james"} , {“name” : 1, “age” : 1});

71 - find queries are simple queries for your report

MongoDB

[email protected]

MQL: find queries (2/2)• Exact match:

> db.users.find( { “login” : "james" } , {“name” : 1, “age” : 1});

• With nesting "." for the path to the given key> db.users.find( { “address.city” : "London" } );

• Operations> db.users.find( { “age” : { $gt : 40 } } );

//$gt, $gte, $lt, $lte, $ne, $in, $nin, $or, $and, $exists, $type, $size, $cond…

• Regular expressions1

> db.users.find( { ”name” : {$regex : “james”, $options : “i" }} );

• Arrays> db.users.find( { ”job” : “MI6”} ); //Check in the list> db.users.find( { ”job.1” : “MI6”} ); //2nd position in the list> db.users.find( { ”job” : [“MI6”]} ); //List exact match

81 - regex : https://docs.mongodb.com/manual/reference/operator/query/regex/

Page 5: MongoDB - Chewbii.com · MongoDB ESILV nicolas.travers@devinci.fr Applications withMongoDB Metlife: unifiedview Cisco: e-commerce Bosch: IoT HSBC: digital transformation The WeatherChannel

MongoDB

[email protected]

MQL: Distinct - Count

• List of distinct values from a key> db.users.distinct( “name” );> db.users.distinct( “address.city” );

• Number of documents> db.users.count();> db.users.find( { “age” : 50 }).count();

9

MongoDB

[email protected]

MQL: pipeline2 (1/3)• aggregate() : Ordered sequence of operators aggregation pipeline

• Command:> db.users.aggregate( [ {$op1 : {}}, {$op2 : {}}, … ] );

• Operators:• ����� : simple queries //meaning in SQL: WHERE

• �������� : projections //meaning in SQL: SELECT

• ������: sort result set //meaning in SQL: ORDER BY

• ������ : normalization in 1NF• �������: group by value + aggregate function // meaning in SQL: GROUP BY + fn

• � ����� : left outer join (since 3.2 – work locally)• �����: store the output in a collection (since 3.2)• �������� : sort by nearest points (lat/long)• …

102 – Pipeline queries (aggregate) are complex queries. For long pipelines, it can be hard queries

Page 6: MongoDB - Chewbii.com · MongoDB ESILV nicolas.travers@devinci.fr Applications withMongoDB Metlife: unifiedview Cisco: e-commerce Bosch: IoT HSBC: digital transformation The WeatherChannel

MongoDB

[email protected]

MQL: pipeline (2/3)• Pipeline : Each step (operator) gives its output to the following operator (input)

> db.users.aggregate([{$match : {"address.city" : "London"}},

{$project : {"login" : 1, "age" : 1}},

{$sort : {"age" : 1, "login" : -1}}

]);

• $unwind

▫ Unnest an array and produce a new document for each item in the list> db.users.aggregate([ {$unwind : "$job"}} ]);

{

"_id": ObjectId("4efa8d2b7d284dad101e4bc7"),

"name": "James Bond", "login": "james", "age": 50,

"address”: {"street": "30 Wellington Square", "city": "London”},

"job" : "agent"}

{

"_id": ObjectId("4efa8d2b7d284dad101e4bc7"),

"name": "James Bond", "login": "james", "age": 50,

"address”: {"street": "30 Wellington Square", "city": "London”},

"job" : "MI6"}

{

"_id": ObjectId("4efa8d2b7d284dad101e4bc7"),

"name": "James Bond", "login": "james", "age": 50,

"address”: {"street": "30 Wellington Square", "city": "London”},

"job" : ["agent", "MI6"]}

11

MongoDB

[email protected]

MQL: pipeline (3/3)• $group : key (_id) + aggregate ($sum / $avg / …)

No grouping key: only one output> db.users.aggregate([ {$group : {"_id" : "age", "res": {$sum : 1}}} ]);

Group by value: $key> db.users.aggregate([ {$group:{"_id" : "$age", "res": {$sum : 1}}} ]);

Apply an aggregate function: $key> db.users.aggregate([{$group:{"_id":"$address.city", "avg": {$avg: "$age"}}} ]);

• Sequence example

> db.users.aggregate([{$match: {"address.city" : "London"} },{$unwind : "$job" },

{$group : {"_id" : "$job", "val": {$avg: "$age"}} },

{$match : {"val" : {$gt : 30}} },

{$sort : { "val" : -1} } ]);

12

Page 7: MongoDB - Chewbii.com · MongoDB ESILV nicolas.travers@devinci.fr Applications withMongoDB Metlife: unifiedview Cisco: e-commerce Bosch: IoT HSBC: digital transformation The WeatherChannel

MongoDB

[email protected]

MQL: updates> db.users.update (

{ “_id” : ObjectId("4efa8d2b7d284dad101e4bc7") },

{ “$inc” : { “age” : 1 } } );

• Atomic updates (one document)▫ $set – modify value▫ $unset – remove a key/value▫ $inc – Increment▫ $push – Add inside an array▫ $pushAll – Add several values▫ $pull – Remove a value in an array▫ $pullAll – Remove several values

UpdateAll – all documents are updated

13

MongoDB

[email protected]

Replica Set• Data replication• Asynchronus• Primary server: writes• Secondary servers: reads

• Fault tolerance• New primary server election• Needs an arbiter

• Consistency vs availability• Reads applied on the primary (by

default)• Updates via oPlog (log file)

15

Page 8: MongoDB - Chewbii.com · MongoDB ESILV nicolas.travers@devinci.fr Applications withMongoDB Metlife: unifiedview Cisco: e-commerce Bosch: IoT HSBC: digital transformation The WeatherChannel

MongoDB

[email protected]

Sharding• Data distribution on a cluster• Data balancing▫ Sharding key choice: begining• 1 Shard = 1 Replica Set• Min config:▫ Config Servers (x3)▫ Mongos (router x3)• Partitioning key (sharding)▫ Ranged-based (GridFS : sort)> sh.shardCollection( "myBD.users", "login" );

▫ Hash-based (md5 sur clé)> sh.shardCollection( "myBD.users", {"_id":"hashed"} )

▫ By zones/tags (+ sharding)16

MongoDB

[email protected]

Cluster management

• OpsManager▫ Shard & ReplicaSet management� Nodes template .yaml

• Cloud Atlas▫ Deployment template▫ Sharding zones

Page 9: MongoDB - Chewbii.com · MongoDB ESILV nicolas.travers@devinci.fr Applications withMongoDB Metlife: unifiedview Cisco: e-commerce Bosch: IoT HSBC: digital transformation The WeatherChannel

MongoDB

[email protected]

Indexing• Create a BTree

> db.users.createIndex( {"age":1} ) ;

▫ All queries on « age » become more efficient� Even for Map/Reduce with « queryParam »� Execution plan « .explain() »

▫ No combination of indexes

18

MongoDB

[email protected]

Indexing: 2DSphere

Apply geolocalized queries> db.users.ensureIndex( { "address.location" : "2dsphere" } );

• Localization format"location":{"type": "Point", "coordinates" : [51.489220, -0.162866]}

• Spherical queriesvar near = {$near: {$geometry:{"type":"Point","coordinates":[51.489220,-0.162866]},

$maxDistance : 10000}};db.users.find( {"address.location":near}, {"name":1, "_id":0});

• $geoNear operator (aggregate)var geoNear = {"near":{"type":"Point", "coordinates" : [51.489220,-0.162866]},

"maxDistance":10000, "distanceField" : "outputDistance", "spherical":true}; db.users.aggregate([ {"$geoNear": geoNear} ]);

• « polygonales » queriesvar polygon = {$geoWithin : {$geometry : {"type" : "Polygon", "coordinates" : [ [P1], [P2], [P3], [P1] }}}

19http://docs.mongodb.org/manual/tutorial/query-a-2d-index/

Page 10: MongoDB - Chewbii.com · MongoDB ESILV nicolas.travers@devinci.fr Applications withMongoDB Metlife: unifiedview Cisco: e-commerce Bosch: IoT HSBC: digital transformation The WeatherChannel

MongoDB

[email protected]

Storage Engine• Wired Tiger

▫ By default since 3.2▫ Created for BerkeleyDB, used for Oracle NoSQL▫ Concurrency purpose: multi-version

• MMAPV1▫ Original engine▫ Mapping in main memory▫ Lock on collections (long writes)

• Encrypted storage▫ Since 3.2▫ Based on Wired Tiger▫ KMIP (Key Management Interoperability Protocol)

• In Memory▫ Since 3.2.6▫ Based on Wired Tiger▫ Must fit in main memory (error : WT_CACHE_FULL)

20

MongoDB

[email protected]

ACID Transactions• Data versioning▫ Causal Consistency▫ "Snapshots" of documents (per transaction)

� Updates on snapshots and try to apply it▫ Timeout: 60s, oplog max: 16MB• ACIS Transactions in a ReplicaSet (v4.0)▫ Maybe in sharding mode in v4.2• Commands:

with client.start_session() as s:s.start_transaction()collection_one.insert_one(d1, session=s)collection_one.insert_one(d2, session=s)s.commit_transaction()

22

Page 11: MongoDB - Chewbii.com · MongoDB ESILV nicolas.travers@devinci.fr Applications withMongoDB Metlife: unifiedview Cisco: e-commerce Bosch: IoT HSBC: digital transformation The WeatherChannel

MongoDB

[email protected]

Map/Reduce (1/2)Language : Javascript

var mapFunction = function () {if( this.age > 30 && this.job.contains(“MI6”) )

emit (this.address.city, this.age);} //emit : key, value, this > current document

var reduceFunction = function (key, values) {return Array.sum(values);

} //returns a unique value aggregated from the list of "values" (for a given key)

var queryParam = {query : {}, out : "result_set"}//query: filter before applying the map, $match//out: output collection

db.users.mapReduce (mapFunction, reduceFunction, queryParam);db.result_set.find();

23

MongoDB

[email protected]

Map/Reduce (2/2)• Map

▫ Several « emit » per map (key/value)▫ Applied on every documents

� Unless: index + queryParam

• Reduce

▫ Optimization : local and global▫ No reduce if only one « value »ØHeterogeneous results/computations

• Beware of non associative functions (averages, lists, etc.)

• Shuffle

▫ Group values by keys (from « emit »)▫ Optimize on sharding key (by default _id)

24

Page 12: MongoDB - Chewbii.com · MongoDB ESILV nicolas.travers@devinci.fr Applications withMongoDB Metlife: unifiedview Cisco: e-commerce Bosch: IoT HSBC: digital transformation The WeatherChannel

MongoDB

[email protected]

Reduce: Average function (1/2)

25

(a,3) (b,2) (a,5) (c,2) (c,4) (a,2) (a,5) (a,1)chunks

LocalReduce

(a,[5,3])(b,[2])

Shuffle

GlobalReduce

(a,[4,2,3]) (b,[2]) (c,[3,1])

(a,3) (b,2) (c,2)

(a,4)

(c,[2,4]) (a,[2]) (a,[5,1])

(b,2) (c,3) (a,2) (a,3)

(c,1)

(c,[1])

(c,1)

Normaly: (a, 3.2), (b, 2), (c, 2.33)

MongoDB

[email protected]

Reduce: Average function (2/2)• Non associative function computation▫ Transform Map & Reduce values in associative operations

� Avg = sum / nbØProduce sum (sums of sum values) and nb (sums of nb values)Ø{"avg" : sum/nb, "sum" : sum, "nb" : nb}

• Same structure for values in map & reduce

26

Page 13: MongoDB - Chewbii.com · MongoDB ESILV nicolas.travers@devinci.fr Applications withMongoDB Metlife: unifiedview Cisco: e-commerce Bosch: IoT HSBC: digital transformation The WeatherChannel

MongoDB

[email protected]

MongoDB Softwares

• Platform Cloud Atlas▫ PaaS

• MongoDB Charts

28

MongoDB

[email protected]

Drivers / API

• Drivers: http://docs.mongodb.org/ecosystem/drivers/

▫ Python, Ruby, Java, Javascript (Node.js), C++, C#, PHP, Perl, Scala…▫ Syntax: http://docs.mongodb.org/ecosystem/drivers/syntax-table/

29

Page 14: MongoDB - Chewbii.com · MongoDB ESILV nicolas.travers@devinci.fr Applications withMongoDB Metlife: unifiedview Cisco: e-commerce Bosch: IoT HSBC: digital transformation The WeatherChannel

MongoDB

[email protected]

Driver Java Connection

ServerAddress server = new ServerAddress ("localhost", 27017);Mongo mongo = new Mongo(server);

Environment (DB + collection)

DB db = mongoClient.getDB( "maBD" );DBCollection users = db.getCollection("users");

Authentification

db.authenticate(login, passwd.toCharArray());Document inserting (DBObject)

DBObject doc = new BasicDBObject("name", "MongoDB").append("type", "database").append("count", 1).append("info", new BasicDBObject("x",

203).append("y", 102));// or doc = (DBObject) JSON.parse(jsonText) ;

users.insert(doc);

Querying (pattern query + cursor)

BasicDBObject query = new BasicDBObject("name", "James Bond");

DBCursor cursor = coll.find(query);try {

while(cursor.hasNext()) {System.out.println(cursor.next());

}} finally {

cursor.close();}

Map/Reduce :

MapReduceCommand cmd = new MapReduceCommand("users", map,

reduce, "outputColl", MapReduceCommand. OutputType.REPLACE, query);

MapReduceOutput out = users.mapReduce(cmd);for (DBObject o : out.results()) {

System.out.println(o.toString());}//outputType : INLINE, REPLACE, MERGE, REDUCE

30http://api.mongodb.org/java/current/index.html?_ga=1.177444515.761372013.1398850293

MongoDB

[email protected]

Driver C#References/Libraries

MongoDB.Bson.dllusing MongoDB.Bson;

MongoDB.Driver.dllusing MongoDB.Driver;

Connection

var connectionString = "mongodb://localhost";var client = new MongoClient(connectionString);var server = client.GetServer();

Database & Collection

var db = server.GetDatabase("maBD");var coll = db.GetCollection<User>("users");

Objet « User » to define

public class User{public ObjectId Id { get; set; }public string name { get; set; }public string login { get; set; }public int age { get; set; }public Address address { get; set; }

}

Get a document :

var query=Query<User>.EQ(e=>e.login,"james");var entity = coll.FindOne(query);

Package for output results: LINQ

using MongoDB.Driver.Linq;Queries:

var query = coll.AsQueryable<User>().Where(e => e.age > 40).OrderBy(c => c.name);

Answor:

foreach (var user in query){

// traitement}

MapReduce :

var mr = coll.MapReduce(map, reduce);foreach (var document in mr.GetResults()) {

Console.WriteLine(document.ToJson());}

http://www.nuget.org/packages/mongocsharpdriver/

31

Page 15: MongoDB - Chewbii.com · MongoDB ESILV nicolas.travers@devinci.fr Applications withMongoDB Metlife: unifiedview Cisco: e-commerce Bosch: IoT HSBC: digital transformation The WeatherChannel

MongoDB

[email protected]

Driver PythonImportation

>>> import pymongoConnexion

>>> from pymongo import MongoClient>>> client = MongoClient()>>> client = MongoClient('localhost', 27017)

Database & collection

>>> db = client.maBD>>> coll = db.users

GET one document

coll.find_one ( { "login" : "james"} )Query

>>> for c in coll.find ( {"age": 40} ) :… pprint.pprint (c)

MapReduce

>>> from bson.code import Code>>> map = Code("function () {" ... " this.tags.forEach(function(z) {" ... " emit(z, 1);" ... " });" ... "}")>>> reduce = Code("function (key, values) {" ... " var total = 0;" ... " for (var i = 0; i < values.length; i++) {" ... " total += values[i];}" ... " return total; } ")>>> result = coll.map_reduce(map, reduce,

"myresults")>>> for doc in myresults.find(): ... print doc

32