Post on 15-Jan-2015
Optimizing Slow Queries with Indexes and Creativity
Chris Winslett <chris@mongohq.com>
My Background
• For the past year, I’ve looked at MongoDB logs at least once every day.
• We routinely answer the question “how can I improve performance?”
Who’s this talk for?
• New to MongoDB
• Seeing some slow operations, and need help debugging
• Running database operations on a sizeable deploy
• I have a MongoDB deployment, and I’ve hit a performance wall
What should you learn?
• Know where to look on a running MongoDB to uncover slowness, and discuss solutions.
• MongoDB has performance “patterns”.
• How to think about improving performance.
• And . . .
Schema Design
Design with the end in mind.
First, a Simple One
query getmore command  res faults locked db   ar|aw netIn netOut conn     time
  129       4       7 126m      2 my_db:0.0%    3|0   27k   445k   42 15:36:54
   64       4       3 126m      0 my_db:0.0%    5|0   12k   379k   42 15:36:55
   65       7       8 126m      0 my_db:0.1%    3|0   15k   230k   42 15:36:56
   65       3       3 126m      1 my_db:0.0%    3|0   13k   170k   42 15:36:57
   66       1       6 126m      1 my_db:0.0%    0|0   14k   262k   42 15:36:58
   32       8       5 126m      0 my_db:0.0%    5|0    5k   445k   42 15:36:59
a truncated mongostat
Alerted due to high CPU
log
Example 1
[conn73454] query my_db.my_collection query: { $query: { publisher: "US Weekly" }, orderby: { publishedAt: -1 } } ntoreturn:5 ntoskip:0 nscanned:33236 scanAndOrder:1 keyUpdates:0 numYields: 21 locks(micros) r:317266 nreturned:5 reslen:3127 178ms
Solution
Example 1
{ $query: { publisher: "US Weekly" }, orderby: { publishedAt: -1 } }
db.my_collection.ensureIndex({publisher: 1, publishedAt: -1}, {background: true})
We are fixing this query
With this index
I would show you the logs, but now they are silent.
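The effect of the compound index can be sketched outside MongoDB. Below is a minimal model, not MongoDB code: `withoutIndex` and `withIndex` are illustrative names, and the arrays stand in for a collection and an index. Without the index, the server examines every document and sorts in memory (the `scanAndOrder:1` in the log); with `{publisher: 1, publishedAt: -1}` it walks only the matching range, already in sort order.

```javascript
// Simplified model of the two query plans; not MongoDB internals.
// Without an index: full scan plus in-memory sort (scanAndOrder:1).
function withoutIndex(docs, publisher, limit) {
  const matched = docs.filter(d => d.publisher === publisher); // scans everything
  matched.sort((a, b) => b.publishedAt - a.publishedAt);       // in-memory sort
  return { nscanned: docs.length, results: matched.slice(0, limit) };
}

// With {publisher: 1, publishedAt: -1}: walk the index range in order,
// touching only the documents actually returned.
function withIndex(indexedDocs, publisher, limit) {
  // indexedDocs stands in for the index: sorted by publisher asc, publishedAt desc.
  const results = [];
  for (const d of indexedDocs) {
    if (d.publisher < publisher) continue;                        // before the range
    if (d.publisher > publisher || results.length === limit) break; // past it, or done
    results.push(d);
  }
  return { nscanned: results.length, results };
}
```

In the log above, nscanned:33236 for nreturned:5 is exactly the first case; after the index the ratio drops to 5:5.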
The Pattern
Example 1
Inefficient read queries from in-memory table scans cause high CPU load.
Caused by not matching indexes to queries.
Example 2
query delete  res faults locked db     idx miss % qr|qw ar|aw netIn netOut conn
   25      6 346m      0 my_db:188.1%           0   0|0   0|2   25k    45k  117
   24      6 346m      0 my_db:188.6%           0   0|0   0|1   27k    44k  117
   24      6 346m      0 my_db:184.3%           0   0|0   0|1   21k    36k  117
   24      6 346m      0 my_db:190.9%           0   0|0   0|1   20k    33k  117
   19      4 346m      0 my_db:191.5%           0   0|0   0|0   21k    41k  117
a truncated mongostat
tail
[conn72593] remove my_db.my_collection query: { status: "some chunk of text" } keyUpdates:0 numYields: 15 locks(micros) w:213415 210ms
Example 2
Solution
Example 2
db.my_collection.remove({status: "some chunk of text"})
db.my_collection.ensureIndex({status: 1})
This is the slow query
With this index
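Why the lock percentage drops can be sketched with a simplified model (not MongoDB code; `docsExamined` is an illustrative name): an unindexed remove must examine every document while holding the write lock, while an index on `{ status: 1 }` narrows the work to just the matching documents.

```javascript
// Simplified model; not MongoDB internals. Work done under the write
// lock is proportional to the number of documents examined.
function docsExamined(docs, status, hasIndex) {
  const matching = docs.filter(d => d.status === status).length;
  // No index: full collection scan. Index on {status: 1}: only matches.
  return hasIndex ? matching : docs.length;
}
```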
The Pattern
Example 2
Inefficient write queries cause high lock contention.
Caused by losing track of your indexes / queries.
Example 3
insert query update delete getmore command faults locked % idx miss % qr|qw ar|aw
    *0    *0     *0     *0       0     1|0   1422        0          0   0|0  50|0
    *0     6     *0     *0       0     6|0    575        0          0   0|0  51|0
    *0     3     *0     *0       0     1|0   1047        0          0   0|0  50|0
    *0     2     *0     *0       0     3|0   1660        0          0   0|0  50|0
a truncated mongostat
Alerted on high CPU
tail
[initandlisten] connection accepted from ....
[conn4229724] authenticate: { authenticate: ....
[initandlisten] connection accepted from ....
[conn4229725] authenticate: { authenticate: .....
[conn4229717] query ..... 102ms
[conn4229725] query ..... 140ms
amazingly quiet
Example 3
currentOp
> db.currentOp()
{
  "inprog" : [
    {
      "opid" : 66178716,
      "lockType" : "read",
      "secs_running" : 760,
      "op" : "query",
      "ns" : "my_db.my_collection",
      "query" : {
        keywords: { $in: ["keyword1", "keyword2"] },
        tags: { $in: ["tags1", "tags2"] }
      },
      orderby: { "created_at": -1 },
      "numYields" : 21
    }
  ]
}
Example 3
Solution
> db.currentOp().inprog.filter(function(row) { return row.secs_running > 100 && row.op == "query" }).forEach(function(row) { db.killOp(row.opid) })
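The filter predicate in that one-liner can be exercised against a canned `currentOp()` document before running it for real. A sketch (`findRunawayOps` is an illustrative name; the sample fields mirror the output shown above):

```javascript
// Same predicate as the shell one-liner: long-running read queries.
// Returns the opids you would hand to db.killOp(), without killing anything.
function findRunawayOps(currentOpDoc, thresholdSecs) {
  return currentOpDoc.inprog
    .filter(row => row.secs_running > thresholdSecs && row.op === "query")
    .map(row => row.opid);
}
```

Collecting opids first, then killing, makes the one-liner easy to dry-run.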
Example 3
Return Stability to Database
Disable query, and refactor schema.
Refactoring
I have one word for you,“Schema”
Example 4
A map reduce has gradually run slower and slower.
Finding Offenders
Example 4
Find the time of the slowest query of the day:

grep '[0-9]\{3,100\}ms$' $MONGODB_LOG | awk '{print $NF}' | sort -n
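The same ranking can be done programmatically. A sketch (`slowestMs` is an illustrative name; it assumes mongod's convention of ending slow-operation lines with the duration in milliseconds):

```javascript
// Extract per-operation durations from mongod log lines and rank them,
// mirroring the grep/awk/sort pipeline above.
function slowestMs(logLines) {
  return logLines
    .map(line => /(\d+)ms$/.exec(line))   // duration suffix, e.g. "... 178ms"
    .filter(m => m !== null)              // drop lines with no duration
    .map(m => parseInt(m[1], 10))
    .sort((a, b) => a - b);               // slowest last, like sort -n
}
```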
Slowest Map Reduce
Example 4
my_db.$cmd command: { mapreduce: "my_collection", map: function() {},
  query: { $or: [ { object.type: "this" }, { object.type: "that" } ],
    time: { $lt: new Date(1359025311290), $gt: new Date(1358420511290) },
    object.ver: 1, origin: "tnh" },
  out: "my_new_collection", reduce: function(keys, vals) { .... } }
ntoreturn:1 keyUpdates:0 numYields: 32696 locks(micros) W:143870 r:511858643 w:6279425 reslen:140 421185ms
Example 4
Problem
Query is slow because it has multiple multi-value operators: $or, $gt, and $lt.
Solution
Change schema to use an “hour_created” attribute:
hour_created: "%Y-%m-%d %H"
Create an index on “hour_created” followed by the “$or” fields, and query using the new “hour_created”.
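The bucket value itself is cheap to compute at write time. A sketch (`hourCreated` is an illustrative name; UTC is assumed):

```javascript
// Format a Date as "%Y-%m-%d %H" (UTC) for the hour_created attribute,
// so a time range becomes a small set of discrete hour buckets.
function hourCreated(date) {
  const pad = n => String(n).padStart(2, "0");
  return date.getUTCFullYear() + "-" + pad(date.getUTCMonth() + 1) + "-" +
         pad(date.getUTCDate()) + " " + pad(date.getUTCHours());
}
```

With the bucket stored on each document, the map reduce's open-ended $lt/$gt range turns into an equality-style match on hour buckets, which a leading "hour_created" index can serve directly.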
Words of caution
2 / 4 solutions were to add an index.
New indexes as a solution scales poorly.
Sometimes . . .
It is best to do nothing, except add shards / addhardware.
Go back to the drawing board on the design.
Bad things happen to good databases?
ORMs
• Manage your indexes and queries.
• Constraints will set you free.
Road Map for Refactoring
• Measure, measure, measure.
• Find your slowest queries and determine if they can be indexed.
• Rephrase the problem you are solving by asking “How do I want to query my data?”
Thank you!
Questions?
E-mail me: chris@mongohq.com