Making operations visible - devopsdays tokyo 2013

Post on 08-Sep-2014


Making Operations Visible

Nick Galbreath  ニック ガルブレス

DevOpsDays Tokyo 2013

#devopsdays

@ngalbreath

nickg@client9.com ngalbreath@iponweb.net

http://slidesha.re/1h9Aqye

http://www.client9.com/

It's also on video!

http://bit.ly/1gaEmDS

Nick Galbreath http://client9.com/20130501 @ngalbreath

Who is nickg?

www.client9.com

Online Advertising Infrastructure

http://www.iponweb.jp/

Moscow, Russia and Tokyo

Continuous Deployment

• In 2012, I spoke many times on continuous deployment.

• But changing from release cycles to continuous deployment is too big a change for most organizations, and they don't have the tools to do it.

Goal

• I'm hoping that adding new metrics to the application becomes so addictive that you'll want to shorten release cycles.

What is DevOps?

• Puppet, Chef, Ansible?

• GitHub? AWS? The Cloud?

• Continuous Deployment?

Yes, but these are tools. Great tools.

It's About Communication

• Between machines

• Between team members

• Between Dev and Ops

But in many companies there is a bigger problem

You're Invisible

• If you are in Business, you are invisible to Development and Tech Operations.

• If you are in Operations, you are invisible to Business and Development

• If you are in Development, you are invisible to Business and Operations.

Invisible Things Aren't Valued

Developer

• "I don't know what my code will do in production. Let ops deal with it."

• "Why doesn't ops fix these problems?"

• "What does Ops do all day?"

Business

• "Why do I have to wait until the end of the month for a report?"

• "Did last week's release change anything?"

• "Why don't they understand the impact of that bug, outage, etc.?"

Operations

• "Why are they always bothering me?"

• "I've got work to do!"

• "Why do we have to do another release again... can't developers do a better job?"

• "What does this company do?" (really)

This is really destructive

To you

To your team

To your company

All of This Can Be Fixed By Making Operations Visible with Data

Not just technical operations but company operations.

So Why Not Expose This Data?

Here's a list of excuses I've heard

Your company is full of data!

"But I already have graphing in my alerting system"

• Maybe. But it's junk

• Can't share

• Can't do data mash-ups

• Can't do data transformations

"They wouldn't understand."

• "They won't understand the data so what's the point of sharing it."

• First, "they" probably do. And the more people looking at ops metrics, the better.

• Us vs. Them = Fail.

"They might break something."

• "The data is in our alerting system, we don't want you to break it."

• Assumes "they" are incompetent, or malicious. Learn to trust.

"It's not your job, so you don't need to know."

"That information isn't important."

• This excuse is typically caused by fear.

• Why are you deciding what's important?

"I'm not making another system, duplicating data is bad."

• For operational metrics, it is perfectly fine to have a redundant copy of the data.

• Completely different goals.

• Use it as a beta for your alerting.

"I'm too busy."

"It's too dangerous."

"I don't know how."

• These are real problems.

• So let's fix them!

One Machine, One Day, One Person Challenge!

Let's get 100% of operational metrics in, and enable the application to make and share new metrics on demand without any help from you.

Graphite

• https://github.com/graphite-project

• http://graphite.readthedocs.org/

• Similar to RRDTool, Ganglia, Cacti

• Uses specialized data storage

• Uses specialized queries

• Optimized for time series

Graphite isn't Perfect

• Documentation isn't great (but getting better)

• A few QA issues

• Somewhat odd stack (python-twisted, django)

Graphite Ecosystem

• Flexible input and output

• REST API for graphs

• Simple UI for mashups and dashboards

• 3rd party, custom, client-side dashboards

Makes Sharing Easy

• Do you have an interesting graph? It's just a URL!

• Dashboards are easy since graphs are just URLs. Very easy to make HTML dashboards.
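To make "a graph is just a URL" concrete, here is a minimal sketch that builds a link to Graphite's render endpoint. The host graphite.example.com and the metric path stats.logins are placeholders for illustration, not names from the talk. `target`, `from`, `width`, `height` and `format` are standard render parameters.

```python
from urllib.parse import urlencode

# Hypothetical Graphite host; replace with your own server.
GRAPHITE_BASE = "http://graphite.example.com"


def render_url(targets, since="-24h", width=600, height=300, fmt="png"):
    """Build a shareable Graphite render URL for one or more targets."""
    params = [("target", t) for t in targets]
    params += [("from", since), ("width", width), ("height", height), ("format", fmt)]
    return GRAPHITE_BASE + "/render?" + urlencode(params)


if __name__ == "__main__":
    # "stats.logins" is an assumed metric path used only for illustration.
    print(render_url(["stats.logins"]))
```

Pasting the resulting link into a chat, a wiki page, or an `<img>` tag is all it takes to share the graph.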

One Machine, One Day!

• A single low-end machine should have capacity for a few thousand metrics per minute from 50+ machines.

• Graphite is not CPU intensive, but needs fast disks and/or more memory.

One Day, One Person

• Graphite is not hard to install, but it is a bit messy.

• But it might be as easy as "apt-get install graphite" on your system.

• It would be good to have a workshop or prebuilt AMI for EC2

• But not today :-(

Operational Stats

• You could parse /proc, ps, df, netstat, etc and write your own custom scripts....

• ...or use Diamond from Brightcove

• https://github.com/BrightcoveOS/Diamond
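For a sense of what the "write your own custom scripts" route involves, the sketch below parses text in the format of Linux /proc/loadavg and formats it as Graphite plaintext lines (`<path> <value> <timestamp>`). The prefix servers.web01.loadavg is a made-up naming scheme, and the sample input is invented. A collector such as Diamond does this kind of work for you across many subsystems.

```python
import time


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return {"01": float(one), "05": float(five), "15": float(fifteen)}


def graphite_lines(prefix, values, timestamp=None):
    """Format values as Graphite plaintext lines: '<path> <value> <timestamp>'."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return [f"{prefix}.{name} {value} {ts}" for name, value in sorted(values.items())]


if __name__ == "__main__":
    # Invented sample in the format of Linux /proc/loadavg.
    sample = "0.42 0.36 0.30 1/123 4567"
    # "servers.web01.loadavg" is a hypothetical metric prefix.
    for line in graphite_lines("servers.web01.loadavg", parse_loadavg(sample), 1380000000):
        print(line)
```

Each line, terminated by a newline, would then be written to Carbon's plaintext listener, which listens on TCP port 2003 by default.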

Metrics in Diamond now

• Apache

• NGINX

• MySQL

• SNMP

• Memory

• CPU

• Disk

• Network

and many more

But what about your applications?

And business metrics?

100% of pure operational metrics are now shared!

Enter StatsD

• https://github.com/etsy/statsd

• Your application sends event data to statsd, as it happens, in real-time.

• StatsD collects this data and computes time-series metrics (sum, min, max, average)

• Once a minute, it writes data to Graphite
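The collect-then-flush idea can be illustrated with a few lines of Python. This is a simplified sketch of the concept, not StatsD's actual implementation; the class name IntervalAggregator and the metric name login_ms are invented for the example.

```python
from collections import defaultdict


class IntervalAggregator:
    """Collect raw samples per metric and summarize them at each flush."""

    def __init__(self):
        self._samples = defaultdict(list)

    def record(self, name, value):
        """Store one event value as it arrives."""
        self._samples[name].append(value)

    def flush(self):
        """Return sum, min, max and average per metric, then start a new interval."""
        summary = {}
        for name, values in self._samples.items():
            total = sum(values)
            summary[name] = {
                "sum": total,
                "min": min(values),
                "max": max(values),
                "average": total / len(values),
            }
        self._samples.clear()
        return summary


if __name__ == "__main__":
    agg = IntervalAggregator()
    for duration_ms in (120, 80, 100):  # hypothetical login timings
        agg.record("login_ms", duration_ms)
    print(agg.flush())
```

A real deployment would call flush on a timer and forward each summary value to Graphite.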

The Magic of UDP

• Your application sends metrics in a UDP packet.

• Sending over UDP is fire-and-forget: no connection, no acknowledgement, no timeouts. If StatsD is down, your application neither blocks nor crashes.

• It will not overload your network.

• You may lose metrics, but in an intranet, it's rare.
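A minimal sketch of the fire-and-forget send described above, assuming a StatsD daemon on its default UDP port 8125. The function name send_counter is invented for this example. The payload uses the StatsD counter format `<name>:<value>|c`.

```python
import socket


def send_counter(name, value=1, host="127.0.0.1", port=8125):
    """Send one StatsD counter increment as a single UDP datagram."""
    payload = f"{name}:{value}|c".encode("ascii")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (host, port))
    except OSError:
        # Metrics are best-effort: a failed send must never reach the caller.
        pass
    finally:
        sock.close()
    return payload


if __name__ == "__main__":
    # No daemon needs to be listening for this call to return immediately.
    send_counter("logins")
```

Because nothing waits for a reply, the call costs roughly one system call whether or not anything is listening on the other end.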

Let's Count Logins!

• Most StatsD client libraries are a single file, need no C extension, and are simple.

• Add one line to your login code.

StatsD::increment('logins');

• That's it!

Events!

• You can also graph low-frequency events.

• Just send another StatsD request in your batch script:

StatsD::increment("deploy", 1);

• Do it on reboots, installs, core dumps.

• New bugs, new hires, new code commits.

• Use Graphite's drawAsInfinite() function to display each event as a vertical line.
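As a sketch of how events can be overlaid on a regular metric, the snippet below builds a render URL with two targets: the metric itself, and the event series wrapped in Graphite's drawAsInfinite() function. The host and the metric paths stats.logins and stats.deploy are assumptions. The actual paths depend on how StatsD is configured to name metrics in Graphite.

```python
from urllib.parse import urlencode


def overlay_url(base, metric, event, since="-7d"):
    """Build a render URL that draws `event` as vertical lines over `metric`."""
    params = [
        ("target", metric),
        ("target", f"drawAsInfinite({event})"),
        ("from", since),
    ]
    return f"{base}/render?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and metric paths, for illustration only.
    print(overlay_url("http://graphite.example.com", "stats.logins", "stats.deploy"))
```

Seeing a deploy marker next to a change in the login curve is often the quickest way to answer "did the last release change anything?"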

[Diagram: three servers each send "login,1" to StatsD, and a deploy script sends "deploy,1". StatsD forwards the aggregated values "(login,3), (deploy,1)" to Graphite.]

Measure Anything, Measure Everything http://codeascraft.com/2011/02/15/measure-anything-measure-everything/

Logins By Country!

• get country code from IP address

• make a new metric "login_country" instantly

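StatsD::increment('logins');
$kuni = geoip2country($ipv4);       // country code for the client IP ("kuni" means country)
StatsD::increment("logins.$kuni");  // double quotes so that $kuni is interpolated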

Make Dashboards

• And build frameworks that make creating new dashboards easy.
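Since every graph is a URL, a dashboard "framework" can start as a function that turns a list of targets into an HTML page of images. The sketch below assumes a Graphite server at a placeholder address and invented metric paths. The function name dashboard_html is likewise hypothetical.

```python
from html import escape
from urllib.parse import urlencode


def dashboard_html(title, base, targets, since="-24h"):
    """Return a minimal HTML page with one Graphite graph image per target."""
    images = []
    for target in targets:
        query = urlencode({"target": target, "from": since, "width": 500, "height": 250})
        src = escape(f"{base}/render?{query}")  # escape "&" for use in an attribute
        images.append(f'<img src="{src}" alt="{escape(target)}">')
    body = "\n".join(images)
    heading = escape(title)
    return (
        f"<html><head><title>{heading}</title></head><body>\n"
        f"<h1>{heading}</h1>\n{body}\n</body></html>"
    )


if __name__ == "__main__":
    # Placeholder host and metric paths, for illustration only.
    page = dashboard_html("Logins", "http://graphite.example.com",
                          ["stats.logins", "stats.deploy"])
    print(page)
```

Anyone who can edit a list of metric names can then publish a new dashboard without touching the monitoring system.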

Default Dashboard

Good for experiments

Dashboards

Make it easy for your customers

Make Operations Visible

• Make the company visible.

• Enable communication

• Do the One Machine, One Day, One Person Challenge!

Thanks!

• The entire event is on video at http://vimeo.com/album/2559722


DevOpsDays Tokyo 2013 is on video!

http://vimeo.com/album/2559722

DevOpsDays Tokyo 2013 Media Coverage

• http://itpro.nikkeibp.co.jp/article/NEWS/20130930/507682/

• http://itpro.nikkeibp.co.jp/article/NEWS/20130930/507755/

• http://itpro.nikkeibp.co.jp/article/NEWS/20131001/507959/

• http://www.publickey1.jp/blog/13/devopsdevops_day_tokyo_2013.html

• http://www.publickey1.jp/blog/13/devopsdevops_day_tokyo_2013_1.html

• http://www.publickey1.jp/blog/13/githubdevopsboxenhubotdevops_day_tokyo_2013.html

• http://www.publickey1.jp/blog/13/githubboxenhubotdevops_day_tokyo_2013.html

• http://www.publickey1.jp/blog/13/devopsdevops_day_tokyo_2013_2.html

DevOpsDays Tokyo 2013 Attendee Coverage

• http://mass.hatenablog.com/entry/2013/09/28/205309

• http://d.hatena.ne.jp/n-sega/20130928/1380373634

• http://kazuph.hateblo.jp/entry/2013/09/28/152302

• http://jedipunkz.github.io/blog/2013/09/29/devops-day-tokyo-2013-report/

• http://toshi-miura.hatenablog.com/entry/2013/09/29/222609

• http://lewuathe.github.io/blog/2013/09/28/devopsday-tokyo-2013nixing-tutekitayo/

• http://codezine.jp/article/detail/7438