Event-driven automation, DevOps way ~Automation in the IoT era: what is the reality?~
StackStorm
If-This-Then-That for DevOps automation
© 2016 BROCADE COMMUNICATIONS SYSTEMS, INC.
@StackStorm @Stack_Storm
Agenda
What is StackStorm
Why we made it
Use cases
Why is it better than …
StackStorm: What is it?
StackStorm is like …
IFTTT, for DevOps
cat /opt/stackstorm/packs/st2-demos/rules/demo.yaml
---
name: "sensu_crit_to_slack"
pack: "st2-demos"
description: "Post all critical alerts to the demo…"
enabled: true
trigger:
  type: "sensu.event_handler"
criteria:
  trigger.check.status:
    pattern: 2
    type: "equals"
  trigger.check.name:
    pattern: "demo_.*"
    type: "matchregex"
action:
  ref: "slack.post_message"
  parameters:
    message: >
      [ALERT] {{trigger.client.name}} {{trigger.check.output}}
    channel: "#demos"
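To make the rule above concrete, here is a minimal Python sketch of how its criteria could be evaluated against an incoming Sensu event. This is a simplified illustration, not StackStorm's actual rules engine: the `RULE` dictionary, `lookup`, `matches`, and `render_message` are hypothetical names, and the regex is assumed to be anchored at the start of the check name.

```python
import re

# Hypothetical, simplified model of the rule shown above.
# This is NOT StackStorm's rules engine, only an illustration of the idea.
RULE = {
    "trigger_type": "sensu.event_handler",
    "criteria": {
        "trigger.check.status": {"type": "equals", "pattern": 2},
        "trigger.check.name": {"type": "matchregex", "pattern": "demo_.*"},
    },
}


def lookup(payload, dotted_key):
    """Resolve a key such as 'trigger.check.status' in a nested payload dict."""
    value = payload
    for part in dotted_key.split(".")[1:]:  # drop the leading 'trigger'
        value = value[part]
    return value


def matches(rule, trigger_type, payload):
    """Return True when the trigger type and every criterion match."""
    if trigger_type != rule["trigger_type"]:
        return False
    for key, criterion in rule["criteria"].items():
        value = lookup(payload, key)
        if criterion["type"] == "equals" and value != criterion["pattern"]:
            return False
        # Assumption: the regex is matched from the start of the value.
        if criterion["type"] == "matchregex" and not re.match(
            criterion["pattern"], str(value)
        ):
            return False
    return True


def render_message(payload):
    """Build the Slack message text the action would post."""
    return "[ALERT] {} {}".format(
        payload["client"]["name"], payload["check"]["output"]
    )


event = {
    "client": {"name": "web301"},
    "check": {"status": 2, "name": "demo_disk", "output": "low disk"},
}
if matches(RULE, "sensu.event_handler", event):
    print(render_message(event))
```

A critical event (`status` 2) from a check named `demo_*` passes both criteria, so the message would go to the `#demos` channel; any other status or check name is ignored.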
Ingredients: Trigger, Rule, Action
IT Domains: Config mgmt, Storage, Networking, Containers, Cloud Infra, Monitoring, Ops Support
Sensors, Actions, Rules, Workflows
Trigger, Call
Automation Example
Automation, Engineer, Service, Monitoring, Incident Management
Event: “low disk on web301”
Web301 is “low disk”
Resolve known cases, fast. Is it /var/log? Clean up!
Unknown problem, need a human
Wake up, buddy. Something real is going on…
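The flow in this example can be sketched as a small decision function. This is a minimal illustration under stated assumptions: the event fields (`host`, `problem`, `largest_dir`) and the step names are hypothetical and do not come from any StackStorm pack.

```python
def triage(event):
    """Return the ordered steps for a monitoring event.

    Known case ("low disk" caused by /var/log) is fixed automatically;
    everything else is escalated to a human. All names are hypothetical.
    """
    steps = ["create_incident:{}".format(event["host"])]
    if event["problem"] == "low disk" and event.get("largest_dir") == "/var/log":
        # Known case: resolve it fast, on computer time.
        steps += ["clean_up:/var/log", "resolve_incident"]
    else:
        # Unknown problem: wake up the engineer.
        steps += ["page_engineer"]
    return steps


known = triage({"host": "web301", "problem": "low disk", "largest_dir": "/var/log"})
unknown = triage({"host": "web301", "problem": "low disk", "largest_dir": "/data"})
print(known)
print(unknown)
```

The point of the split is that the engineer is paged only when the automation has run out of known remedies.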
StackStorm: Why we made it
Opalis, now Microsoft SC Orchestrator, 2004-2008
Event-driven automation with workflows, for Enterprise IT
VMware, 2008-2013
OpenStack vs. VMware
DevOps Tools vs. Enterprise Suites
Event-driven automation with workflows, for Cloud and DevOps
DevOps Tools, Enterprise Suites
What can be automated?
Got StackStorm!
Every automation looks like a nail
What SHOULD be automated?
From: The Practice of Cloud System Administration, by Thomas Limoncelli
What HAS BEEN automated with StackStorm
• Security checks
– On malware detection in a VM, isolate the network port on a switch
• App blue-green deployment
– On Jenkins tests passed, bring up a new VM cluster, deploy and configure the app, set the load balancer to send a % of traffic to the new app, monitor, then roll forward or back out
• Networking
– On BGP peer down: collect troubleshooting data, post to Slack and create a JIRA ticket
– On link aggregation member error: check load; if the capacity of the rest of the LAG bundle is enough, disable the link with the error
• OpenStack
– Orphan VM clean-up: on orphans detected, shut down, email owner, keep for a few days, delete
– VM evacuation on HW failures: on host RAID failure, get the list of impacted VMs, email VM owners, evacuate VMs, create a JIRA ticket for hardware replacement
• Service remediation
– Cassandra “node down” recovery: on ring node dying, deploy a new node, configure it, add it to the ring
– Remediating RabbitMQ, Galera cluster, MySQL, and more…
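As one illustration of the list above, the link aggregation case boils down to a capacity check before a member link is disabled. The sketch below is a hedged example, not code from any StackStorm pack: the function name, the data layout, and the 80% utilization ceiling are all assumptions.

```python
def can_disable_member(members, failed, max_utilization=0.8):
    """Decide whether a LAG can lose one member link safely.

    members: dict of link name -> (capacity_gbps, current_load_gbps)
    failed: name of the member reporting errors
    max_utilization: assumed safety ceiling for the remaining links
    """
    # The failed link's traffic has to move to the remaining members,
    # so the whole bundle's load is compared with the remaining capacity.
    total_load = sum(load for _, load in members.values())
    remaining_capacity = sum(
        capacity for name, (capacity, _) in members.items() if name != failed
    )
    if remaining_capacity == 0:
        return False
    return total_load <= remaining_capacity * max_utilization


lag = {"eth1": (10, 4), "eth2": (10, 3), "eth3": (10, 2)}
busy_lag = {"eth1": (10, 9), "eth2": (10, 9), "eth3": (10, 2)}
print(can_disable_member(lag, "eth3"))
print(can_disable_member(busy_lag, "eth3"))
```

In a workflow, a positive answer would lead to the "disable link" action, and a negative one to a ticket or a page instead.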
Auto-Remediation
FB auto-remediates 98% of alarms. Can you?
“Auto-Remediation & Automation at Facebook” @ Auto-Remediation Meetup SF https://www.meetup.com/Auto-Remediation-and-Event-Driven-Automation/events/236704012/
Facebook FBAR: 43% problem fixed, 51% false positives, 94% automated
FBAR & StackStorm are friends
StackStorm is inspired by FBAR
StackStorm and FB FBAR have been collaborating since 2014
“Sleep Better at Night: OpenStack Cloud Auto-Healing” @ OpenStack Summit Barcelona
Mirantis: Auto-remediating 2,000 node OpenStack cluster at Symantec with StackStorm
User: Symantec (Mirantis)
Use case: OpenStack cluster remediation
Presented by Mirantis at OpenStack Barcelona
StackStorm at Symantec
On-call, Without Automation
2:00 AM  PagerDuty alert
2:02 AM  Engineer wakes up
2:07 AM  Logs in and ACKs
2:10 AM  Studies the alert
2:15 AM  Checks runbook
2:20 AM  Runs diagnostics
2:30 AM  Fixes the problem
Source: “Winston: Helping Netflix Engineers Sleep at Night” @ QCon '16 SF https://goo.gl/lHzq4r
On-call With Winston
Timeline: 2:00 AM, 2:05 AM, 2:05 AM, 2:15 AM
Winston outcomes: False positive, Assisted diagnostics, Fixed the problem
Source: “Winston: Helping Netflix Engineers Sleep at Night” @ QCon '16 SF https://goo.gl/lHzq4r
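The difference between the two on-call timelines can be put into numbers. The sketch below assumes that 2:00 AM is the alert time in both cases and that the last timestamp of each timeline (2:30 AM without automation, 2:15 AM with Winston) marks the resolution; that reading of the slides is an assumption, and `minutes_between` is a hypothetical helper.

```python
from datetime import datetime


def minutes_between(start, end):
    """Minutes elapsed between two clock times such as '2:00 AM'."""
    fmt = "%I:%M %p"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Assumed reading of the slides: alert at 2:00 AM in both timelines.
manual = minutes_between("2:00 AM", "2:30 AM")
assisted = minutes_between("2:00 AM", "2:15 AM")
print("without automation:", manual, "min")
print("with Winston:", assisted, "min")
print("saved:", manual - assisted, "min")
```

Under these assumptions the time to resolution for this single incident is halved, before counting the false positives that never reach a human at all.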
Benefits
• Reduce MTTR (mean time to resolution)
• Avoid failures (fixing on computer time, not human time)
• Reduce risk of human error (no fat fingers)
• Positive team impact
– Avoid pager fatigue and team burn-out
– Turn from reactive to proactive (break reactive vicious cycle)
– Capture operational knowledge – as code
Network Automation
Now supporting Multiple Vendors
Device support
Proven approach from cloud compute space
AUTOMATION PACKS
Some workflows [1] leverage unique device functions, so they call actions of the device’s integration pack. Other workflows [2] need to be abstracted and treat devices alike (e.g. IXP provisioning on a mixture of SLX and MLX), so they use the “unified” NAPALM pack.
INTEGRATION PACKS
Integration packs expose device configurations and operations as st2 actions. The VDX and SLX packs will expose most or all of the device functionality. MLX is “best effort”. The NAPALM integration pack provides “unified” device actions, just like libcloud does in the “compute” domain.
PYTHON BINDINGS
While integration packs can call device interfaces directly, Python bindings provide reuse and abstract device OS versions. PyNOS binds VDX via NETCONF. Bindings for SLX and MLX don’t exist yet. NAPALM is an open source project that exposes a unified Python API to interact with different network devices via device drivers.
DEVICE / OS INTERFACES
Devices expose various interfaces: RESTCONF, NETCONF, CLI/TELNET, SNMP.
DEVICES
VDX, MLX, SLX, other vendors
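The “unified” layer described above can be sketched as a common driver interface with one implementation per device family. The example is a self-contained illustration of that pattern only: it does not use the NAPALM or PyNOS APIs, and every class, function, and returned value here (including the interface assumed for MLX) is hypothetical.

```python
from abc import ABC, abstractmethod


class DeviceDriver(ABC):
    """Hypothetical unified interface that workflows can rely on."""

    def __init__(self, hostname):
        self.hostname = hostname

    @abstractmethod
    def get_facts(self):
        """Return basic device facts in a vendor-neutral shape."""


class VdxDriver(DeviceDriver):
    def get_facts(self):
        # A real driver would query the device, e.g. over NETCONF.
        return {"hostname": self.hostname, "model": "VDX", "interface": "NETCONF"}


class MlxDriver(DeviceDriver):
    def get_facts(self):
        # Assumption for illustration: this family is reached over the CLI.
        return {"hostname": self.hostname, "model": "MLX", "interface": "CLI"}


DRIVERS = {"vdx": VdxDriver, "mlx": MlxDriver}


def get_driver(kind):
    """Look up the driver class for a device family."""
    try:
        return DRIVERS[kind]
    except KeyError:
        raise ValueError("no driver for device family: {}".format(kind))


def collect_facts(inventory):
    """Run the same 'unified' action against a mixed set of devices."""
    return [get_driver(kind)(host).get_facts() for kind, host in inventory]


facts = collect_facts([("vdx", "leaf1"), ("mlx", "core1")])
for entry in facts:
    print(entry)
```

A workflow of type [2] would call `collect_facts` without caring which family each device belongs to, while a workflow of type [1] would talk to one specific driver to reach functions the common interface does not cover.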
Including legacy, business apps …
Integration
“Innovation at Dimension Data: Optimizing Operations with Event Driven Automation” https://stackstorm.com/2016/12/15/dimension-data-devops-beyond-deployment/
Dimension Data (SP, part of NTT)
• Integrate IT systems & tools
• Security automation
• Legacy run-book replacement
• Automation-aaS to end-users
• Top st2 contributors
IoT
Fun stuff
Serverless
Grab StackStorm & DIY
StackStorm is like …
AWS Lambda, AWS Step Functions
Open source, for DIY Serverless
Sensors, Actions, Rules, Workflows
IT Domains: Config mgmt, Storage, Networking, Containers, Cloud Infra, Monitoring, Ops Support
Serverless with Swarm for Genomic Annotation Computing
Dmitri Zimine
http://github.com/dzimine/serverless
@dzimine
Image by Miki Yoshihito, Creative Commons license
Many Use Cases – One Platform
StackStorm automation platform
• Network Automation
• Assisted Troubleshooting
• Auto-Remediation
• IT process integration
• IoT (Internet of Things)
• Serverless
• CI/CD (Continuous Deployment)
• NFV
• Security Orchestration
• ChatOps
Why not…
Why not Scripts?
Workflows Better in Operations
• Simple to define, reason, visualize
• Transparent
– state is clear, execution is trackable: running, complete, failed steps
• Reliable
– Workflows are long-running
– Crash tolerance
– “Restart from point of failure”
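The reliability points above can be illustrated with a tiny workflow runner that records per-step state and resumes after a failure. This is a minimal sketch of the idea only, not how StackStorm's workflow engine is implemented; `run_workflow` and the step names are hypothetical, and a real engine would persist the state rather than keep it in memory.

```python
def run_workflow(steps, state=None):
    """Run named steps in order, tracking each step's status.

    steps: list of (name, callable)
    state: status dict from a previous run, used to resume
    Returns a dict of step name -> 'complete' or 'failed'.
    """
    state = dict(state or {})
    for name, func in steps:
        if state.get(name) == "complete":
            continue  # already done before the failure, do not repeat
        state[name] = "running"
        try:
            func()
        except Exception:
            state[name] = "failed"
            return state  # stop here; later steps stay untouched
        state[name] = "complete"
    return state


calls = []
attempts = {"configure": 0}


def deploy():
    calls.append("deploy")


def configure():
    attempts["configure"] += 1
    if attempts["configure"] == 1:
        raise RuntimeError("transient failure on first attempt")
    calls.append("configure")


def add_to_ring():
    calls.append("add_to_ring")


steps = [("deploy", deploy), ("configure", configure), ("add_to_ring", add_to_ring)]
first = run_workflow(steps)          # fails at 'configure'
second = run_workflow(steps, first)  # restarts from the point of failure
print(first)
print(second)
print(calls)
```

Because the state is explicit, it is always clear which steps are complete and which one failed, and the second run repeats only the failed step and what follows it. A plain script would have to start over or be edited by hand.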
Why not legacy Runbook Automation?
• Microsoft System Center Orchestrator
• HP Operations Orchestration
• Cisco Process Orchestrator (CPO)
• VMware vCO / vRealize
They do not DevOps
Infrastructure as code
Leverage social coding and collaboration
Open source
Designed for DevOps
Infrastructure as code
Case Study
• Service catalog backed by workflows
• Automate provisioning on VMware/OpenStack, 4 data centers
• Before: CPO, operator updates via GUI, click and pray, x4
• After: StackStorm, dev -> code review -> staging -> QA -> prod
265 Contributors
Source: https://openhub.net/p/st2
256 = 100,000,000₂
Contributors
StackStorm & BWC Usage
[Chart: installations per month, Sep to Feb, for StackStorm and BWC; vertical axis 0 to 3,000]
StackStorm Exchange
Take away:
• Try it
• Use it
• Contribute to it
• Use StackStorm.
Try it, find automation, nail POC. Let us know, good & bad.
curl -sSL https://stackstorm.com/packages/install.sh | bash -s
docs.stackstorm.com/install
• Commit code. Become a “community maintainer”
It is not hard (2 days?). We help & support.
• Spread the word
Blog. Tweet. Talk. Mention. Bug. GitHub Star!
Contribute! Everything counts
Thank You!
@Stack_Storm
http://github.com/StackStorm/st2 (1,869 stars)
Dmitri Zimine, @dzimine
Open source, Apache 2.0
• GitHub: github.com/StackStorm/st2
• Twitter: @Stack_Storm
• IRC: #stackstorm on FreeNode
• stackstorm.slack.com on Slack
• www.stackstorm.com
StackStorm Brocade Workflow Composer
Commercial Edition
• Enterprise features
• Priority support
• brocade.com/bwc
• docs: bwc-docs.brocade.com
• Network lifecycle automation suite