VMware Presentation @ VMUGIT UserCon 2015
© 2014 VMware Inc. All rights reserved.
Zero Downtime Application Mobility with Site Recovery Manager
Lee Dilworth @leedilworth
Cross-site Availability – Typical choices today
Two sites, treat as one = stretched cluster
Two sites, treat as two = Disaster Recovery
[Diagram: stretched cluster (Site A active, Site B active, stretched storage, a single vCenter) beside disaster recovery (Site A active, Site B passive, replicated storage, vCenter and SRM at each site)]
Active-Active datacenters – use cases
Planned Maintenance
• Planned maintenance of one site without any service downtime
• Transparent to app owners and end users
• Avoid lengthy approval processes
• Ability to migrate applications back after maintenance is complete
Automated Recovery
• Automated initiation of VM restart or recovery
• Very low RTO for majority of unplanned failures
• Allows users to focus on app health after recovery, not how to recover VMs
Disaster Avoidance
• Prevent service outages before an impending disaster (e.g. hurricane, rising flood levels)
• Avoid downtime, not recover from it
• Zero data loss possible if you have the time
What you need for an Active-Active datacenter model
Stretched Storage solution
– Storage clustering solution that supports distributed data mirroring
– Read/write access to the same volumes from both sites
– Some tie-break mechanism to avoid split-brain
– Examples: EMC VPLEX, IBM SVC, NetApp MetroCluster, etc.
– Stretched Network
[Diagram: storage controllers at each site in front of Backend Array (Site 1) and Backend Array (Site 2)]
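The tie-break requirement above can be pictured as a decision made with the help of a third-party witness. The following is only a minimal sketch under stated assumptions: the function name, the site labels "A" and "B", and the witness model are hypothetical and do not describe how VPLEX, SVC, MetroCluster or any other product actually arbitrates.

```python
from __future__ import annotations


def surviving_sites(link_up: bool, witness_sees: set[str], preferred: str) -> set[str]:
    """Return the sites allowed to keep read/write access to a stretched volume.

    Assumed model (illustrative only):
    - link_up: the inter-site link between A and B is healthy.
    - witness_sees: sites a third-site witness can still reach.
    - preferred: site that wins when both sites are alive but partitioned.
    """
    sites = {"A", "B"}
    if link_up:
        return sites                    # normal operation: both sites serve I/O
    reachable = sites & witness_sees
    if reachable == sites:
        return {preferred}              # partition: only the preferred site continues
    return reachable                    # one site lost (or none reachable at all)


# A partition with both sites alive leaves exactly one writer.
print(surviving_sites(False, {"A", "B"}, preferred="A"))
```

The point of the sketch is the invariant: once the inter-site link is down, the function never returns both sites, which is what prevents split-brain.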
Active-active datacenter network model
[Diagram: Multi-Site Single vC with Stretched Clusters; Web/App/DB tiers on stretched storage, with N-S connectivity at each site]
Active-Passive datacenters – use cases
Unplanned Failover
• Recover from unexpected site failure (full or partial)
• Most common use case
• Fast and accurate recovery usually critical to customers
• Workflow driven
• High degree of confidence if regular test failovers have been performed
Planned Migration
• Planned datacenter maintenance
• Global load balancing or distribution of service
• Using test feature to minimize risk
• Execute partial failovers
• Automated failback enables bi-directional migrations
Preventative Failover
• Anticipate potential datacenter outages
• Initiate preventative failover for smooth migration of services
• Graceful shutdown of services to be migrated, zero data loss
What you need for an Active-passive datacenter model
Replicated Storage solution
– Storage or software based replication configured between sites
– vCenter per site
– SRM server per site
– Network can be stretched or not
– Concept is referred to as Active-passive; in reality each site is active and simply acts as the passive DR location for its counterpart
[Diagram: Site A (Active) and Site B (Passive) with replicated storage, vCenter and SRM at each site]
NSX 6.2 Integration with SRM 6.1
[Diagram: SRM A and SRM B, each with a Distributed Switch, joined by an NSX Universal Logical Switch (implicit mapping)]
Cake and eat it
• Most common requests
– Use both stretched and non-stretched storage in same design
– Leverage operational benefits of SRM for stretched storage
– Use SRM to drive large scale migrations where needed on stretched solutions
• Can this be done?
– Prior to vSphere 6.0 the answer was NO, it's one or the other
– Reaction from customers was usually this...
What has changed? – vMotion anywhere!
• Support introduced in vSphere 6.0
• Requires vCenter & ESXi 6.0 or later
• Simultaneously changes
– Compute
– Storage
– Network
– vCenter
• vMotion without shared storage
• Increased scale
– Pool resources across vCenter servers
[Diagram: two vCenter / VMware vSphere environments joined by stretched networks]
This cross vCenter vMotion layout seems familiar...
[Diagram: Cross vCenter vMotion layout (Site A active, Site B active, stretched storage) beside SRM layout (Site A active, Site B passive, replicated storage), with vCenter and SRM at each site]
So what can be done now?
[Diagram: vCenter and SRM at each site, combining replicated storage, stretched storage and vSphere Replication]
Why customers ask for SRM integration with stretched clusters
• vCenter Availability
– Failure of the site where vCenter is running disrupts management of both sites
• Operational Watchdogs
– Availability specific alarms, alerts and events
– Configuration validation on the fly
• DRS and HA are not site aware
– VMs are recovered and migrated to any site, which may not be what you want!
– Could result in additional East-West traffic when your network is not designed to handle it
• No Orchestration or Testability
– Stretched Clusters lack a repeatable, testable procedure to handle unplanned failures
– HA will restart VMs based on VM restart order – but doesn’t give you granular control of VM dependencies or customization
active/active datacenters – a new approach
active-active datacenters with SRM 6.1
[Diagram: vCenter, SRM and VMware vSphere at each site; Volume A with full R/W access at Site 1 and at Site 2; stretched networks]
scenario 1: local host failures in one site
[Diagram: vCenter, SRM and VMware vSphere at each site; Volume A with full R/W access at Site 1 and at Site 2; stretched networks]
HA handles local failures
scenario 2: disaster avoidance at one site
[Diagram: vCenter, SRM and VMware vSphere at each site; Volume A with full R/W access at Site 1 and at Site 2; stretched networks]
SRM invokes vMotion as per VM priority and dependencies
scenario 3: faster recovery from unplanned failures
[Diagram: vCenter, SRM and VMware vSphere at each site; Volume A with full R/W access at Site 1 and at Site 2; stretched networks]
SRM orchestrates entire site failover including dependencies
Technical Deep-dive and Demo
SRM with stretched storage: Initial setup
• New SRA interface
– Contact your array vendor for the SRA availability and supported array models
– Most vendors will provide a single SRA to manage both stretched and non-stretched volumes
– Existing SRAs (with no stretched storage support) will continue to work as is with new SRM
• Configure stretched storage volumes using the array UI/tools
• SRM will discover stretched arrays/volumes through the SRA
– Use the SRM UI to verify stretched volumes and how they map to datastores
– Use the SRM UI to verify the site preference for stretched volumes
[Diagram: SRM with an Array Manager and SRA at each site, each SRA talking to the vendor management interface; Replication Manager; stretched and non-stretched storage]
Introducing - Storage Policy Protection Groups (SPPG)
Profile Driven
Protection Group
• New Style Protection Group leveraging storage profiles (SRM 6.1)
• Level of indirection and automation compared to traditional protection groups
• Policy based approach reduces OpEx by handling VM protection lifecycle automatically
• Simpler integration of VM provisioning, migration, and decommissioning with other solutions such as vRealize Automation
Storage Policy
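The policy-based approach described above means that group membership is derived from a VM's storage policy rather than maintained by hand. A minimal sketch of that idea follows; the `VM` class, the policy names and the `protected_vms` helper are hypothetical illustrations, not SRM or vSphere API objects.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class VM:
    name: str
    storage_policy: str   # name of the storage policy assigned to the VM


def protected_vms(inventory: Iterable[VM], group_policies: Set[str]) -> List[str]:
    """Return names of VMs whose storage policy belongs to the protection group."""
    return sorted(vm.name for vm in inventory if vm.storage_policy in group_policies)


inventory = [
    VM("web01", "stretched-gold"),
    VM("db01", "stretched-gold"),
    VM("test01", "local-bronze"),
]
print(protected_vms(inventory, {"stretched-gold"}))  # ['db01', 'web01']
```

In this model, provisioning a new VM with the "stretched-gold" policy, or decommissioning one, changes the protected set without anyone editing the group, which is the lifecycle automation the slide refers to.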
SRM with stretched storage: configuring protection
• Stretched storage supported ONLY with Storage-Profile based Protection Groups (SPPG)
• Configure storage profiles for stretched volumes at each site
• Configure protection groups for the storage profiles
• SRM will automatically protect all VMs assigned to the storage profiles in the group
• Can mix stretched and non-stretched storage in the same protection group
• Protection groups with stretched devices MUST have a preferred direction
• Preferred direction MUST match any site preference defined at the storage layer
• Cannot create two groups in opposing directions using same devices
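The three rules on preferred direction above lend themselves to a simple consistency check. The sketch below is an illustration only: the `Group` class, the device names and the `validate` function are hypothetical, and it assumes that "matching the site preference" means the protected site of the group equals the site preferred at the storage layer.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Direction = Tuple[str, str]  # (protected_site, recovery_site)


@dataclass
class Group:
    name: str
    devices: List[str]
    direction: Optional[Direction] = None


def validate(groups: List[Group], stretched_pref: Dict[str, str]) -> List[str]:
    """Check protection groups against the stretched-storage rules.

    stretched_pref maps each stretched device to the site preferred at the
    storage layer; devices missing from the map count as non-stretched.
    """
    errors: List[str] = []
    claimed: Dict[str, Direction] = {}
    for g in groups:
        stretched = [d for d in g.devices if d in stretched_pref]
        if stretched and g.direction is None:
            errors.append(f"{g.name}: stretched devices need a preferred direction")
            continue
        for d in stretched:
            if stretched_pref[d] != g.direction[0]:
                errors.append(f"{g.name}: {d} is preferred at {stretched_pref[d]}, not {g.direction[0]}")
            previous = claimed.setdefault(d, g.direction)
            if previous == g.direction[::-1]:
                errors.append(f"{g.name}: {d} is already used in the opposite direction")
    return errors


groups = [Group("pg1", ["lun1"], ("A", "B")), Group("pg2", ["lun1"], ("B", "A"))]
for problem in validate(groups, {"lun1": "A"}):
    print(problem)
```

Here `pg2` breaks two rules at once: its direction disagrees with the storage-layer preference for `lun1`, and it reuses a device that `pg1` already protects in the opposite direction.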
SRM with stretched storage: configuring recovery
• Create one or more recovery plans for all or some protection groups
– Same protection group can belong to multiple recovery plans
– Can mix stretched, non-stretched storage (only if they are SPPG group types)
• Configure recovery settings for each VM
– Stretched VMs** can depend on non-stretched VMs and vice versa
– Can configure IP customization for stretched VMs that do not have stretched networks (for DR/Test)
– Can assign scripts to stretched VMs
– Can opt out of vMotion for some VMs even if they reside on stretched storage
** - Stretched VMs refers to VMs on Stretched Storage datastores
SRM with stretched storage: test failover
• Use test failover to make sure an unplanned failover would succeed
• Stretched VMs are included in the test failover
• Stretched VMs are powered on in an isolated network
• SRM will perform vMotion host compatibility tests as part of the test failover
• SRM will not perform vMotion as part of the test failover
• Complicated environmental issues (e.g. network latency) may remain undetected
• Test failover for stretched storage requires array support
– Not all arrays support snapshotting of stretched devices
– Contact your array vendor for specific compatibility requirements
[Diagram: test failover between two sites, each with vCenter and SRM]
SRM with stretched storage: planned migration
• Use planned failover to avoid an expected outage
• Choose whether to use vMotion when initiating a planned migration
– Enabled: SRM uses vMotion for VMs on stretched storage
– Disabled: SRM powers VMs off and on as normal
• SRM will reassign site preference to recovery site for all stretched volumes (if array supports it)
• SRM will perform a storage sync to make sure no blocks are left at the protected site
• Planned failover with mix of stretched and non-stretched VMs is ok
[Diagram: Site A (Active) and Site B (Passive) on stretched storage, vCenter and SRM at each site]
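The choice between vMotion and a conventional power-off/power-on during a planned migration can be summarized as a small decision function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and the per-VM opt-out flag reflects the recovery-settings option mentioned earlier in the deck rather than any documented SRM API.

```python
def migration_method(on_stretched_storage: bool,
                     plan_uses_vmotion: bool,
                     vm_opted_out: bool = False) -> str:
    """Pick how a VM is moved during a planned migration (illustrative model).

    vMotion is only considered when the VM lives on stretched storage, the
    plan was started with vMotion enabled, and the VM has not opted out.
    """
    if on_stretched_storage and plan_uses_vmotion and not vm_opted_out:
        return "vmotion"
    return "power-off/power-on"


print(migration_method(True, True))                      # vmotion
print(migration_method(True, True, vm_opted_out=True))   # power-off/power-on
print(migration_method(False, True))                     # power-off/power-on
```

A plan that mixes stretched and non-stretched VMs simply yields a different method per VM, which is why the mixed case is acceptable.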
SRM with stretched storage: unplanned failover
• Use unplanned failover to recover from a disaster
– Initiated when the protected site is no longer functional
• Stretched VMs are powered on at the recovery site
– Stretched and non-stretched VMs are recovered together
– Priority tiers and VM dependencies are honored across all VMs
– SRM will coordinate with the array to guard against any VMs still running at the protected site
• Stretched volumes are recovered faster – shorter RTO
– Stretched volumes are already visible at the recovery site
– No need to wait for costly surfacing, mounting and host rescan operations
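Honoring priority tiers and VM dependencies across all VMs amounts to an ordering problem. The sketch below shows one way to compute a power-on order; the function name and VM names are hypothetical, it assumes lower tier numbers start first and that dependencies are only evaluated between VMs in the same tier, and it is not SRM's actual scheduler. It uses `graphlib` from the Python 3.9+ standard library.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def power_on_order(priority: Dict[str, int], depends_on: Dict[str, Set[str]]) -> List[str]:
    """Order VMs by priority tier, then by dependencies inside each tier.

    priority:   VM name -> tier number (tier 1 starts first).
    depends_on: VM name -> VMs that must be started before it.
    """
    order: List[str] = []
    for tier in sorted(set(priority.values())):
        members = {vm for vm, t in priority.items() if t == tier}
        # Keep only dependencies that point at VMs in the same tier;
        # VMs in earlier tiers have already been started by this point.
        graph = {vm: sorted(depends_on.get(vm, set()) & members) for vm in sorted(members)}
        order.extend(TopologicalSorter(graph).static_order())
    return order


priority = {"db01": 1, "app01": 1, "dns01": 1, "web01": 2}
depends_on = {"app01": {"db01"}, "web01": {"app01"}}
print(power_on_order(priority, depends_on))
```

Because the ordering covers stretched and non-stretched VMs alike, a stretched database VM can gate a non-stretched application VM, or the reverse. A dependency cycle inside a tier raises `graphlib.CycleError`, which is a reasonable way to surface a misconfigured plan.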
SRM with stretched storage: reprotect and failback
• Use reprotect to reverse the roles of sites after a successful planned failover
– Repairs replication for stretched devices
– Ensures the new protected site (former recovery site) has the site preference for stretched devices
• Reprotect after an unplanned failover
– Rerun planned failover once the protected site becomes available
• Failback
– Initiate a planned failover after the reprotect to migrate all VMs back to the former protected site
– It is recommended to perform a test failover first to make sure everything is ready
Key Takeaways Roadmap
• SRM is a great solution for Active-Active datacenters
• SRM enhances Continuous Availability with rich orchestration
• SRM with vMotion enables ZERO service downtime for disaster avoidance
• No longer trade off testability and repeatability when choosing the Active-Active model
• SRM + Live Migration is a game changer in IT operations
Thank you