© 2006 EMC Corporation. All rights reserved. Monitoring and Managing the Data Center Section 5 - Introduction

Monitoring and Managing the Data Center

Section 5 - Introduction

© 2006 EMC Corporation. All rights reserved. Storage Systems Architecture - Introduction - 2

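Chapter Objectives and Content

Data monitoring and management based on storage management tools is the main topic of this chapter. By monitoring many aspects of storage, including hardware, software, information capacity, format, and content, information can be managed and used optimally. This chapter also introduces the basics of using some of the main information management software.

This chapter covers two topics:
5.1 Monitoring in the Data Center
5.2 Managing in the Data Center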


Section Objectives

Upon completion of this section, you will be able to:

Describe areas of the data center to monitor

Discuss considerations for monitoring the data center

Describe techniques for managing the data center

Monitoring in the Data Center

Module 5.1

Monitoring in the Data Center

After completing this module, you will be able to:

Discuss data center areas to monitor

List metrics to monitor for different data center components

Describe the benefits of continuous monitoring

Describe the challenges in implementing a unified and centralized monitoring solution in heterogeneous environments

Describe industry standards for data center monitoring

Monitoring Data Center Components

[Figure: clients on an IP network, clustered hosts/servers with applications (HBAs, keep-alive link), the SAN, and storage arrays (ports). Each component is monitored for health, capacity, performance, and security.]

Why Monitor Data Centers?

Availability
– Continuous monitoring helps ensure availability
– Warnings and errors are fixed proactively

Scalability
– Monitoring allows for capacity planning and trend analysis, which in turn help to scale the data center as the business grows

Alerting
– Administrators can be informed of failures and potential failures
– Corrective action can be taken to ensure availability and scalability

Monitoring Health

Why monitor the health of different components?
– Failure of any hardware or software component can lead to an outage of a number of different components
  Example: A failed HBA could cause degraded access to a number of data devices in a multipath environment, or loss of data access in a single-path environment

Monitoring health is fundamental and is easily understood and interpreted
– At the very least, health metrics should be monitored
– Health issues typically need to be addressed with high priority

Monitoring Capacity

Why monitor capacity?
– Lack of proper capacity planning can lead to data unavailability and an inability to scale
– Trend reports can be created from the capacity data
  The enterprise is well informed of how IT resources are utilized

Capacity monitoring prevents outages before they occur
– More preventive and predictive in nature than health metrics
  Example: reports show that 90% of a file system is full and that the file system is filling up at a particular rate
  Example: 95% of the ports in a particular SAN fabric are in use, so a new switch should be added if more arrays or servers are to be added to the same fabric
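The file system example above implies a simple projection: given how full the file system is and how fast it is filling, estimate how long remains before it is full. The sketch below is illustrative only; the function name, the units, and the assumption of linear growth are not from the course material.

```python
def days_until_full(capacity_gb, used_gb, growth_gb_per_day):
    """Estimate days until a file system is full.

    Assumes (hypothetically) that usage grows linearly at the
    observed rate. Returns None when usage is not growing.
    """
    if growth_gb_per_day <= 0:
        return None  # no growth, so no projected fill date
    free_gb = max(capacity_gb - used_gb, 0)
    return free_gb / growth_gb_per_day


# A 1000 GB file system that is 90% full and grows by 5 GB per day
print(days_until_full(1000, 900, 5))
```

A trend report built on this kind of projection lets an administrator extend the file system before the outage rather than after it.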

Monitoring Performance

Why monitor performance metrics?
– Want all data center components to work efficiently/optimally
– See if components are pushing performance limits or if they are being underutilized
– Can be used to identify performance bottlenecks

Performance monitoring/analysis can be extremely complicated
– Dozens of inter-related metrics, depending on the component in question
– Most complicated of the various aspects of monitoring

Monitoring Security

Why monitor security?
– Prevent and track unauthorized access, whether accidental or malicious

Enforcing security and monitoring for security breaches is a top priority for all businesses

Monitoring Servers

Health
– Hardware components: HBA, NIC, graphics card, internal disk, …
– Status of various processes/applications

Capacity
– File system utilization
– Database table space/log space utilization
– User quota

Monitoring Servers

Performance
– CPU utilization
– Memory utilization
– Transaction response times

Security
– Login
– Authorization
– Physical security: data center access

Monitoring the SAN

Health
– Fabrics: fabric errors, zoning errors
– Ports: failed GBIC, status/attribute change
– Devices: status/attribute change
– Hardware components: processor cards, fans, power supplies

Capacity
– ISL utilization
– Aggregate switch utilization
– Port utilization

Monitoring the SAN

Performance
– Connectivity ports
  Link failures
  Loss of signal
  Loss of synchronization
  Link utilization
  Bandwidth (MB/s or frames/s)
– Connectivity devices
  Statistics are usually a cumulative value of all the port statistics

Monitoring the SAN

Security
– Zoning
  Ensure communication only between dedicated sets of ports (HBA and storage ports)
– LUN masking
  Ensure that only certain hosts have access to certain storage array volumes
– Administrative tasks
  Restrict administrative tasks to a select set of users
  Enforce strict passwords
– Physical security
  Access to the data center should be monitored
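Zoning and LUN masking act as two separate checks: the HBA and the storage port must share a zone, and the HBA must also be permitted to see the volume. A minimal sketch of that combined check follows; the data structures and all names (zones, masks, port and LUN identifiers) are hypothetical and do not represent any switch or array API.

```python
def in_same_zone(zones, port_a, port_b):
    """True if both ports are members of at least one common zone."""
    return any(port_a in members and port_b in members
               for members in zones.values())


def can_access(zones, lun_masks, hba, array_port, lun):
    """Access requires a shared zone AND a LUN mask entry for the HBA."""
    zoned = in_same_zone(zones, hba, array_port)
    masked_in = hba in lun_masks.get(lun, set())
    return zoned and masked_in


# Hypothetical configuration for illustration only
zones = {
    "zone_h1": {"hba_h1", "array_port_1"},
    "zone_h2": {"hba_h2", "array_port_1"},
}
lun_masks = {"lun_0": {"hba_h1"}}

print(can_access(zones, lun_masks, "hba_h1", "array_port_1", "lun_0"))
print(can_access(zones, lun_masks, "hba_h2", "array_port_1", "lun_0"))
```

Monitoring for security then amounts to logging and alerting on requests that fail either check.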

Monitoring Storage Arrays

Health
– All hardware components: front end, back end, memory, disks, power supplies, …
– Array operating environment: RAID processes, environmental sensors, replication processes

Monitoring Storage Arrays

Capacity
– Configured/unconfigured capacity
– Allocated/unallocated storage
– Fan-in/fan-out ratios

Performance
– Front-end utilization/throughput
– Back-end utilization/throughput
– I/O profile
– Response time
– Cache metrics

Monitoring Storage Arrays

Security
– LUN access
  Ensure that only certain hosts have access to certain storage array volumes
  Disallow WWN spoofing
– Administrative tasks
  Most arrays allow the restriction of various array configuration tasks: device configuration, LUN masking, replication operations, port configuration
– Physical security
  Monitor access to the data center

Monitoring IP Networks

Health
– Hardware components: processor cards, fans, power supplies, ...
– Cables

Performance
– Bandwidth
– Latency
– Packet loss
– Errors
– Collisions

Security

Monitoring the Data Center as a Whole

Monitor the data center environment
– Temperature, humidity, airflow, hazards (water, smoke, etc.)
– Voltage (power supply)

Physical security
– Facility access (monitoring cameras, access cards, etc.)

End-to-End Monitoring

[Figure: end-to-end view across clients, the IP network, clustered hosts/servers with applications (HBAs), the SAN, and storage array ports. Labels: Single Failure, Multiple Symptoms, Root Cause Analysis, Business Impact.]
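The idea that a single failure produces multiple symptoms can be illustrated with a very small root cause sketch: list the components each host depends on, then look for components shared by every degraded host and used by no healthy host. This is a simplified illustration under the assumption that each host has a single path; the function and component names are hypothetical, and real root cause analysis in multipath environments is considerably more involved.

```python
def root_cause_candidates(depends_on, degraded_hosts):
    """Return components common to all degraded hosts and unused by healthy ones.

    depends_on maps host name -> set of components on its (single) path.
    """
    degraded = set(degraded_hosts)
    if not degraded:
        return set()
    # Components that every degraded host relies on
    common = set.intersection(*(set(depends_on[h]) for h in degraded))
    # Components still serving healthy hosts are unlikely culprits
    healthy_components = set()
    for host, components in depends_on.items():
        if host not in degraded:
            healthy_components |= set(components)
    return common - healthy_components


# Hypothetical topology for illustration only
depends_on = {
    "H1": {"hba_h1", "SW1", "array_port_1"},
    "H2": {"hba_h2", "SW1", "array_port_1"},
    "H3": {"hba_h3", "SW2", "array_port_2"},
}
print(sorted(root_cause_candidates(depends_on, ["H1", "H2"])))
```

Here two symptoms (H1 and H2 degraded) narrow down to two candidate components, which is the starting point for assessing business impact.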

Monitoring Health: Array Port Failure

[Figure: hosts H1, H2, and H3 (hosts/servers with applications) connect through SAN switches SW1 and SW2 to the storage array ports. After an array port failure, three hosts are marked Degraded.]

Monitoring Health: HBA Failure

[Figure: hosts H1, H2, and H3 (hosts/servers with applications) connect through SAN switches SW1 and SW2 to the storage array ports. After an HBA failure, one host is marked Degraded.]

Monitoring Health: Switch Failure

[Figure: hosts/servers with applications connect through SAN switches SW1 and SW2 to the storage array ports. After a switch failure: All Hosts Degraded.]

Monitoring Capacity: Array

[Figure: a new server is connected through SAN switches SW1 and SW2 to the storage array, alongside the existing hosts/servers with applications.]

Can the array provide the required storage to the new server?

Monitoring Capacity: Servers File System Space

[Figure: a file system with no monitoring compared with a file system (FS) under monitoring.]

Warning: FS is 66% Full

Critical: FS is 80% Full

Extend FS
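The warning and critical levels on this slide map naturally onto a threshold check. The sketch below uses the slide's 66% and 80% figures as defaults; the function name and return values are hypothetical and are not taken from any monitoring product.

```python
def fs_alert_level(used_pct, warning_pct=66.0, critical_pct=80.0):
    """Classify file system usage against warning and critical thresholds."""
    if used_pct >= critical_pct:
        return "critical"
    if used_pct >= warning_pct:
        return "warning"
    return "ok"


for pct in (50, 66, 80):
    print(pct, fs_alert_level(pct))
```

Checking the critical threshold first matters: a file system at 85% satisfies both conditions, and the more severe level should win.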

Monitoring Performance: Array Port Utilization

[Figure: hosts H1, H2, and H3 (hosts/servers with applications) share storage array ports through SAN switches SW1 and SW2, and a new server, H4, is to be added. A chart plots port utilization (%) for H1 + H2 + H3 against the 100% level.]
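The question behind this slide is whether the array port shared by H1, H2, and H3 has enough headroom for a new server. A rough check is to add the new server's expected load to the current total and compare it with a limit. The sketch below assumes loads can simply be summed as percentages of port capacity; the function name and the sample figures are hypothetical.

```python
def port_can_absorb(current_loads_pct, new_load_pct, limit_pct=100):
    """True if adding the new load keeps total port utilization within the limit."""
    return sum(current_loads_pct) + new_load_pct <= limit_pct


# Hypothetical utilization figures for H1, H2, H3 and the new server H4
existing = [20, 30, 25]
print(port_can_absorb(existing, 20))
print(port_can_absorb(existing, 30))
```

In practice an administrator would likely choose a limit below 100% to leave room for peaks.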


Monitoring Performance: Servers

Critical: CPU Usage above 90% for the last 90 minutes
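An alert such as this one depends on a sustained condition rather than a single reading. One way to sketch it, assuming one CPU sample per minute (an assumption, as the slide does not state the sampling interval), is to check that every sample in the most recent window exceeds the threshold. The function name is hypothetical.

```python
def cpu_sustained_high(samples_pct, window=90, threshold=90.0):
    """True if the last `window` samples are all above `threshold`.

    Assumes one sample per minute, so window=90 covers 90 minutes.
    """
    if len(samples_pct) < window:
        return False  # not enough history to decide
    return all(s > threshold for s in samples_pct[-window:])


print(cpu_sustained_high([95] * 90))
print(cpu_sustained_high([95] * 89 + [50]))
```

Requiring the whole window to exceed the threshold avoids raising a critical alert on a brief spike.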

Monitoring Security: Servers

[Figure: three successive login attempts (Login 1, Login 2, Login 3).]

Critical: Three successive login failures for username “Bandit” on server “H4”, possible security threat
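Detecting successive login failures means counting failures per user and server and resetting the count on a success. The sketch below illustrates this; the event format (username, server, success flag) and the function name are hypothetical.

```python
def successive_failure_alerts(events, limit=3):
    """Return (username, server) pairs that reach `limit` successive failures.

    events: iterable of (username, server, succeeded) tuples in time order.
    """
    streaks = {}
    alerts = []
    for username, server, succeeded in events:
        key = (username, server)
        if succeeded:
            streaks[key] = 0  # a successful login resets the streak
            continue
        streaks[key] = streaks.get(key, 0) + 1
        if streaks[key] == limit:
            alerts.append(key)
    return alerts


events = [
    ("Bandit", "H4", False),
    ("Bandit", "H4", False),
    ("Bandit", "H4", False),
]
print(successive_failure_alerts(events))
```

The alert fires once, when the streak first reaches the limit, rather than on every later failure.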

Monitoring Security: Array – Local Replication

[Figure: Workgroup 1 (WG1) and Workgroup 2 (WG2) hosts reach their WG1 and WG2 devices on the storage array through SAN switches SW1 and SW2. A replication command is issued from WG1.]

Warning: Attempted replication of WG2 devices by WG1 user – Access denied

Monitoring: Alerting of Events

Warnings require administrative attention
– File systems becoming full
– Soft media errors

Errors require immediate administrative attention
– Power failures
– Disk failures
– Memory failures
– Switch failures

Monitoring: Challenges

[Figure: a heterogeneous environment to monitor.
Applications and databases: Oracle, Informix, MS SQL
Servers: UNIX, WIN, MF; vendors shown: SUN, IBM, HP
Network: SAN, IP; vendors shown: Brocade, McData, Cisco
Storage arrays: TLU, NAS, SAN, DAS, CAS; vendors shown: NetApp, EMC, Hitachi]

Monitoring: Ideal Solution

[Figure: a single monitoring/management engine with one UI covers storage arrays (TLU, NAS, SAN, DAS, CAS), the network (SAN, IP), and servers, databases, and applications (UNIX, WIN, MF).]

Without Standards…

No common access layer between managed objects and applications (vendor specific)

No common data model

No interconnect independence

Multi-layer management difficulty

Legacy systems cannot be accommodated

No multi-vendor automated discovery

Policy-based management is not possible across entire classes of devices

[Figure: network management, applications management, host management, storage management, and database management, with interoperability as the missing piece.]

Simple Network Management Protocol (SNMP)

SNMP
– Meant for network management
– Inadequate for complete SAN management

Limitations of SNMP
– No common object model
– Security: only newer SAN devices support v3
– Positive response mechanism
– Inflexible: no auto discovery functions
– No ACID (Atomicity, Consistency, Isolation, and Durability) properties
– Richness of canonical intrinsic methods
– Weak modeling constructs

Storage Management Initiative (SMI)

Created by the Storage Networking Industry Association (SNIA)

Integration of diverse multi-vendor storage networks

Development of more powerful management applications

Common interface for vendors to develop products that incorporate the management interface technology

Key components
– Interoperability testing
– Education and collaboration
– Industry and customer promotion
– Promotions and demonstrations
– Technology center
– SMI specification
– Storage industry architects and developers

[Figure: SNIA’s SMI-S. A management application sits on an integration infrastructure (object model mapping, vendor-unique features) above the SMI-S interface, which is built on CIM/WBEM technology. Listed properties: platform independent, distributed, automated discovery, security, locking, object oriented. Below the interface, each device type (tape library, switch, array, and many others) has a standard object model (MOF) plus vendor-unique functions.]

Storage Management Initiative Specification (SMI-S)

Based on:
– Web Based Enterprise Management (WBEM) architecture
– Common Information Model (CIM)

Features:
– A common interoperable and extensible management transport
– A complete, unified, and rigidly specified object model that provides for the control of a SAN
– An automated discovery system
– New approaches to the application of the CIM/WBEM technology

[Figure: layered SMI-S view. Top: users and management tools (labels: Graphical User, Management, Users; Storage Resource Management, Performance, Capacity Planning, Removable Media; Container Management, Volume Management, Media Management, File System, Other; Data Management, Backup and HSM, Database Manager). Middle: Storage Management Interface Specification. Bottom: managed objects, split into physical components (removable media, tape drive, disk drive, robot, enclosure, host bus adapter, switch) and logical components (volume, clone, snapshot, zone, media set, other).]


Common Information Model (CIM)

Describes the management of data

Details requirements within a domain

Information model with required syntax


Web Based Enterprise Management (WBEM)

Enterprise Management Platforms (EMPs)

Graphical applications

Monitoring of many (if not all) data center components

Alerting of errors reported by those components

Management of many (if not all) data center components

Can often launch proprietary management applications

May include other functionality
– Automatic provisioning
– Scheduling of maintenance activities

Proprietary architecture

Monitoring in the Data Center – Summary

Key concepts covered in this module are:

It is important to continuously monitor data center components to support the availability and scalability initiatives of any business
– Components include the server, SAN, network, and storage arrays

The four areas of monitoring:
– Health
– Capacity
– Performance
– Security

There are attempts to define a common monitoring and management model


Apply Your Knowledge

Upon completion of this topic, you will be able to:

Describe how EMC ControlCenter can be used to monitor the Data Center

EMC ControlCenter Architecture

Agent Tier
• Master Agent (1)
• Application Agents (many)

Infrastructure Tier
• Server (one)
• Repository (one)
• Store (many)

User Interface Tier
• Console (many)
• Optional applications


EMC ControlCenter Console

Primary interface through which the storage environment is viewed and managed

Java-based application supported on Windows and Solaris platforms

Objects managed by various agents are organized into groups such as Storage, Hosts, and Connectivity

Information about an object can be retrieved by the Console from the Repository or in real-time directly from the agent

Any command issued for the object is passed from the Console to the ControlCenter Server and handled appropriately

There can be several Consoles spread across the network

EMC ControlCenter Server

ControlCenter Server is the primary interface between the Console and the ControlCenter infrastructure

ControlCenter Server provides a diverse collection of services, including:
– Web Applications Server, used for installing the Java Console
– Security and access management, such as licensing, login, authentication, and authorization
– Communication with the Console
– Alert and event management
– Real-time statistics
– Object management to maintain a list of managed objects
– Agent management to maintain a list of available agents

ControlCenter Server retrieves data from the Repository for display by the Java and Web Console

User-initiated real-time data requests from some agents are also handled by the ControlCenter Server

Balances Agent-to-Store communication based on workload

EMC ControlCenter Repository

Licensed, embedded Oracle 9i database that holds current and historical information about the managed environment

ControlCenter Server executes transactions on the Repository to retrieve information requested by the Console

Store(s) populate the Repository with persistent data from the agents

Repository requires minimal user interaction or maintenance. The database has restricted access and can be updated only by ControlCenter applications


EMC ControlCenter Store

Store receives the data sent by the agents, processes the data and updates the Repository

There can be multiple Stores in the environment, providing load balancing, scaling, and failover

EMC ControlCenter Agents

Master agent:
– One per host
– Manages other agents on the host: start/stop, monitor agent status and health

ControlCenter agents:
– Run on hosts to collect data and monitor object health
– Generate alerts
– Multiple agents can exist on a host
– Pass information to the ControlCenter Store and the ControlCenter Server


EMC ControlCenter Support for Storage Arrays

The following Storage Arrays are supported by EMC ControlCenter

EMC Symmetrix

EMC CLARiiON

EMC Centera

EMC Celerra and Network Appliances NAS servers

EMC Invista

Hitachi Data Systems (including the HP and Sun resold versions)

HP Storageworks

IBM ESS

SMI-S (Storage Management Initiative Specification) compliant arrays


EMC ControlCenter support for SAN Devices

The following SAN devices are supported by ControlCenter

EMC Connectrix

Brocade

McData

Cisco

Inrange (CNT)

IBM Blade Server (IBM-branded Brocade models only)

Dell Blade Server (Dell-branded Brocade models only)

EMC ControlCenter Support for Hosts

The following hosts are supported by ControlCenter

Dedicated host agents
– Microsoft Windows
– Hewlett-Packard HP-UX
– IBM AIX
– IBM mainframe
– Linux
– Novell Netware
– Sun Solaris

Proxy management via Common Mapping Agent (CMA)
– Compaq Tru64
– Fujitsu-Siemens BS2000
– Windows, Solaris, AIX, Linux, and HP-UX hosts can also be monitored by Common Mapping Agent proxy

EMC ControlCenter Support for Database and Backup

The following databases and backup applications are supported by ControlCenter

Dedicated database agent
– Oracle
– DB2 on mainframe

Proxy management via Common Mapping Agent (CMA)
– SQL Server
– Sybase
– Informix
– DB2

Dedicated backup agent
– EMC EDM
– IBM Tivoli
– EMC Networker
– Veritas Netbackup

Discovery of Managed Objects by Agents

Automatic Discovery: Many agents discover data objects automatically

Assisted Discovery: These agents discover their objects only after administrator action
– Common Mapping Agent
– Database Agent for Oracle
– Fibre Channel Connectivity Agent
– Storage Agents for CLARiiON, Centera, Invista, NAS, SMI, HP StorageWorks, HDS, and ESS

Data Collection Policies (DCP)

Formal set of statements used to manage the data collected by ControlCenter agents

Policies specify the data to collect and the frequency of collection

ControlCenter agents have predefined collection policy definitions and templates
– Default definitions can be easily modified, or new definitions can be created from the templates provided
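A data collection policy, as described here, pairs what to collect with how often to collect it, and new definitions are derived from templates. The sketch below models that idea with a small immutable record. It is purely illustrative: the class, field names, and metric names are hypothetical and do not reflect ControlCenter's actual policy format.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CollectionPolicy:
    """Hypothetical model of a policy: which data to collect and how often."""
    name: str
    metrics: tuple
    interval_minutes: int


# A template with default settings (illustrative values)
FS_TEMPLATE = CollectionPolicy(
    name="fs-space-template",
    metrics=("fs_used_pct",),
    interval_minutes=60,
)

# A new definition created from the template with a shorter interval
critical_hosts = replace(
    FS_TEMPLATE, name="fs-space-critical-hosts", interval_minutes=15
)

print(critical_hosts)
```

Because the record is frozen, deriving a policy never alters the template it came from.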

Console View of the Storage Environment

[Figure: Console view showing a server with dual HBAs, the WWNs of the HBAs, the SAN switch, the storage array, and the storage array front-end directors and ports.]

Alerts - Overview

Why alert? Data availability
– Monitor and report on events that could lead to application outages
– Every ControlCenter agent can monitor a number of metrics: 30 agents and 700+ alerts

Alert categories
– Health
  Examples: database instance up/down, Symmetrix service processor down, connectivity device port status
– Capacity
  Examples: File System Space, File/Directory Size Change
– Performance
  Examples: Symmetrix Total Hit %, Host CPU Usage


Alert Notification

Notification capabilities

Messages are directed to the ControlCenter console by default

Messages can be directed to a Management Framework via Integration Gateway (SNMP) – governed by Management Policy associated with the Alert

E-mail notification as specified in the Management Policy
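The notification rules on this slide can be read as a routing decision: the console always receives the message, while SNMP forwarding and e-mail depend on the management policy attached to the alert. The sketch below illustrates that logic only; the policy dictionary, its keys, and the target labels are hypothetical and are not ControlCenter settings.

```python
def notification_targets(management_policy):
    """Return where an alert message should be sent under the given policy."""
    targets = ["console"]  # the console receives messages by default
    if management_policy.get("forward_snmp"):
        targets.append("snmp-gateway")
    for address in management_policy.get("email", []):
        targets.append("email:" + address)
    return targets


# Hypothetical policy for illustration only
policy = {"forward_snmp": True, "email": ["storage-admin@example.com"]}
print(notification_targets(policy))
print(notification_targets({}))
```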

EMC ControlCenter Console View of Alerts

[Figure: Console alert view with columns for alert state, object name, message, and alert severity.]