An SNMP based failure detection service

Matthias Wiesmann, Péter Urbán, Xavier Défago

Swiss National Science Foundation

Japan Advanced Institute of Science and Technology (北陸先端科學技術大學院大學)

Overview

• Failure Detection

• SNMP

• SNMP-FD Service

• Conclusion & Future Work


Failure detection

• Basic abstraction for distributed systems

• Needed for solving fundamental problems:

• Consensus, Atomic broadcast, Atomic commitment

• Formal definitions: P, ♢S, Ω, φ, etc.

• Also needed outside the distributed systems community

• System, Network, Cluster management, Middleware, Grid systems

• Common problem → common solution


Failure detection service


• Factor out failure detection

• Shared, separate implementation

• Complex, adaptive policies

• Proposed in the literature

• Implemented by middleware systems, grid systems

• Problem: no standard

• Every system has its own service and interfaces


Standard Needed

• External standard

• Connect clients of the service to the service

• Application, Middleware, Monitoring and administration tools

• Suspicion notifications, configuration

• Internal standard

• Use standard for heartbeat messages, internal queries

• We need some form of Middleware


Why SNMP?

• Simple Network Management Protocol

• Present in many network-attached devices

• Routers, Smart Switches, UPS, Printers…

• Simple Middleware

• Offers basic features

• Relatively lightweight

• Many applicable standards defined

• Monitoring, fault reporting.

SNMP Architecture

[Figure: SNMP architecture, with labels Network Management Station, Network Equipment, SNMP Agent, MIB, Monitored Object]

• Network Management Station

• Manages equipment

• Equipment’s agent

• Exposes Management Information Base (MIB)

• Sends asynchronous messages (traps) to NMS.
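The agent's two roles listed above (exposing a MIB and sending traps) can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: `ToyAgent`, `parse_oid`, and the `send_trap` callback are hypothetical names, no SNMP wire encoding is performed, and the callback merely stands in for a message sent to the NMS. The OIDs in the comments are the standard sysDescr.0, sysName.0 and linkDown identifiers.

```python
from bisect import bisect_right


def parse_oid(text):
    """Turn '1.3.6.1.2.1.1.5.0' into a tuple of ints so OIDs sort numerically."""
    return tuple(int(part) for part in text.split("."))


class ToyAgent:
    """Hypothetical agent model: an OID -> value table plus a trap callback."""

    def __init__(self, send_trap):
        self._mib = {}
        self._send_trap = send_trap  # stands in for a message sent to the NMS

    def set(self, oid, value):
        self._mib[parse_oid(oid)] = value

    def get(self, oid):
        """Return the value stored at exactly this OID, or None."""
        return self._mib.get(parse_oid(oid))

    def get_next(self, oid):
        """Return (oid, value) of the first entry after `oid` in OID order."""
        keys = sorted(self._mib)
        i = bisect_right(keys, parse_oid(oid))
        if i == len(keys):
            return None  # walked past the end of the table
        key = keys[i]
        return ".".join(str(n) for n in key), self._mib[key]

    def notify(self, trap_oid, bindings):
        """Push an asynchronous message to the NMS without being asked."""
        self._send_trap(trap_oid, dict(bindings))


received = []
agent = ToyAgent(lambda oid, bindings: received.append((oid, bindings)))
agent.set("1.3.6.1.2.1.1.5.0", "host-b")     # sysName.0
agent.set("1.3.6.1.2.1.1.1.0", "toy agent")  # sysDescr.0
agent.notify("1.3.6.1.6.3.1.1.5.3", {"ifIndex": 2})  # linkDown
```

The station pulls state with `get` and `get_next`, while `notify` models the push direction that traps provide.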


SNMP-FD service

[Figure: SNMP-FD service architecture, with monitored processes p1, p2, p3 on host B and a monitoring process pA on host A]

Service Architecture

• Distributed architecture

• Dæmon runs on each host

• Monitor local processes

• Exchange heartbeats

• Export MIB information

• Can interact with equipment

• Similar to other failure detection service architectures
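The heartbeat exchange described above can be sketched as a fixed-timeout detector. This is a minimal illustration, not the service's implementation: `HeartbeatMonitor`, its methods, and the millisecond timestamps passed in by the caller are assumptions made for the example.

```python
class HeartbeatMonitor:
    """Hypothetical fixed-timeout detector: suspect a host whose last
    heartbeat is older than `timeout_ms`."""

    def __init__(self, timeout_ms):
        self.timeout_ms = timeout_ms
        self.last_seen = {}  # host name -> arrival time of latest heartbeat (ms)

    def heartbeat(self, host, now_ms):
        """Record that a heartbeat from `host` arrived at `now_ms`."""
        self.last_seen[host] = now_ms

    def suspects(self, now_ms):
        """Return the set of hosts currently suspected to have failed."""
        return {
            host
            for host, seen in self.last_seen.items()
            if now_ms - seen > self.timeout_ms
        }


monitor = HeartbeatMonitor(timeout_ms=300)
monitor.heartbeat("host-b", 0)
monitor.heartbeat("host-c", 0)
monitor.heartbeat("host-b", 200)  # host-c stays silent
```

Passing the clock value explicitly keeps the sketch deterministic; a real dæmon would read a monotonic clock and publish the suspicion set through its MIB.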


Features

• Uses many standard MIBs

• Host Resources MIB → Process state

• Target & Notification MIB → Subscriber description

• Alarm MIB → Current suspicion description

• Lightweight messaging:

• Heartbeats are UDP traps.

• Leverages SNMP

• Configuration, Security, Interoperability with devices
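The "lightweight messaging" point can be illustrated with a plain UDP datagram carrying a heartbeat. This sketch deliberately does not encode a real SNMP trap: the JSON payload, the field names `sender` and `seq`, and the function names are assumptions chosen for readability, and only the one-datagram-per-heartbeat pattern is the point.

```python
import json
import socket


def encode_heartbeat(sender, seq):
    """Serialise a heartbeat (assumed fields: sender name, sequence number)."""
    return json.dumps({"sender": sender, "seq": seq}).encode("utf-8")


def decode_heartbeat(payload):
    """Inverse of encode_heartbeat; returns (sender, seq)."""
    message = json.loads(payload.decode("utf-8"))
    return message["sender"], int(message["seq"])


def demo_loopback():
    """Send one heartbeat to ourselves over UDP and return what arrived."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)         # do not block forever if it is lost
        sender.sendto(encode_heartbeat("host-b", 1), receiver.getsockname())
        payload, _address = receiver.recvfrom(1024)
        return decode_heartbeat(payload)
    finally:
        sender.close()
        receiver.close()
```

UDP gives no delivery guarantee, which suits heartbeats: a lost datagram is simply one more missing beat for the detector to account for.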


New MIBs

• Heartbeat MIB

• Describes heartbeats

• Failure detection MIB

• Describes failure detector configuration

• Described in a standard SMI syntax file

• Accessible using standard tools

SNMP table: SnmpFD::kbhbTable

index                  interval  lastbeat  lasttime      count  lostcount  mean     stddev   min  max
kt-dhcp23.jaist.ac.jp  100       2678      0:0:04:38.49  2671   0          99.9955  28.3417  0    356
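Columns such as count, lostcount, mean, stddev, min and max could be maintained incrementally as heartbeats arrive. The sketch below uses Welford's running-variance method over inter-arrival gaps; this is an assumption about how such a row might be computed, not a description of the actual service. In particular, treating lostcount as the number of skipped sequence numbers is a guess, and `HeartbeatStats` is a hypothetical name.

```python
import math


class HeartbeatStats:
    """Running statistics over heartbeat inter-arrival gaps (milliseconds)."""

    def __init__(self):
        self.count = 0        # heartbeats received
        self.lostcount = 0    # skipped sequence numbers (assumed meaning)
        self.lastbeat = None  # sequence number of the latest heartbeat
        self.lasttime = None  # arrival time of the latest heartbeat
        self.mean = 0.0       # mean gap between consecutive arrivals
        self.min = None
        self.max = None
        self._gaps = 0        # number of gaps observed so far
        self._m2 = 0.0        # sum of squared deviations (Welford)

    def record(self, seq, arrival_ms):
        if self.lastbeat is not None:
            self.lostcount += max(0, seq - self.lastbeat - 1)
            gap = arrival_ms - self.lasttime
            self._gaps += 1
            delta = gap - self.mean
            self.mean += delta / self._gaps
            self._m2 += delta * (gap - self.mean)
            self.min = gap if self.min is None else min(self.min, gap)
            self.max = gap if self.max is None else max(self.max, gap)
        self.count += 1
        self.lastbeat = seq
        self.lasttime = arrival_ms

    @property
    def stddev(self):
        """Sample standard deviation of the gaps (0.0 until two gaps exist)."""
        if self._gaps < 2:
            return 0.0
        return math.sqrt(self._m2 / (self._gaps - 1))


stats = HeartbeatStats()
for seq, arrival_ms in [(1, 0), (2, 100), (3, 190), (5, 310)]:
    stats.record(seq, arrival_ms)
```

The running form needs constant memory per monitored host, which matters when one dæmon tracks many peers.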


Implementation

• Validate the architecture

• Implemented in Java

• Open source libraries

• SNMP4J, SNMP4J Agent, Apache Libraries

• Available for download

• http://ddsg.jaist.ac.jp//en/projects/snmp-fd/


Performance

• Simple failure detector

• Stable performance

• Consistent with expectations

• Validates approach

• Issues

• High latency

• High CPU load on receiver

• SNMP4J issue: the code spends 95% of its time there

[Figure: mistake rate (mistakes per second, log scale from 0.001 to 1) and query accuracy (99.85% to 100.0%) plotted against timeout (ms, 0 to 13)]
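The two plotted quantities can be computed from a trace of wrong suspicions. The sketch assumes the usual quality-of-service reading of these metrics: mistake rate is the number of wrong suspicions per second, and query accuracy is the fraction of time the detector's answer is correct while the monitored process is actually up. The function name and the trace format are hypothetical.

```python
def qos_metrics(wrong_suspicions, duration_s):
    """Compute (mistake_rate, query_accuracy) for a process that never crashed.

    wrong_suspicions: list of (start_s, end_s) intervals during which the
    detector wrongly suspected the process; intervals must not overlap.
    duration_s: total length of the observation window in seconds.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    suspected_time = sum(end - start for start, end in wrong_suspicions)
    mistake_rate = len(wrong_suspicions) / duration_s
    query_accuracy = 1.0 - suspected_time / duration_s
    return mistake_rate, query_accuracy


# Two short wrong suspicions in a 100-second run.
rate, accuracy = qos_metrics([(10.0, 10.5), (40.0, 40.25)], 100.0)
```

A longer timeout typically trades fewer mistakes for slower detection, which is the trade-off a mistake-rate-versus-timeout plot makes visible.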


Conclusion

• Failure detection service

• Standards-based approach desirable

• No need for new standard → use existing SNMP

• Approach validated by prototype

• Can be extended with more advanced FD techniques

• Hierarchical FD, adaptive FD, etc.

• Can interact with network devices

• For instance, link-down traps


Future work

• Look into performance issues

• Optimizations needed in the SNMP4J library.

• Port the framework to the latest version of SNMP4J

• Support for AgentX

• Implement framework as sub-agent

• Implement more advanced failure detectors

• Handle network equipment data


Questions?


Thank you…
