Networking the Cloud
Presenter: b97901184 姜慧如 (Electrical Engineering, Year 3)


VL2: A Scalable and Flexible Data Center Network


“Shared” Data Center for Cloud

Data centers hold tens to hundreds of thousands of servers, concurrently supporting a large number of distinct services.

Economies of scale: dynamically reallocate servers among services as workload patterns change.

High utilization is needed. Agility!

Agility is Important in Data Centers

Agility: any machine is able to play any role. "Any service, any server."

But… utilization tends to be low. Ugly secret: 30% utilization is considered good.

- Uneven application fit: each server has CPU, memory, and disk, and most applications exhaust one resource while stranding the others.
- Long provisioning timescales: new servers are purchased quarterly at best.
- Uncertainty in demand: demand for a new service can spike quickly.
- Risk management: not having spare servers to meet demand brings failure just when success is at hand.
- Session state and storage constraints: if only the world were stateless servers…

How to achieve agility?

- Workload management: means for rapidly installing a service's code on a server (virtual machines, disk images).
- Storage management: means for a server to access persistent data (distributed filesystems).
- Network: means for communicating with other servers, regardless of where they are in the data center.

But: today's data center networks prevent agility in several ways.

Conventional DCN Problems

- Static network assignment leads to fragmentation of resources.
- Poor server-to-server connectivity.
- Traffic of different services affects each other: poor reliability and utilization.

[Figure: conventional DCN tree topology with core routers (CR), aggregation routers (AR), switches (S), and servers (A). Oversubscription grows toward the root: about 1:5 at the rack, 1:80 at aggregation, 1:240 at the core, so a service that wants more servers cannot borrow spares stranded in another subtree.]

Conventional DCN Problems (cont.)

- Scale is achieved by assigning servers topologically significant IP addresses and dividing servers among VLANs.
- Limited utility of VMs: a VM cannot migrate out of its original VLAN while keeping the same IP address.
- Fragmentation of the address space.
- Reconfiguration is needed when servers are reassigned to different services.

Virtual Layer 2 Switch


The Illusion of a Huge L2 Switch

1. L2 semantics

2. Uniform high capacity

3. Performance isolation

[Figure: to its servers, VL2 looks like a single huge Layer-2 switch with every server (A) plugged directly into it; physically, the network is still the CR/AR/S hierarchy shown earlier.]

Network Objectives

Developers want a mental model where all their servers, and only their servers, are plugged into an Ethernet switch.

1. Layer-2 semantics: flat addressing, so any server can have any IP address; server configuration is the same as in a LAN; a VM keeps the same IP address even after migration.
2. Uniform high capacity: capacity between servers is limited only by their NICs; no need to consider topology when adding servers.
3. Performance isolation: traffic of one service should be unaffected by others.

VL2 Goals and Solutions

Objective                                | Approach                                          | Solution
1. Layer-2 semantics                     | Employ flat addressing                            | Name-location separation & resolution service
2. Uniform high capacity between servers | Guarantee bandwidth for hose-model traffic        | Flow-based random traffic indirection (Valiant LB)
3. Performance isolation                 | Enforce hose model using existing mechanisms only | TCP

"Hose" model: each node has ingress/egress bandwidth constraints only.
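The hose model constrains each node only by its total ingress and egress rates; the promise is to route any traffic matrix that respects those caps. Below is a minimal sketch of the feasibility check, with hypothetical rates and demands (not from the paper):

```python
# Check whether a traffic matrix fits the hose model: each node's total
# egress and total ingress demand must stay within its NIC rate.
def fits_hose_model(tm, egress, ingress):
    """tm[i][j] is the demand from node i to node j, in Gbps."""
    n = len(tm)
    for i in range(n):
        sent = sum(tm[i][j] for j in range(n) if j != i)
        received = sum(tm[j][i] for j in range(n) if j != i)
        if sent > egress[i] or received > ingress[i]:
            return False
    return True

# Hypothetical example: three servers, each with a 1 Gbps NIC.
rates = [1.0, 1.0, 1.0]
tm = [[0.0, 0.4, 0.5],
      [0.3, 0.0, 0.2],
      [0.2, 0.6, 0.0]]
print(fits_hose_model(tm, rates, rates))  # True: every row and column sum <= 1
```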

Reminder: Layer 2 vs. Layer 3

Ethernet switching (Layer 2):
- Cheaper switch equipment.
- Fixed addresses and auto-configuration.
- Seamless mobility, migration, and failover.

IP routing (Layer 3):
- Scalability through hierarchical addressing.
- Efficiency through shortest-path routing.
- Multipath routing through equal-cost multipath (ECMP).

So, as in enterprises, data centers often connect Layer-2 islands with IP routers.

Measurements and Implications of DCN

Data-center traffic analysis (DC traffic != Internet traffic):
- The ratio of traffic between servers to traffic entering/leaving the data center is 4:1.
- Demand for bandwidth between servers is growing faster; the network is the bottleneck of computation.

Flow distribution analysis:
- The majority of flows are small; the biggest flows are about 100 MB.
- The distribution of internal flows is simpler and more uniform.
- 50% of the time a machine has about 10 concurrent flows, and 5% of the time it has more than 80 concurrent flows.

Measurements and Implications of DCN (cont.)

Traffic matrix analysis:
- Traffic patterns summarize poorly.
- Traffic patterns are unstable over time.

Failure characteristics:
- Distribution of networking-equipment failure durations: 95% < 1 min, 98% < 1 hr, 99.6% < 1 day, and 0.09% > 10 days.
- There is no obvious way to eliminate all failures from the top of the hierarchy.

VL2’s Methods

- Flat addressing: allows service instances (e.g., virtual machines) to be placed anywhere in the network.
- Valiant Load Balancing: (randomly) spread network traffic uniformly across network paths.
- End-system-based address resolution: scales to large server pools without introducing complexity to the network control plane.

Virtual Layer Two Networking (VL2) design principles:

- Randomizing to cope with volatility: use Valiant Load Balancing (VLB) for destination-independent traffic spreading across multiple intermediate nodes.
- Building on proven networking technology: use IP routing and forwarding technologies already available in commodity switches.
- Separating names from locators: use a directory system to maintain the mapping between names and locations.
- Embracing end systems: run a VL2 agent at each server.

VL2 Topology: Clos Network

Clos Network Topology

[Figure: VL2 Clos network with intermediate (Int) switches at the top, aggregation (Aggr) switches in the middle, and ToR switches at the bottom, each ToR connecting 20 servers. With K aggregation switches of D ports each, the fabric hosts 20 * (DK/4) servers.]

The Clos topology offers huge aggregate capacity and multiple paths at modest cost.
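For example (hypothetical port counts, plugging into the slide's formula): K = 48 aggregation switches with D = 48 ports each support 20 * (48 * 48 / 4) = 20 * 576 = 11,520 servers.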

Use Randomization to cope with Volatility

Valiant Load Balancing: Indirection

[Figure: VLB indirection across ToRs T1 through T6. A sender encapsulates packets to I_ANY, an anycast address shared by all intermediate switches; each flow bounces off a random intermediate switch and then travels down to its destination ToR (T3 for destination y, T5 for destination z). Some links carry up paths, others down paths.]

This copes with arbitrary traffic matrices with very little overhead.

[ECMP + IP Anycast]
- Harness huge bisection bandwidth.
- Obviate esoteric traffic engineering or optimization.
- Ensure robustness to failures.
- Work with switch mechanisms available today.

Two requirements: (1) traffic must be spread, and (2) spreading must be destination independent. Equal-Cost Multi-Path (ECMP) forwarding provides both.
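As a concrete sketch of why ECMP gives destination-independent spreading: the path choice can be modeled as a hash of the flow's 5-tuple, so every flow lands on a uniformly random intermediate switch regardless of where it is headed. The switch names and hash function below are illustrative assumptions, not VL2's actual implementation:

```python
import hashlib

# Hypothetical intermediate switches; in VL2 they all share one anycast
# address, and the switches' ECMP hash picks among them. Model that hash here.
INTERMEDIATES = ["Int1", "Int2", "Int3", "Int4"]

def pick_intermediate(src_ip, dst_ip, src_port, dst_port, proto="tcp"):
    """Map a flow's 5-tuple to an intermediate switch.

    Same flow -> same path (packets stay in order); different flows are
    spread uniformly, independent of their destinations.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return INTERMEDIATES[int.from_bytes(digest[:4], "big") % len(INTERMEDIATES)]

print(pick_intermediate("10.0.0.1", "10.0.1.9", 43122, 80))
print(pick_intermediate("10.0.0.1", "10.0.1.9", 43123, 80))  # likely a different path
```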

Random Traffic Spreading over Multiple Paths

[Figure: the same VLB indirection diagram, emphasizing ECMP's random spreading of flows over the multiple up-path and down-path links.]

Separating names from locations

How smart servers use dumb switches: encapsulation.

- Commodity switches have simple forwarding primitives.
- Complexity is moved to the servers, which compute the packet headers.
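A minimal sketch of the header computation a VL2 agent performs, assuming an already-resolved directory entry and hypothetical locator names; the real agent encapsulates IP-in-IP, whereas here the two headers are shown as plain fields:

```python
import random

# Hypothetical resolved directory entries: application address -> ToR locator.
DIRECTORY = {"10.1.0.5": "LA-ToR3"}
INTERMEDIATE_LAS = ["LA-Int1", "LA-Int2", "LA-Int3"]  # hypothetical names

def encapsulate(payload, dst_aa):
    """Compute the two outer headers for an AA-addressed packet.

    The outer destination bounces the packet off an intermediate switch
    (VLB); the inner destination is the LA of the destination's ToR.
    """
    tor_la = DIRECTORY[dst_aa]                # where the destination lives now
    int_la = random.choice(INTERMEDIATE_LAS)  # random waypoint for this flow
    return {"outer_dst": int_la, "inner_dst": tor_la,
            "dst_aa": dst_aa, "payload": payload}

print(encapsulate(b"hello", "10.1.0.5"))
```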

VL2 Directory System

[Figure: VL2 directory system. A small set of replicated state machine (RSM) servers hold the authoritative state; a larger set of directory servers (DS) handle reads; a VL2 agent runs on every server.

Lookup: 1. an agent sends a lookup to a directory server; 2. the directory server replies.

Update: 1. an agent sends an update to a directory server; 2. the directory server sets the new mapping on an RSM server; 3. the RSM servers replicate it; 4. the RSM server acks the directory server; 5. the directory server acks the agent; 6. the update is disseminated to the other directory servers.]
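A sketch of the agent-side read path, with hypothetical class names: lookups hit a local cache first, go to a directory server on a miss, and a cache entry is invalidated when it turns out to be stale (e.g., after a migration):

```python
class DirectoryServer:
    """Read-optimized replica; writes flow through the RSM servers."""
    def __init__(self, table):
        self.table = table          # AA -> LA mappings
    def lookup(self, aa):
        return self.table[aa]

class VL2Agent:
    """Per-server agent: resolves AAs to LAs, caching results."""
    def __init__(self, directory):
        self.directory = directory
        self.cache = {}
    def resolve(self, aa):
        if aa not in self.cache:                        # miss: 1. Lookup
            self.cache[aa] = self.directory.lookup(aa)  # 2. Reply
        return self.cache[aa]
    def invalidate(self, aa):
        self.cache.pop(aa, None)    # drop a stale entry; next send re-resolves

ds = DirectoryServer({"10.1.0.5": "LA-ToR3"})
agent = VL2Agent(ds)
print(agent.resolve("10.1.0.5"))   # miss: queries the directory server
print(agent.resolve("10.1.0.5"))   # hit: served from the local cache
```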

VL2 Addressing and Routing

[Figure: VL2 addressing and routing example with servers x, y, z under ToR1 through ToR4. The directory service maps each server's application address to the locator address of its ToR, e.g., x -> ToR2, y -> ToR3, z -> ToR4; after z moves, its entry becomes z -> ToR3. To reach y or z, the sender's agent performs a lookup and encapsulates the payload toward the destination's ToR.]

- Servers use flat names: application addresses (AAs).
- Switches and infrastructure use location-specific addresses (LAs); switches run link-state routing and maintain only switch-level topology.
- The directory service resolves AAs to LAs via lookup and response.
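Because names and locators are separate, migrating z only rewrites its directory entry; its AA, and thus every open connection's endpoint, is untouched. A tiny sketch, continuing the hypothetical mapping above:

```python
# Directory state before migration, matching the figure: z lives under ToR4.
directory = {"x": "LA-ToR2", "y": "LA-ToR3", "z": "LA-ToR4"}

def migrate(directory, aa, new_tor_la):
    """Relocate a VM: only the AA -> LA mapping changes, never the AA itself."""
    directory[aa] = new_tor_la

migrate(directory, "z", "LA-ToR3")
print(directory["z"])  # LA-ToR3; z keeps its IP address across the move
```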

Embracing End Systems

Data-center OSes are already heavily modified for VMs, storage clouds, etc., so adding a VL2 agent on each server is a modest change.

No changes are needed to applications or to clients outside the data center.

Evaluation

Uniform high capacity, measured with an all-to-all data shuffle stress test:
- 75 servers, each delivering 500 MB to every other server.
- The maximal achievable goodput is 62.3 Gbps.
- VL2 achieves 58.8 Gbps, a network efficiency of 58.8 / 62.3 = 94%.

Evaluation

Performance isolation, with two services sharing the fabric:
- Service one: 18 servers perform a single long-lived TCP transfer the whole time.
- Service two: 19 servers each start an 8 GB TCP transfer every 2 seconds.
- Service two (variant): 19 servers burst short TCP connections.

Evaluation

Convergence after link failures:
- 75 servers running the all-to-all data shuffle.
- Links between intermediate and aggregation switches are disconnected during the shuffle.