High Quality and Resilient IPTV Backbone Networks André ... · vi. vii Abstract Resumo Distribuir...

High Quality and Resilient IPTV Backbone Networks

André Ricardo Simões Bento

Dissertation submitted for obtaining the degree of

Master in Electrical and Computer Engineering

Jury

Supervisor: Prof. João Luís da Costa Campos Gonçalves Sobrinho

President: Prof. Nuno Cavaco Gomes Horta

Members: Prof. Joao Jose de Oliveira Pires

Outubro 2012

ii

To my grandmother

iii

Acknowledgements

Acknowledgements

Foremost, I would like to express my sincere gratitude to my supervisor Prof. João Luís Sobrinho

for the continuous support of my M.Sc. study and research, for his patience, motivation and

enthusiasm. His guidance helped me in all the time of research and writing of this thesis.

Besides my advisor, I would like to thank my family, especially my parents, who supported me

emotionally and financially all the way through the course.

Last but not the least, I would like to thank my beloved girlfriend for her patience to see me

writing this thesis throughout the summer and during her vacations.

v

Abstract

Abstract

Multicasting of TV (real time video) streams with the IP protocol suite is a challenge. IP was

initially designed for unicast not multicast. The routing and transport protocols of IP were not planned

for applications requiring a combination of steady high bandwidth, low packet delay and low packet

loss.

This thesis focuses on the design of broadcast trees for the backbone of IPTV networks. We will

present algorithms for the calculation of optimal out-branchings and resilient out-branchings, where in

case of link failure, it is possible to substitute the link by a backup path, granting that the other links of

the tree are not affected by the failure. The failure recovering is provided by Fast Reroute

mechanisms, which are capable of handling failure in sub-second time (around 50ms).

Keywords

IPTV, Backbone, Minimum Out-Branchings, Resilient Out-Branchings, Fast Reroute, MPLS

vii

Abstract

Resumo

Distribuir fluxos de TV (vídeo em tempo real) usando os protocolos IP para transmissões ponto-

multiponto é um desafio. O IP foi inicialmente desenhado para transmissões ponto a ponto, não

ponto-multiponto. Os protocolos de transporte e encaminhamento do IP não foram planeados para

aplicações que requerem-se uma combinação de largura de banda estável, baixo atraso e baixa

perda de pacotes.

Esta tese foca-se no desenho de arborescências de distribuição para o núcleo das redes IPTV.

Apresentamos algoritmos para o cálculo de arborescências óptimas e arborescências resilientes,

onde, no caso de falha de uma ligação é possível substituir a ligação que falhou, por um caminho de

protecção, garantindo que as outras ligações da arborescência não são afectadas pela mudança. A

recuperação das falhas é feita usando Fast Reroute, um mecanismo capaz de lidar com falhas em

tempos que rondam os 50 ms.

Palavras-Chave

IPTV, Núcleo, Arborescências de Custo Mínimo, Arborescências Resilientes, Fast Reroute, MPLS

ix

Table of Contents

Table of Contents

Acknowledgements ................................................................................ iii

Abstract ................................................................................................... v

Resumo ................................................................................................ vii

Table of Contents ................................................................................... ix

List of Figures ........................................................................................ xi

List of Tables ......................................................................................... xiii

List of Algorithms .................................................................................. xv

List of Acronyms .................................................................................. xvii

1 Introduction .................................................................................. 1

1.1 Problem Statement .................................................................................. 2

1.2 Contents .................................................................................................. 3

1.3 Contributions ........................................................................................... 3

1.4 State of the Art ......................................................................................... 4

1.5 IPTV Architecture .................................................................................... 5

2 The Unicast Routing Problem ....................................................... 9

2.1 Network Service Models ........................................................................ 10

2.2 Routing Protocols .................................................................................. 11

2.3 MPLS ..................................................................................................... 13

2.4 Fast Reroute .......................................................................................... 20

3 The Multicast Routing Problem .................................................. 29

3.1 From Unicast to Multicast ...................................................................... 29

3.2 Protocol Independent Multicast (PIM) .................................................... 31

3.3 Point-to-Multipoint LSPs (P2MP LSPs) ................................................. 35

3.4 Protocol Architecture ............................................................................. 37

3.5 Failure Recovering Process................................................................... 39

x

4 Designing Multicast Out-branchings ........................................... 43

4.1 Definitions .............................................................................................. 44

4.2 Minimum Cost Out-Branchings .............................................................. 46

4.3 Connectivity Theorems .......................................................................... 51

4.4 Link-Disjoint Out-branchings Algorithms ................................................ 56

4.5 Comparison of the Out-branchings ........................................................ 70

4.6 Protection Paths .................................................................................... 73

5 Conclusions ................................................................................ 75

References............................................................................................ 79

xi

List of Figures

List of Figures Fig. 1 – IPTV architecture ......................................................................................................................... 5

Fig. 2 – Structure of the backbone of an IPTV network ........................................................................... 6

Fig. 3 – Changes in case of link failure. The link represented in bold is introduced to substitute the link that failed (represented with a cross) ................................................................. 6

Fig. 4 – Verizon’s continental US backbone network (2011) [31] ............................................................ 7

Fig. 5 – T-Systems (aka Deutsche Telekom) germany50 [26] backbone network (2006) ....................... 8

Fig. 6 – Layer model ...............................................................................................................................10

Fig. 7 – Various Virtual Circuits established over a Physical Link: one per packet stream ...................11

Fig. 8 – MPLS layer model .....................................................................................................................13

Fig. 9 – Label Switching in MPLS. A packet with label 50 arrives at LSR at interface B. The LSR looks at its label switching table a sends the packet with label 40 through interface C. ....................................................................................................................14

Fig. 10 – MPLS’s header structure .........................................................................................................15

Fig. 11 – MPLS tunnel ............................................................................................................................15

Fig. 12 – LDP session establishment .....................................................................................................17

Fig. 13 – Path signalling using RSVP .....................................................................................................19

Fig. 14 – Constrain based routing example that uses bandwidth as a routing metric ...........................20

Fig. 15 – Network Failure Anatomy ........................................................................................................21

Fig. 16 – Local Protection .......................................................................................................................22

Fig. 17 – End-to-end protection ..............................................................................................................23

Fig. 18 – Setting up a facility backup ......................................................................................................24

Fig. 19 – Forwarding traffic using the facility backup .............................................................................24

Fig. 20 – Setting up a one-to-one backup ..............................................................................................25

Fig. 21 – Forwarding traffic over a one-to-one backup ..........................................................................26

Fig. 22 – Packet distribution in unicast. A packet stream is generated for each receiving node. ..........30

Fig. 23 – Packet distribution in multicast. Packet streams are not replicated at the source. .................30

Fig. 24 – Receiver “join” and source “register” in PIM-SM .....................................................................32

Fig. 25 – Source “join” and traffic flow in PIM-SM ..................................................................................32

Fig. 26 – Shortest path reconfiguration in PIM-SM ................................................................................33

Fig. 27 – Shortest Path Tree in PIM-SM after reconfiguration ...............................................................33

Fig. 28 – Source “join” in PIM-SSM. “Join” messages for different group are sent to its respective video source. ...............................................................................................34

Fig. 29 – Data flow in PIM-SSM. Video packets are sent to the reverse packet of the respective “join” message. .............................................................................................................34

Fig. 30 – P2MP LSPs example. The forwarding table of P1 shows the different labels using for the different interfaces. .................................................................................................35

Fig. 31 – P2MP LSP setup .....................................................................................................................36

Fig. 32 – Cross layer architecture ..........................................................................................................38

Fig. 33 – Setup of calculated out-branchings .........................................................................................39

Fig. 34 – Link failure scenario ................................................................................................................40

Fig. 35 – Minimum cost out-branching (bold red arrows) .......................................................................46

Fig. 36 - Algorithm to find MOBs: Step I (link cost update) ....................................................................47

xii

Fig. 37 - Algorithm to find MOBs: Step II (0 cost cycle is agglomerated in a supernode) .....................48

Fig. 38 - Algorithm to find MOBs: Step III ...............................................................................................48

Fig. 39 - Algorithm to find MOBs: Step IV (0 cost links are selected as part of the MOB) .....................49

Fig. 40 - Algorithm to find MOBs: Step V (the links of the cycle except link are added to the MOB) .............................................................................................................................49

Fig. 41 – Link-disjoint example part I ......................................................................................................51

Fig. 42 – Link-disjoint example part II .....................................................................................................51

Fig. 43 – Two symmetric cycles. There are two disjoint paths between 0 and 1. ..................................52

Fig. 44 – Closed Path using the 2 links of .................................................................................53

Fig. 45 – Closed path using the 3 links of .................................................................................53

Fig. 46 – Simple path between sets and ......................................................................................54

Fig. 47 – 2-connected directed graph .....................................................................................................55

Fig. 48 – Graph with 2 link-disjoint out-branchings. One represented in normal red lines, the other in dashed blue lines .............................................................................................55

Fig. 49 – Edmond’s theorem application example. Set and have 3 and 2 entering links represented in bold, respectively. .................................................................................56

Fig. 50 – First iteration of algorithm 2: Part I (path is added to out-branching ) ...............................57

Fig. 51– First iteration of algorithm 2: Part II (path is added to out-branching ) ..............................57

Fig. 52 – First iteration of algorithm 2: Part III (the reverse edges of path are added to ) ...............58

Fig. 53 – First iteration of algorithm 2: Part IV (the reverse edges of path are added to ) ...............58

Fig. 54 – Final result of the application of algorithm 2 ............................................................................59

Fig. 55 – Algorithm 2 iteration: Part I (there are always two paths available from the two out-branchings to node ) .............................................................................................61

Fig. 56 – Algorithm 2 iteration: Part II ........................................................................................61

Fig. 57 – Choosing paths in algorithm 2: Part I (path - - is added to out-branching ) .................62

Fig. 58 – Choosing paths in algorithm 2: Part II (path - - - is added to out-branching ) ...........63

Fig. 59 – Choosing paths in algorithm 2: Part III (the reverse edges of the paths are added to the out-branchings) .......................................................................................................63

Fig. 60 – Choosing paths in algorithm 2: Part IV (the result of not choosing the shortest paths for out-branching ) ...........................................................................................................64

Fig. 61 – Graph and the maximum number of disjoint out-branchings ............................................66

Fig. 62 – First iteration of algorithm 4: Part I ..........................................................................................66

Fig. 63 – First iteration of algorithm 4: Part II .........................................................................................67

Fig. 64 – First iteration of algorithm 4: Part III ........................................................................................67

Fig. 65 – Final result of the application of algorithm 4 ............................................................................68

Fig. 66 – 3 out-branchings rooted at vertex 0 generated by our implementation of algorithm 4 ............70

Fig. 67 – 10 node link-disjoint trees generated using algorithm 4. (Main tree in Red) ...........................72

Fig. 68 – 10 node link-disjoint trees generated using algorithm 2 optimized for the main tree (Red) .............................................................................................................................72

xiii

List of Tables

List of Tables

Table 1 – Router S IGP routing table .....................................................................................................40

Table 2 – Algorithm comparison .............................................................................................................71

Table 3 – Protection path cost comparison ............................................................................................74

xv

List of

List of Algorithms

Algorithm 1 - Finding a MOB rooted at in a given multigraph ........................................................49

Algorithm 2: Finding 2 link disjoint out-branchings rooted at in a 2-connected symmetric graph .........................................................................................................................59

Algorithm 3: Finding the maximum number of link-disjoint paths in from to . ..............................65

Algorithm 4: Finding a maximal set of link disjoint out-branchings in rooted at .............................68

xvii

List of

List of Acronyms

AS Autonomous System

ATM Asynchronous Transfer Mode

BBC British Broadcasting Corporation

BFD Bidirectional Forwarding Detection

BFS Breadth First Search

CBS Columbia Broadcasting System

CNN Cable News Network

CPU Central Processing Unit

DSL Digital Subscriber Line

DV Distance Vector

ERO Explicit Route Object

FEC Forward Equivalence Classes

FRR Fast Reroute

IGP Interior Gateway Protocols

IP Internet Protocol

IPTV Internet Protocol Television

IPv4 Internet Protocol version 4

IPv6 Internet Protocol version 6

IS-IS Intermediate Systems to Intermediate Systems

ISP Internet Service Providers

LDP Label Distribution Protocol

LER Label Edge Router

LIB Label Information Base

LS Link State

LSA Link State Advertisements

LSP Label Switched Path

LSR Label Switched Routers

MOB Minimum Out-Branching

MP Merge Point

MPLS Multi Protocol Label Switching

MPLS-TE Multi Protocol Label Switching-Traffic Engineering

MST minimum spanning tree

OSPF Open Shortest Path First

P2MP Point to Multipoint

xviii

PIM Protocol Independent Multicast

PIM-DM Protocol Independent Multicast-Dense Mode

PIM-SM Protocol Independent Multicast-Sparse Mode

PIM-SSM Protocol Independent Multicast-Source Specific Mode

PLR Point of Local Repair

PON Passive Optical Networks

QoS Quality of Service

RIP Routing Information Protocol

RP Rendezvous Point

RRO Record Route Object

RSVP Resource Reservation Protocol

RSVP-TE Resource Reservation Protocol-Traffic Engineering

SHO Super Hub Office

SPT Shortest Path Tree

TCP Transmission Control Protocol

TE Traffic Engineering

TTL Time to Live

TV Television

UDP User Datagram Protocol

UK United Kingdom

US United States

VC Virtual Circuit

VHO Video Hub Offices

VOD Video On Demand

VSO Video Serving Office

1

Chapter 1

Introduction

1 Introduction

2

1.1 Problem Statement

In the past few years, we have seen the telecommunication carriers do a global effort in the

rapid implementation of Internet Protocol Television (IPTV). IPTV encodes live TV streams in a flow of

IP packets and delivers them to users through broadband networks. The key reasons for using IP

networks for the transport of TV are:

Convergence: The services offered by the telecommunications companies are converging on

IP, including the transmission of digital voice and data. The addition of IPTV allows

telecommunication carriers to strengthen their competitiveness by offering triple play services

(digital voice, data and TV);

Cost: Having a uniform platform reduces the cost of maintaining separate networks;

Flexibility: IP has the ability to support diverse applications; this offers flexibility opening up

opportunities for the introduction of new services like interactive TV and Video on Demand

(VOD).

telecommunication carriers use their IP networks to distribute the TV streams; this presents

challenges because the networks were not designed for this type of service and IPTV requires:

Minimal Packet Loss: IP packet loss can represent various levels of image damage, ranging

from a single pixelated block in an image, to a long period of degraded, pixelated or

unavailable sequence of images. In order to offer the user a satisfactory Quality of Service

(QoS) packet loss needs to be minimal.

Steady High Bandwidth: The total amount of video-stream data that can be sent is ultimately

limited by the bandwidth provisioned over the network. Any increase in bandwidth demand

that is above the maximum capacity of the link, will result in video packets being lost.

Low Packet Delay: Live TV streams are encoded and sent in IP packets. At the reception, an

initial delay is introduced and the first packets are stored in a buffer. After the initial delay,

packets are decoded and transmitted in the user’s TV. This mechanism helps in coping with

short transmission delays and jitter. However, if a packet fails to arrive at the reception at the

time it is supposed to be decoded, it is like it was lost since it will not be used.

Sub-second Failure Restoration: In the case of link failure, restoration needs to be done in

sub-second time to avoid packet loss, which could cause image damage or even service

interruption.

3

1.2 Contents

In this dissertation we will evaluate network designs that can cope with the challenging

requirements of IPTV. We will start with an overview of the IPTV backbone architecture and its

components.

In chapter 2: The Unicast Routing Problem, we will discuss the network service models and

protocols used to forward traffic sent from a single source to a single location in the network. We will

also address the resource reservation problem and refer to a link-failure protection mechanism called

Fast Reroute, which is capable of providing sub-second restoration.

The emergence of applications like IPTV led to introduction of routing protocols capable of

forwarding traffic to a group of recipients. The changes in routing, from a source to multiple

destinations, as opposed to routing from a source to a single destination, are addressed in chapter 3:

The Multicast Routing Problem. We will see the out-trees that are currently being used for forwarding

and analyse mechanisms to calculate them. An out-tree is a directed tree rooted at a specific source

vertex. If the out-tree spans all the vertices of a given graph is called out-branching.

In the end of chapter 3, we will also discuss possible protocol suites that can cope with the

requirements of IPTV and simultaneously accommodate custom out-branchings. We will then describe

a methodology to handle link failure. The purpose of this methodology is to avoid packet loss when

using the Fast Reroute mechanisms.

In chapter 4: Designing Out-branchings, we will present the custom out-branching designs. We

will present algorithms to calculate optimal out-branchings and resilient out-branchings. In the end of

the chapter we will compare the designs and evaluate its pros and cons. Finally, in chapter 5:

Conclusion, we will do a short summary of the topics covered by this thesis and discuss future work to

be done in this subject.

1.3 Contributions

The objective of this thesis is to compute and implement different out-branching designs,

understanding the advantages and disadvantages of these implementations. Given a certain out-

branching we purpose a protocol suite that can use this out-branching to transport IPTV packets and,

at the same time, is capable of handling the requirements of IPTV. We have studied the interaction

between the involved protocols and based on previous work ([1], [2], [3] and [4]), developed a strategy

to operate them.

We have also studied different out-branching designs:

Shortest path out-branchings, where the cost of the paths from the root to the each vertex is

minimum;

4

Minimum cost out-branchings, where the sum of the costs of the links in the out-branching is

minimum;

Resilient out-branchings, where each link of the out-branching can be protected by a path

that is disjoint from the out-branching. If the protection paths shared links with the main tree,

these links would transport more than packet stream replicas, this could cause link overload.

We have implemented algorithms that generate the out-branchings above. Regarding the

resilient out-branchings we will present 2 algorithms. Given a certain graph, one of them generates 2

disjoint out-branchings and the other generates the maximum number of link-disjoint out-branchings

possible. For the first, we have come up with 2 heuristics that improve the resultant out-branchings.

The purpose of the heuristics is the following:

Improve the main out-branching: where the objective is to reduce the link cost of the paths

between the root and the vertices in the main out-branching;

Improve for the backup paths: where the objective is to reduce the total link cost of paths

used for backup.

As far as the protection paths are concerned, we also purpose a way of calculating them.

1.4 State of the Art

A group of authors ([1], [2], [3] and [4]), suggested a protocol suite capable of accommodate

custom out-branching designs, as well as a failure handling method used to avoid packet loss during

the Fast Reroute recovering process . The authors have also suggested an algorithm to calculate out-

branchings, and respective disjoint protection paths. This algorithm took care of link overload problem

in case failure, however, the authors were not preoccupied with the cost of the out-branchings. They

also did not had in consideration the cost of the protection paths. Other authors came up with

algorithms [5] that resulted in similar out-trees, also disregarding the cost optimization aspect.

In [6] and [7] simple schemes are proposed to solve single link failures in sub-second time with

low packet loss in the process, these schemes introduce minor changes in routing and are easy to

implement. However, these schemes disregard the link overload problem during the recovery. Like the

others the proposed solutions do not account for the costs of the out-branchings.

Another different solution is proposed in [8], this scheme uses fast layer 3 failure detection, this

type of schemes is often disregarded by the telecommunications carriers because they require short

timers to detect the failures, this can produce false positive cases and as we will see further ahead

can compromise the best routing options.

Finally in [9], 3 protocol architectures are discussed with the objective of distributing IPTV

streams, the advantages and disadvantages of the architectures are briefly explained.

5

1.5 IPTV Architecture

IPTV networks are usually built on top of existent IP networks and are separated in three areas:

access, metro and backbone. These areas have different components and philosophies. Fig. 1 shows

a typical IPTV architecture.

SHO VHO VSOPON

Access

Cable

Access

DSL

Access

Backbone Metro Acess

Super Hub

Office

Video Hub

Office

Video Serving

Office

National Channel

Servers

Local Channel

Servers

Fig. 1 – IPTV architecture

In the backbone, there are two essential components: the Super Hub Office (SHO) and the

Video Hub Offices (VHOs). The SHO gathers the TV streams of the national channels (channels that

are not specific to a region or state, like CNN in the US, or BBC One in the UK) and distributes them to

a large set of receiving locations, the VHOs. Each VHO gathers the TV streams of the local channels

(channels that are only reproduced in a specific region like WCBS the CBS channel of New York) and

feeds both national and local streams to a metro area. IP routers are used to transport the IPTV

content from the SHO to the VHOs. In addition, both SHOs and VHOs are enabled with powerful

buffer servers to enable retransmissions and some other Video-On-Demand (VOD) and interactive

services. Most of them have a satellite used to receive local/national contents.

Is at the VHO, that traffic leaves the core network. It distributes the contents to Video Serving

Offices (VSOs) via the routers in the metro area. The VSOs establish the border between metro and

access networks, than can either be Cable, Digital Line Subscriber (DSL), or Passive Optical

Networks (PON).

The focus of this thesis is on the backbone of the IPTV networks, which are supported by the

telecommunication carriers’ already existent IP networks. Fig. 2 shows an example of the backbone of

an IP network that supports IPTV. The bold green circle represent de SHO and dotted and dashed

green circles represent de VHOs. Both SHOs and VHOs incorporate two routers so that in case of

router failure, the network is connected in a way, which grants that there is always one router available

to feed the VSOs. The black lines are undirected edges; every edge has an upstream and

6

downstream link. The normal black lines represent edges that have a link used in the out-branching.

This out-branching, rooted at the SHO, is depicted by the red arrows that represent links. The black

dashed lines are unused edges. They are used to substitute a link of the out-branching in case of

failure.

SHO

VHO

VHO

VHO

VHO

VHO

VHO

SHO

VHO

Router

Used Edge

Unused Edge

Link of the

broadcast out-tree

Fig. 2 – Structure of the backbone of an IPTV network

SHO

VHO

VHO

VHO

VHO

VHO

VHO

SHO

VHO

Router

Used Edge

Unused Edge

Link of the

broadcast out-tree

Fig. 3 – Changes in case of link failure. The link represented in bold is introduced to substitute

the link that failed (represented with a cross)

Most developed countries have their own IP backbone networks. In general, there are few big

telecommunication carriers per country, each one supporting its own national backbone network. The

7

main difference between these networks is size.

When compared to the US’s, the European networks are generally smaller. Nevertheless major

IPTV carriers in US and in Deutschland, Verizon and T-Systems, respectively, support IP backbone

networks that are about the same size. These networks are represented in Fig. 4 and Fig. 5.

Fig. 4 – Verizon’s continental US backbone network (2011) [10]

8

Fig. 5 – T-Systems (aka Deutsche Telekom) germany50 [11] backbone network (2006)

As we can see Verizon’s network has 47 nodes and T-Systems’ has 50 nodes. We will focus our

work in this kind of networks: big IPTV backbone networks.

Instead of analysing European countries individually, we could also picture it as a whole, but

mind that in Europe, carriers distribute IPTV contents in and for its own country. This explains the

existence of an IPTV backbone area per country per telecommunication carrier, to where video

contents should be derived and then distributed.

9

Chapter 2

The Unicast Routing Problem

2 The Unicast Routing Problem

10

This chapter overviews the network service models and routing protocols used in unicast routing.

Unicast is the term that designates the sending of messages from a single source to a single

destination in the network. We will also talk about Multi Protocol Label Switching and Fast Reroute.

2.1 Network Service Models

Perhaps the most important abstraction provided by the network layer to the upper layers is

whether it uses virtual circuits (VCs) or datagrams [12].

Layer 2

Data-link

Layer 1

Physical

...

Layer 3

Network

Layer 4

Transport

Layer 2

Data-link

Layer 1

Physical

Layer 3

Network

Layer 4

Transport

...

Sender Receiver

Fig. 6 – Layer model

VCs provide a connection oriented service through the establishment and teardown of pseudo

(virtual) circuits. This mechanism enables the reservation of link resources, providing steady high

bandwidth, low delay and low packet loss. Nevertheless, VCs require state information to be recorded

in the routers for each packet stream.

11

Physical

Link

Virtual Circuit

Ph

ysic

al

Lin

k

Ph

ysic

al

Lin

k

Fig. 7 – Various Virtual Circuits established over a Physical Link: one per packet stream

In contrast, datagrams do not require state information to be recorded in the routers for each flow

of data, since data transmission is not preceded by circuit establishment. IP uses datagrams; it has

low complexity and therefore handles heterogeneous networks best. IP uses resources evenly through

any number of users, giving them no guarantees whatsoever. Due to its lack of guarantees, IP service

is often called a best-effort service.

As far as IPTV is concerned, we can see that VCs would be suitable for video transmission,

however IP uses datagrams. Multi Protocol Label Switching is an answer to this dichotomy. MPLS is a

label switching mechanism that allows the traffic to be divided into forwarding classes which may be

forwarded the same way. The use of MPLS together with the Resource Reservation Protocol (RSVP)

enables the reservation of resources to a specific data flow type (eg. Video), keeping only state

information relative to the flow type, not the flow itself. MPLS is fully compatible with IP and in fact

RSVP uses the IP routing protocol information to do the reservation of resources. We will talk about

MPLS and RSVP later, for now, we will explore the IP routing protocols, from which RSVP depends

on.

2.2 Routing Protocols

In this dissertation we are particularly interested on the best way of using the resources of a

particular operator. Given so, we are then only interested in routing within a given autonomous system

(AS). An intra-AS routing protocol is used to maintain the forwarding tables in the routers of an AS.

Once the routing tables are configured, datagrams are routed within the AS. Intra-AS routing protocols

12

are also known as Interior Gateway Protocols (IGP). IGP can be divided into two categories: Distance-

Vector (DV) protocols and Link-State (LS) protocols.

The name distance-vector is derived from the fact that routes are advertised as vectors of

(distance, direction), where distance is defined in terms of link cost and direction is defined in terms of

the next-hop router. Each router learns routes from its neighbouring routers and then advertises the

paths from its own perspective. A typical distance vector routing protocol is a routing algorithm in

which routers periodically send routing updates to all neighbours by broadcasting their entire routing

tables. The algorithm used to calculate the shortest paths is distributed and because of that routing

loops may occur when using a DV protocol [13].

In a LS protocol, the network topology and all link costs are known in each router. This is

accomplished by having each node broadcast the identities and costs of its attached links to all other

routers in the network. This link state broadcast can be accomplished without the routers having to

initially know the topology of the network. A node only needs to know the identities and costs to its

directly-attached neighbours; it will then learn about the topology of the rest of the network by

receiving link state broadcast messages from the other nodes. The result of the nodes' broadcast is

that all nodes have an identical and complete view of the network. Each router can then run a Dijkstra

algorithm to calculate the shortest paths to the other routers. Any link cost changes or failure

occurrences are communicated through the network using Link State Advertisements (LSAs).

The key differences between distance vector and link state routing algorithms are the following:

Convergence time: the computation time to calculate the shortest paths in DV protocols take

usually take longer than in LS protocols. The distributed computations can result in cycles

that can extend the computation time;

State information: LS protocols require the routers to store state information of the whole

network, in other words routers need to know the complete topology of the network. DV

protocols only need to store information related to their neighbours.

The fact that in LS routing protocols, routers have complete knowledge of the network topology

is useful if we want to design custom out-branchings. If the routers know the topology they can use it

to calculate the out-branchings independently, reaching the same results without having the necessity

of exchanging messages between the routers. The most popular LS protocols are Open Shortest Path

First (OSPF) and Intermediate Systems to Intermediate Systems (IS-IS). They are similar; the

operator’s choice is usually based on convenience issues, not technical aspects.

Another advantage of using LS protocols is its low convergence time. This is especially useful

upon failure, in order to recalculate the shortest paths in the new topology (after failure). IS-IS and

OSPF use a message exchange mechanism to detect failures. They keep HELLO messages between

nodes to evaluate their condition; they use timers to detect the non-reception of a scheduled HELLO

packet. When a failure is detected, routers distribute this info through the network using LSAs. Note

that the choice of the timers is a tricky question, short timers may cause false positive failure

detection, on the other hand, long timers may cause slow failure detection and in consequence,

13

packet loss. The failure detection times depend on the timers, but in common implementations range

from 1 to 10s. Late extensions of OSPF and IS-IS offer a Fast-Hello mechanism that is able to detect

failure in less than 1s.

After the application of the LS protocol the resultant shortest path may be used to forward data

packets. However if we use MPLS we can choose alternate paths, other than the shortest.

2.3 MPLS

Multi-Protocol Label Switching (MPLS) [14] began in the mid-1990s and appeared mainly to

solve problems on IP over Asynchronous Transfer Mode (ATM) networks, which used VCs to forward

the IP datagrams. MPLS has generally four main objectives:

Improving scalability using labels to aggregate state information and to do routing

hierarchies;

Improving routing flexibility, using the labels to identify traffic with specific necessities to

construct personalised paths;

Optimizing network performance if used globally;

Simplifying router integration with switching technology by forcing switches to act like

routers, reporting physical topology information to the network layer and by unifying the

routing, control and addressing of layers 2 and 3.

Multi protocol label switching operates at a layer that is generally considered to lie between

traditional definitions of layer 2 (data link layer) and layer 3 (network layer), and thus is often referred

to as a "layer 2.5" protocol. It was designed to provide a unified data-carrying service for both circuit-

based clients and packet-switching clients which provide a datagram service model. It can be used to

carry many different kinds of traffic, including IP packets.

IP

ATM / Frame Relay / Ethernet/

...

DSL / 10BASE-T / OTN/ ...

Layer 3 - Network Layer

Layer 2 - Data-link layer

Layer 1 - Physical Layer

...

MPLS Layer “2,5" - MPLS

Fig. 8 – MPLS layer model

This label switching mechanism brought by MPLS consists on the addition of a short extra label

14

in each IP packet. In a MPLS network, like the one represented in Fig. 9, a packet enters a MPLS

domain by a Label Edge Router (LER), which introduces a MPLS header in the packet and sends it to

the core of the network. This header contains the label that has forwarding information to be used by

the core routers named Label Switched Routers (LSR). When the packet reaches the end of a path (to

an exit LER), the label is removed. This path is called Label Switched Path (LSP). The labels are

stored in a label switching table, where each entry contains a pair incoming label, incoming interface

and the corresponding pair containing the out-coming label and its destination interface.

LSR

DLER

(Ingress)

LER

(Egress)

LSP

A

BC

B 50 C 40

IN

Int

IN

Label

OUT

IntOUTLabel

D50

IP

40 IP

Fig. 9 – Label Switching in MPLS. A packet with label 50 arrives at LSR at interface B. The LSR looks

at its label switching table a sends the packet with label 40 through interface C.

Labels have a local character and can be distributed using the Label Switching Protocol (LDP) or

the Resource Reservation Protocol (RSVP), which we will refer to later in this chapter.

Fig. 10 indicates the structure of MPLS’s header. It is formed by:

Label – Packets are expedited based on this field. This field will be used to index MPLS’s

expedition table, the Label Information Base (LIB);

EXP – originally meant to be experimental, are now used to classify packets into service

classes;

S – Labels can be stacked. S is a bit used to identify the bottom of the stack.

TTL – Time to Live field is used to avoid cycles and infinite packet redistribution.

15

Data-link Layer Header MPLS Header IP Header

Label20 bit

EXP3 bit

S1 bit

TTL8 bit

Fig. 10 – MPLS’s header structure

Since it is not possible to have a “personalized” treatment to each packet, they need to be

grouped in classes. These are called Forward Equivalence Classes (FEC). Each class contains

packets with same transport requirements. This concept is inherited from IP. However, in MPLS, it is

possible to have much more traffic classes. The most common factors used to classify the packet into

FECs are the packet stream destination, packet stream origin and content type (e.g. video). All

packets of a given FEC may be bound to the same MPLS label, which are expedited the same way. In

contrast with IP expedition, the attribution of FEC to a packet is only done once (you can change a

packets FEC by adding a second label to the packet).

Labels are the essential element for packet expedition. A LSR just needs to examine the label to

know which will be the next node, to where the packet needs to be expedited to. Labels are local, this

means they only identify univocally connections between LSRs.

A. Label Stacking

It is possible to insert more than one label per IP packet. This enables the existence of

hierarchies in the LSPs.

R3 R4 R5

R1

R2 R7

R66

617

1317

13

621

1321

Fig. 11 – MPLS tunnel

16

Fig. 11 shows an example of this hierarchy. In this example R1 sends a packet with label 6 to

R3, which is the entry router of an MPLS tunnel. R3 receives the packet and instead of switching label

6, it adds another label: label 17. R3 then sends the packet to R4 that only evaluates the first label of

the label stack, 17. R4 switches the label 17 with label 21 and forwards the packet to R5. This is an

egress LSR, it removes label 21 from the stack and evaluates the inner labels. R5 then sends the

packet without MPLS labels to the respective router, according to its forwarding table.

With label stacking, R1 does not need to know the configuration of the MPLS tunnel. It just

needs to know the entry and exit routers. On the other hand, R4 does not need to know the

configuration outside the MPLS tunnel.

There are two protocols that usually used to attribute labels to the packet streams: the Label

Distribution Protocol (LDP) and the Resource Reservation Protocol (RSVP).

B. LDP: Label Distribution Protocol

LDP is the result of the MPLS Working Group in the IETF. LDP was made for MPLS unlike

RSVP, which existed before and was extended to handle label distribution. Nevertheless, LDP is much

used, especially when optimizing the network is not a vital issue. Typically it is used in metro or local

networks. In other cases RSVP is the one to go with.

LDP has the advantage of not requiring any manual configuration; nodes interact between each

other in order to discover new LSPs automatically. Each LSR establishes a TCP session with its

neighbours to run the protocol, this grants reliable message delivery. With this setup we can see that

each router has only information from its neighbours, this limits the labelling options but enhances

scalability.

The operation of LDP is driven by message exchanges between peers. Potential peers, also

known as neighbours, are automatically discovered via hello messages multicast to a well-known UDP

port. The protocol enables the discovery of remote peers using targeted hello messages. Once a

potential peer is discovered, a TCP connection is established to it and an LDP session is set up. At

session initialization time, the peers exchange information regarding the features and mode of

operation they support. After session setup, the peers exchange information regarding the binding

between labels and FECs over the TCP connection. As we have referred, the use of TCP ensures

reliable delivery of the information, but it also allows the use of incremental updates, rather than

periodic refreshes. LDP uses the regular receipt of protocol messages to monitor the health of the

session. In the absence of any new information that needs to be communicated between the peers,

keep-alive messages are sent. Fig. 12 shows LDP session establishment.

17

Fig. 12 – LDP session establishment

LDP does its routing based on the information it gets from the running IGP protocol. Having an

IGP running below LDP is therefore essential. The association between an FEC and a label is

advertised via label messages: label mapping messages for advertising new labels, label withdraw

messages for withdrawing previously advertised labels. The fundamental LDP rule states that LSR A

that receives a mapping for label L for FEC F from its LDP peer LSR B will use label L for forwarding if

and only if B is on the IGP shortest path for destination F from A’s point of view. This means that LSPs

setup via LDP always follows the IGP shortest path and that LDP uses the IGP to avoid loops.

Given our necessities, LDP by its own, does not seem to be the way to go. It is a highly scalable

and simple protocol but it fails when we try to get optimization out of it. It is not to make full path

reservations or have a good traffic engineering process, since it does its routing automatically and

based on IGP information. Therefore we shall evaluate our other choice, RSVP, which gives us better

tools to treat video flows.

C. RSVP

Another scheme for distributing labels for transport LSPs is based on the Resource Reservation

Protocol (RSVP). RSVP was invented before MPLS. It was originally developed as a scheme to create

bandwidth reservations for individual traffic flows in networks (e.g. a video telephony session between

a particular pair of hosts) in an attempt to bring QoS to IP. RSVP includes mechanisms for reserving

bandwidth along each hop of a network for an end-to-end session. However, the original QoS

application of RSVP has fallen out of favor because of concerns about its scalability: the number of

end-to-end host sessions passing across a service provider network would be extremely large, and it

would not be desirable for the routers within the network to have to create, maintain and tear down

state as sessions come and go.

In the context of MPLS, however, RSVP has been extended to allow it to be used for the

creation and maintenance of LSPs and to create associated bandwidth reservations. When used in

this context, the number of RSVP sessions in the network is much smaller because of the way in

18

which traffic is aggregated into an LSP. A single LSP requires only one RSVP session, yet can carry

all the traffic between two LER, containing many end-to-end packet streams. MPLS also allows the

traffic to be aggregated into FECs, through the simple label switching mechanism, reducing the

number of existent packet streams.

In contrast to LDP, a RSVP-signalled LSP may not necessarily follow the path dictated by the

IGP and, in its extended form, it is possible for the ingress router to specify the entire end-to-end path

that the LSP must follow, having full control over the path. This property is very important in the

context of traffic protection schemes such as Fast Reroute, discussed in detail lately in this chapter.

RSVP requests resources for a traffic stream in only one direction from sender to one or more

receivers and maintains soft state (the reservation at each node needs a periodic refresh) of the host

and routers' resource reservations, hence supporting dynamic automatic adaptation to network

changes.

Paths may be computed online by the router or offline using a path computation tool. In the case

of online computation, typically only the ingress router needs to be aware of any constraints to be

applied to the LSP. Moreover, use of the explicit routes eliminates the need for all the routers along

the path to have a consistent routing information database and a consistent route calculation

algorithm.

The creation of an RSVP-signalled LSP is initiated by the ingress LER. The ingress LER sends

an RSVP Path message. The destination address of the Path message is the egress LER. The Path

message includes the IP address of the previous node and some data objects, the most relevant are:

Label Request Object - Requests a MPLS label for the path. As a consequence, the egress

and transit routers allocate a label for their section of the LSP.

Explicit Route Object (ERO) - The ERO contains the addresses of nodes through which the

LSP must pass. If required, the ERO can contain the entire path that the LSP must follow from

the ingress to the egress.

Record Route Object (RRO) - RRO requests that the path followed by the Path message (and

hence by the LSP itself once it is created) be recorded. Each router through which the Path

message passes adds its address to the list within the RRO. A router can detect routing loops

if it sees its own address in the RRO.

Sender TSpec - TSpec enables the ingress router to request a bandwidth reservation for the

LSP in question.

In response to the Path message, the egress router sends a Resv message. Note that the

egress router addresses the message to the adjacent router upstream, rather than addressing it

directly to the source. This triggers the upstream router to send a Resv message to its upstream

neighbour and so on. As far as each router in the path is concerned, the upstream neighbour is the

router from which it received the Path message. This scheme ensures that the Resv message follows

the exact reverse path of the Path message.

Here are some of the objects contained in a Resv message:

19

Label Object - Contains the label to be used for that section of the LSP.

Record Route Object (RRO) - Records the path taken by the Resv message, in a similar way

to the RRO carried by the Path message. Again, a router can detect routing loops if it sees its

own address in the Record Route Object.

As can be seen, RSVP Path and Resv messages need to travel hop-by-hop because they need

to establish the state at each node they cross, e.g. bandwidth reservations and label setup.

R3 R4

R5R1

R2

Path messages

Resv messages

200

300 100

Fig. 13 – Path signalling using RSVP

Fig. 13 shows an example of the establishment of a path. R1 needs to setup a data packet

stream with specific QoS to R5. To accomplish it, R1 sends a Path message to R3, asking for a label

for a given data flow. R3 then redirects the Path message to router R4 and R4 redirects it to R5. R1

needs to resend this Path message every 30 seconds to refresh the state.

When the message reaches its destination, router R5, it allocates a label for that flow locally and

responds with a Resv message to R4. In this case label 100 was chosen. As soon as the Resv

message reaches R4, the router must first make a reservation based on the request parameters and

then forward the request upstream (in the direction of the sender). This process is repeated in the

upstream routers.

The routers store the characteristics of the packet stream, and also police it to grant the

specified QoS. This is all done in soft state, so if nothing is heard for a certain length of time, then the

reservation will be cancelled. This solves the problem if either the sender or the receiver crash or are

shut down incorrectly without first cancelling the reservation.

D. Traffic Engineering

Traffic engineering is the process of routing data traffic to balance the traffic load on the various

links, routers, and switches in the network and is most applicable in networks where multiple parallel

or alternate paths are available. Fundamentally, the objective of traffic engineering is to ensure

sufficient capacity exists to handle the forecast demand from the different service classes while

20

meeting their respective QoS objectives.

Along the years many protocols got Traffic Engineering (TE) extensions to better cope with

nowadays traffic demands, MPLS was no different.

MPLS-TE brought with it the concept of constrain based routing, a change in shortest path

concept. If all the traffic is forwarded through the same “shortest paths”, paths may become

congested. Perhaps we can get better results distribute the traffic wisely between the paths available.

Constraint-based routing is a mechanism used to meet TE requirements, which can take in to

account more than one metric to define its paths. Besides link cost, it can use bandwidth, delay and

jitter.

1

R2

R6 R7

R1

R3 1

R5

40Mbit/s traffic

70Mbit/s traffic

40Mbit/s bandwidth

100Mbit/s bandwidth

1 1

1

R4

Fig. 14 – Constrain based routing example that uses bandwidth as a routing metric

Fig. 14 shows an example of constrain based routing that uses bandwidth as a routing metric. If

short path first routing was used in this case, both traffic flows from R1 to R5 would pass through path

R1-R2-R3-R4-R5, which has the lowest total combined cost. This is the worst path in terms of

bandwidth and will have difficulties in handling the traffic.

2.4 Fast Reroute

Fast Reroute (FRR) [14] is a technology that offers resiliency to MPLS networks. With FRR it is

possible to quickly recover from link failure. The medium time down time when a single fails is around

50 ms, which is far better than convergence times of the IGP protocols.

FRR uses pre-programmed backup paths to detour the packets from the affected links, offering a

safe path that can quickly be introduce once failure is detected. These procedures are transparent to

the IGP protocols and therefore will leave the layer 3 routing tables intact.

21

R1 R6R5R2

R3 R4

PLR MP

Fig. 15 – Network Failure Anatomy

Fig. 15 shows the anatomy of a failure in a given network. If connection R2-R5 fails, traffic will be

redirected through path R2-R3-R4-R5. The point that is upstream of the failure is called Point of Local

Repair (PLR); on the other hand, the point downstream of the failure is called Merge Point (MP).

FRR gives us essentially two schemes for protecting a given LSP. We can protect a full path

(end-to-end) or just do a local protection detour (hop-by-hop). Both have its advantages and

disadvantages and although local protection is the most popular, they can be used complementarily.

A. Local Protection

The idea of local protection is simple. Instead of protecting the full path, the traffic is only

detoured at the PLR, protecting only one link. This is pretty similar to what happens when a freeway

between two cities closes down somewhere between exits A and B. Instead of redirecting the traffic

through other national routes, vehicles are redirected to a detour at exit A, which leads them to point B

where they enter the freeway once again. Following the freeway’s philosophy, it is necessary to create

what we call detour path as we can see in Fig. 16.

22

R8

R1

R3

R2

R4

R9

S

R5

D

LSP from S to D

Detour

Fig. 16 – Local Protection: If link R1R2 fails traffic is detoured at the PLR (R1), returning to the LSP at

the MP (R2)

B. End-to end Protection

End-to end protection consists on having a secondary LSP protecting a primary LSP (Fig. 17).

This LSP is an end-to-end path that redirects traffic between two nodes that are typically a few hops

apart in the network. This is usually done offline given that it is normally necessary that the involved

nodes have full network knowledge, which may not be available to all routers (which may be in

different Link State areas or just not running any Link State protocol at all).

On the other hand, this scheme of protection offers advantages as well. Since the backup paths

normally are callibrated by the network manager, it is possible to have better QoS guarantees.

Another advantage is that no matter what link(s) fail in the primary LSP, it is always possible to protect

it. Path signaling is usually done by RSVP.

23

LSP2

secondary

LSP1 primary

R1

R2 R3

DS

R4

Fig. 17 – End-to-end protection

Both in local and end-to-end protection, the protection paths must be computed and signalled

before a failure happens. We have seen in MPLS that flag S indicated the end of a stack of labels. We

stack an extra label on top of another usually to prevent the following router to access the inner label.

This encapsulation mechanism forms a LSP tunnel; these tunnels can be used for protection. There

are two methods for doing protection LSPs, which differ in the label with which the traffic arrives at the

MP. This in turn influences the number of LSPs that can be protected by a single backup tunnel,

yielding either N:1 (facility backup) or 1:1 (one-to-one backup).

C. Facility Backup

In facility backup, a single protection LSP is used to protect many primary LSPs. Traffic arrives

over the backup tunnel with the same label as it would if it arrived over the failed link. The only

difference from the point of view of forwarding is that, when arriving to the MP, it may come from

different interfaces. Normally, traffic comes from the protected LSP, but in failure cases, traffic may

travel through the protection tunnel. In both cases, traffic reaches the MP with the same label.

To ensure that traffic arrives at the MP with the correct label, all that needs to be done is to

tunnel it into the backup by pushing the backup tunnel label on top of the protected LSP label at the

PLR (label stacking) and remove the first label on the stack at the penultimate node before reaching

the MP (penultimate hop popping). The ability for several LSPs to share the same protection path is

an important scaling property of facility backup.

24

C

A

D

X Y

Protected LSP from X to Y

Protection

tunnel for

link A-B

B101

200

102 100

201 Pop

Push 102 Swap 102→101 Swap 101→100 Pop 100

Swap 201→200 Pop 200

Fig. 18 – Setting up a facility backup

Fig. 18 shows the setup of the backup tunnel before the failure and the forwarding state that is

installed at every hop in the path. No new forwarding state is installed at the MP (B). At the PLR (A),

the forwarding state must be set in place to push the label of the backup path (label 201 in the

example). Any number of LSPs crossing link A–B can be protected by the backup shown in the figure.

There is no extra forwarding state for each LSP protected either at the MP or at any of the routers in

the path.

The label that is advertised by the MP is an implicit null label this triggers penultimate hop

popping to be performed for the backup tunnel. Thus, traffic arrives at the MP with the same label with

which it would have arrived over the main LSP.

C

A

D

X YB


Swap 201→200 Pop 200

Push 201

IP102

IP101200

IP1

01

20

1

IP1

01

IP100

Fig. 19 – Forwarding traffic using the facility backup

Fig. 19 depicts an example of forwarding IP traffic using facility backup. To better understand the

process of recovering from a link failure, let us analyse each step of the fixing procedure:

25

Router X is not aware of the failure of link A-B, it pushes the attributed 102 label as usually;

Router A has already detected that link A-B is down, it swaps label 102 to label to label 101,

issued by router B, and pushes label 201 issued by router C to redirect the packet to the

protection tunnel;

Router C acts like if it was a normal LSP and swaps label 201 to label 200 issued by router D.

Label 101 remains untouched;

Router B advertised an implicit null label to router D. Therefore D knows that it needs to pop

the last label for B, to know the origin of that packet. D then pops label 200 and sends the

packet, with the untouched label 101, to router B.

D. One-to-one Backup

In One-to-one backup traffic arrives at the MP with a different label than the one used by the

main (protected) LSP. Fig. 20 shows the setup of a one-to-one backup for the LSP from the previous

example and Fig. 21 shows forwarding over the backup following a failure. Traffic arrives at the MP

with label 300, the backup tunnel label and is forwarded using label 100, the protected LSP label.

Thus, the MP must maintain the forwarding state that associates the backup tunnel label with the

correct label of the protected LSP. If a second LSP were to be protected in this figure, a separate

backup tunnel would be required for it, and a separate forwarding state would be installed at the MP.

C

A

D

X Y

Protected LSP from X to Y

Protection

tunnel for

link A-B

B101

200

102 100

302300


Swap 302→301 Swap 301→300

Swap 300→100

Fig. 20 – Setting up a one-to-one backup

Similar to facility backup, the forwarding state must be set up to map traffic from the protected

LSP into the backup. For example, traffic arriving with label 102, the label of the protected LSP, is

forwarded over the backup using label 302, the backup tunnel label. Note that, using this approach,

the depth of the label stack does not increase when packets are forwarded over the backup path,

because the top label is simply swapped to the backup tunnel label.

26

C

A

D

X YB


Swap 201→200 Pop 200

IP102

IP3

00

IP100

IP3

02

IP300

Fig. 21 – Forwarding traffic over a one-to-one backup

Let us once again go through the link failure recovering steps, but now using one-to-one backup:

Router X unaware of the failure pushes label 102 to an IP packet;

Router A had already detected the failure in link A-B and swaps label 102 to label 302,

previously advertised by router C;

Router C and D act like if it was a normal LSP and remove the arriving label. After that they

add the label advertised by the downstream router;

Finally the packet reaches router B with a label that has been negotiated between B and D,

instead of being negotiated between B and A.

E. Failure Detection

In order to accomplish the restoration goal of 50 ms, we need a way of quickly detecting a

failure, which is out of FRR scope.

Some transmission media provide hardware indications of connectivity loss. One example is

SDH/SONET, which is, as we have seen in the first chapter, widely used in the network backbones.

These technologies have the capability of detecting a failure in the physical layer within seconds.

Some gigabit Ethernet routers have also that capability they use pulse detection to evaluate if the link

is up.

When failure detection is not provided in the hardware, this task can be accomplished by an

entity at a higher layer in the network. As we have seen IGP protocols have a hello exchange

mechanism to detect link failure. The original versions of OSPF and IS-IS had architectural time limits

in the detecting link failure. The minimum time interval used to them was 3 seconds for OSPF and 1

second for IS-IS. Recent versions of the protocols included a fast-hello mechanism that that can be

used to do sub-second failure detection. Nevertheless it is still CPU consuming and causes some

overhead.

27

Based on this realization, the BFD protocol was developed jointly by Juniper and Cisco, having

rapidly gained acceptance. It is a simple hello protocol designed to do rapid failure detection. Its goal

is to provide a low-overhead mechanism that can quickly detect faults in the bidirectional path

between two forwarding engines, whether they are due to problems with the physical interfaces, with

the forwarding engines themselves or with any other component.

29

Chapter 3

The Multicast Routing Problem

1 The Multicast Routing Problem

30

3.1 From Unicast to Multicast

If unicast transmissions were used to distribute TV streams, copies of the same video packets

would pass through the links (Fig. 22). This would cause congestion.

Packet

Stream

Fig. 22 – Packet distribution in unicast. A packet stream is generated for each receiving node.

The problem of having copies of the same packets passing in a single link is solved with the

introduction of multicast. Multicast [15] is the delivery of information to a group of destinations

simultaneously in a transmission from a given source. In a multicast transmission links do not carry

copies of the same packet streams, instead, packets are replicated only when they are needed in

order to reach all the destinations (Fig. 23).

Packet

Stream

Fig. 23 – Packet distribution in multicast.

In this chapter we will talk about some novelties brought by multicast. The most common

31

concept of an IP address is in unicast addressing, available in both IPv4 and IPv6. It normally refers to

a single sender or a single receiver, and can be used for both sending and receiving.

A multicast address is associated with a group of interested receivers. In IPv4, addresses

224.0.0.0 through 239.255.255.255 (the former Class D addresses) are designated as multicast

addresses. IPv6 uses the address block with the prefix ff00::/8 for multicast applications. In either

case, the sender sends a single datagram from its unicast address to the multicast group address and

the intermediary routers take care of making copies and sending them to all receivers that have joined

the corresponding multicast group.

3.2 Protocol Independent Multicast (PIM)

Routing had also changes in multicast. Protocol-Independent Multicast (PIM) [16] is a family of

multicast routing protocols for networks that provide one-to-many and many-to-many distribution of

data over the Internet. PIM routes multicast packets to multicast groups, and is designed to efficiently

establish distribution out-trees. It is termed protocol-independent because PIM does not include its

own topology discovery mechanism, much like RSVP and LDP. Instead it uses the unicast routing

information supplied by other traditional routing protocols, such as IS-IS or OSPF, to perform the

multicast forwarding. Since it relies only on the network layer routing protocols, PIM does not depend

on the IP transport protocols, like TCP and UDP. Nevertheless IP datagrams are still used in data

transmission. There are many PIM variants, the most popular are PIM Dense Mode (PIM-DM), PIM

Sparse Mode (PIM-SM) and PIM Source-Specific Multicast (PIM-SSM).

As far as PIM-DM is concerned, it builds source out-trees by flooding multicast traffic domain

wide, then pruning back branches of the out-tree where no receivers are present. PIM-DM is

straightforward to implement but generally has poor scaling properties and that is why we will not

explore it in this dissertation, since it is not really used for IPTV purposes. We will then be focused on

PIM-SM and PIM-SSM.

A. PIM Sparse Mode (PIM-SM)

In sparse-mode PIM, a router in a group is designated as a rendezvous point (RP). Its choice

can be done statically or automatically. The RP collects information about multicast senders and

makes that information available to potential receivers. The purpose of RP is to allow a receiver to find

out the IP address of the source for a particular group.

Multicast traffic flows from the sender back down the path created by the PIM messages. All

received multicast traffic is checked to verify if the incoming multicast traffic is being received via the

interface, on which PIM request was sent. This prevents loops and duplicate packets.

Each router forms a neighbour relationship with adjacent PIM routers in a group using PIM

32

“hello” messages. When a router in a group wants to receive a multicast stream, it sends a PIM “join”

message towards the RP. The source sends a PIM “register” message to the RP with multicast traffic

already encapsulated on it, as we can see on Fig. 24.

10

10 10

20

A

B

RP192.168.0.1

Multicast group address

234.1.1.1

PIM – “join

234.1.1.1”

PIM – “register

234.1.1.1”

Encapsulated

multicast traffic

Video source

10.0.1.1

C

DPIM – “join

234.1.1.1”R

Fig. 24 – Receiver “join” and source “register” in PIM-SM

The RP then sends a “join” message to the source. As soon as the message is received, the

multicast traffic flows from the source to the receiver (Fig. 25).

10

10 10

20

A

B

RP192.168.0.1


234.1.1.1

PIM – “join

234.1.1.1”

Video source

10.0.1.1

C

DR1

Fig. 25 – Source “join” and traffic flow in PIM-SM

When a router wants to stop receiving a multicast stream, it sends a PIM “prune” message

towards the IP address of the multicast source.

Multicast traffic initially travels from the sender to the receiver via the RP. Once the downstream

router starts receiving the multicast traffic (and knows the senders IP), it is possible to build path

directly back to the sender. This is triggered by the receiver that sends join messages directly to the

33

source. In the example traffic would then flow through links D-C and C-A, that are part of the shortest

path. After this reconfiguration the path between D and A will be the shortest path (Fig. 26).

10

10 10

15

A

B

RP192.168.0.1


234.1.1.1

Video source

10.0.1.1

C

DR1

Fig. 26 – Shortest path reconfiguration in PIM-SM

If another router wanted to join the multicast group, a path would be added, forming an out-tree.

The added path would also be the shortest path between the source and the referred router. An out-

tree formed by concatenation of the shortest paths between the root and the nodes in the out-tree is

designated Shortest Path Tree (SPT) (Fig. 27).

10

10 10

15

A

B

RP192.168.0.1


234.1.1.1

Video source

10.0.1.1

C

DR1

R2

10

Fig. 27 – Shortest Path Tree in PIM-SM after reconfiguration

If the receiver knew the sender’s address, the RP would not be necessary.

34

B. PIM Source Specific Multicast (PIM-SSM)

PIM-SSM [17] requires the edge router to know the address of the multicast source for each

group, avoiding the need of an RP. In PIM-SSM the edge router sends a PIM “join” message directly

towards the sender using the unicast routing table. This “join” request will be forwarded along the

shortest path (in terms of the IGP) to the multicast source until it reaches a router which is already

aware of the multicast group.

10

1010

20

A

B

Video source 2

192.168.1.2

PIM – “join

234.1.1.2”

Group

234.1.1.1

Video source 1

192.168.1.1

Group

234.1.1.2

PIM – “join

234.1.1.1”

C

D

PIM – “join

234.1.1.2”

PIM – “join

234.1.1.1”

Fig. 28 – Source “join” in PIM-SSM. “Join” messages for different group are sent to its

respective video source.

10

1010

20

A

B

Video source 2

192.168.1.2

Group

234.1.1.1

Video source 1

192.168.1.1

Group

234.1.1.2

C

D

Fig. 29 – Data flow in PIM-SSM. Video packets are sent to the reverse packet of the

respective “join” message.

Fig. 28 and Fig. 29 show PIM-SSM in operation, as we can see there is no shared out-tree

sourced at a RP, as in PIM-SM, and there is also no need for any source discovering mechanism. This

is an advantage of PIM-SSM over PIM-SM; the complexity of the multicast routing for PIM-SSM is

35

lower than PIM-SM and therefore easier to configure and maintain. PIM-SSM has also a more efficient

network usage, since traffic is not routed temporarily via the RP and the most direct path from source

to receiver is always used.

Despite the wide use of PIM-SM in IPTV networks, the advantages of the fairly new PIM-SSM

make it the one to be chosen in IP backbone, especially when the range of potential multicast group

addresses is not much large, which is typically the case.

3.3 Point-to-Multipoint LSPs (P2MP LSPs)

The LSPs generated in MPLS for unicast are also different in the multicast case [18]. So far we

have seen how MPLS is used to establish point-to-point LSPs, now we will see the tools that MPLS

has to offer in order to handle multicast connections. Independently of the signalling protocol that is

used, the forwarding mechanisms are similar.

L2 L4

L3

PE2 PE3

P1 B

A

L1

L5

L7

L6

Data

Source

PE1

P2

P3

PE5

PE4

C

Forwarding table on P1:

IN

Label

IN

Interface

OUT

Label

OUT

Interface

L1 C L2 A

L1 C L3 B

Fig. 30 – P2MP LSPs example. The forwarding table of P1 shows the different labels using for

the different interfaces.

Fig. 30 shows an MPLS multicast scenario, as we can see these LSPs have only one ingress

router but several egress routers. PE1, the ingress router, replicates the packets and sends them to

the downstream nodes, always with different labels. The downstream routers act in a similar way; they

36

also replicate the packet and send the copies to the next downstream routers, again, always with

different labels. These routers are called branch nodes. P3 is however a special case, since instead of

replicating the packets, it just transmits them with a new MPLS label. This forwarding mechanism is

quite useful, since it is far more efficient than using unicast LSPs.

RSVP was not prepared to P2MP LSPs as well. However, RSVP got a traffic engineering

(RSVP-TE) extension, allowing it to handle this new type of LSPs. Nevertheless it still uses routing

information of layer 3 protocols; therefore it is still out of the RSVP ambit to construct the out-

branchings.

L2 L4

L3

PE2 PE3

P1

L1

L5

L6

L7

PE1

P2

P3

PE5

PE4

L1

Path messages

Resv messages

L5

Fig. 31 – P2MP LSP setup

Fig. 31 shows the setup of multicast LSPs using RSVP-TE. We can see that like in unicast,

routers running RSVP-TE use Path messages to communicate downstream, and Resv messages to

communicate upstream. In the point of view of MPLS’s control plan a P2MP LSP is just a sum of point-

to-point LSPs.

Path messages have an Explicit Route Object (ERO) which determines the route followed by the

LSP. On the other hand, Resv messages incorporate, for each step, the label that will be used for

forwarding in that step. In the case P2MP networks, each sub-LSP is signalled using its own Path and

Resv messages. For this to happen both type of messages contain a new P2MP Session Object that

identifies the sub-LSP. This mechanism grants the replication of the packets in forwarding.

Let us see how this works in the example network shown in Fig. 31. The represented P2MP LSP

has four egress routers, it is composed of four sub-LSPs, one from PE1 to PE2, another from PE1 to

PE3, and so on. Because each sub-LSP has its own associated Path and Resv messages, on some

links multiple Path and Resv messages are exchanged. For example, the link from PE1 to P1 has

37

Path messages corresponding to the sub-LSPs to PE2 and to PE3. Let us examine in more detail how

the P2MP LSP in Fig. 31 is signalled; looking at sub-LSP PE1 to PE3 whose egress is PE3:

A Path message is sent by PE1, the ingress router, containing the ERO {PE1, P1, P3, PE3}.

This can contain a bandwidth reservation for the P2MP LSP if required.

PE3 responds with a Resv message that contains the label value, L4, that P3 should use

when forwarding packets to PE3.Similarly, the Resv message sent on by P3 to P1 contains

the label value, L3, that P1 should use when forwarding packets to P3.

In a similar way, for the sub-LSP whose egress is PE2, P1 receives a Resv message from

PE2 containing the label value, L2, that P1 should use when forwarding packets to PE2. P1

knows that the Resv messages from PE2 and P3 refer to the same P2MP LSP, as a

consequence of the P2MP Session Object contained in each.

P1 sends a separate Resv message to PE1 corresponding to each of the two sub-LSPs, but

deliberately uses the same label value for each, L1, because the two sub-LSPs belong to the

same P2MP LSP.

P1 installs an entry in its forwarding table such that when a packet arrives with label L1, one

copy is sent on the link to PE2 with label L2 and another copy on the link to P3 with label L3.

This ensures that when the Resv messages are sent from P1 to PE1 corresponding to the two

sub-LSPs, no double-counting occurs of the bandwidth reservation.

PE1, knowing that the two Resv messages received from P1 refer to the same P2MP LSP, a

consequence of the P2MP session object contained in each, forwards only one copy of each

packet in the flow to P1, with the label value L1 that had been dictated by PE1 in those two

Resv messages.

3.4 Protocol Architecture

In order to implement the custom out-branchings presented in chapter 4, we will present a set of

protocols that enable custom designs, can cope with resource reservation and handle fast reroute

restoration. This protocol scheme is based on [3], and uses PIM-SSM combined with P2MP LSPs.

38

Multicast coordination

PIM-SSM

Route Signaling

RSVP-TE

IGP routing

IS-IS/ OSPF

FRR support

MPLS

Routing

info

changes

Link

restoration

Fig. 32 – Cross layer architecture

PIM-SSM is used to coordinate the multicast groups and generate the multicast out-branchings.

IGP routing is assured by either IS-IS or OSPF. P2MP LSPs are established using RSVP to enable

resource reservation and FRR is used for rapid link restoration. Routes are established following the

next steps:

1. The network topology and its current link costs are fed to an out-branching calculation

algorithm that will be presented in the next chapter. The routers can run the programmed

algorithm since they are running an LS protocol which provides them the complete network

topology. The out-branching calculation can also be done by the network operator;

2. After finding the out-branching, we need to re-set the IGP link costs. The cost of the physical

links in the main out-branching remains the same. The cost of the physical links that are not

in the main out-branching are set to (Fig. 33) The new IGP costs are then downloaded to

the routers;

3. This way PIM-SSM will generate the same out-branchings that we have calculated;

4. RSVP signals the out-branchings generated by PIM-SSM, making the proper resource

reservations, forming a P2MP LSP.

5. The backup paths for each link of out-branching, can be calculated by the network operator

and downloaded to the routers. Alternatively, RSVP can setup them automatically using the

IGP routing information. To do such, IGP costs need to be re-set again. The cost of the

physical links of the main out-branching are set to and the costs of the other physical links

are set to its original value. Doing such grants that the backup paths that are calculated do

not share any link with main out-branching.

6. If the backup paths were calculated using RSVP, the IGP costs need to be set to the values

of step 2;

39

3.5 Failure Recovering Process

In point 5.1 we have described a procedure to assure the setup of the pre calculated out-

branchings. We have also presented a way of calculating and setting up protection paths. Fig. 33

shows a network that has already been set.

R2

5

1

3

4

6

3

2

R4

S

High Quality and Resilient IPTV Backbone Networks André ... · vi. vii Abstract Resumo Distribuir...

Documents

Transcript of High Quality and Resilient IPTV Backbone Networks André ... · vi. vii Abstract Resumo Distribuir...