High Quality and Resilient IPTV Backbone Networks André ... · vi. vii Abstract Resumo Distribuir...
Transcript of High Quality and Resilient IPTV Backbone Networks André ... · vi. vii Abstract Resumo Distribuir...
-
High Quality and Resilient IPTV Backbone Networks
André Ricardo Simões Bento
Dissertation submitted for obtaining the degree of
Master in Electrical and Computer Engineering
Jury
Supervisor: Prof. João Luís da Costa Campos Gonçalves Sobrinho
President: Prof. Nuno Cavaco Gomes Horta
Members: Prof. Joao Jose de Oliveira Pires
Outubro 2012
-
ii
To my grandmother
-
iii
Acknowledgements
Acknowledgements
Foremost, I would like to express my sincere gratitude to my supervisor Prof. João Luís Sobrinho
for the continuous support of my M.Sc. study and research, for his patience, motivation and
enthusiasm. His guidance helped me in all the time of research and writing of this thesis.
Besides my advisor, I would like to thank my family, especially my parents, who supported me
emotionally and financially all the way through the course.
Last but not the least, I would like to thank my beloved girlfriend for her patience to see me
writing this thesis throughout the summer and during her vacations.
-
iv
-
v
Abstract
Abstract
Multicasting of TV (real time video) streams with the IP protocol suite is a challenge. IP was
initially designed for unicast not multicast. The routing and transport protocols of IP were not planned
for applications requiring a combination of steady high bandwidth, low packet delay and low packet
loss.
This thesis focuses on the design of broadcast trees for the backbone of IPTV networks. We will
present algorithms for the calculation of optimal out-branchings and resilient out-branchings, where in
case of link failure, it is possible to substitute the link by a backup path, granting that the other links of
the tree are not affected by the failure. The failure recovering is provided by Fast Reroute
mechanisms, which are capable of handling failure in sub-second time (around 50ms).
Keywords
IPTV, Backbone, Minimum Out-Branchings, Resilient Out-Branchings, Fast Reroute, MPLS
-
vi
-
vii
Abstract
Resumo
Distribuir fluxos de TV (vídeo em tempo real) usando os protocolos IP para transmissões ponto-
multiponto é um desafio. O IP foi inicialmente desenhado para transmissões ponto a ponto, não
ponto-multiponto. Os protocolos de transporte e encaminhamento do IP não foram planeados para
aplicações que requerem-se uma combinação de largura de banda estável, baixo atraso e baixa
perda de pacotes.
Esta tese foca-se no desenho de arborescências de distribuição para o núcleo das redes IPTV.
Apresentamos algoritmos para o cálculo de arborescências óptimas e arborescências resilientes,
onde, no caso de falha de uma ligação é possível substituir a ligação que falhou, por um caminho de
protecção, garantindo que as outras ligações da arborescência não são afectadas pela mudança. A
recuperação das falhas é feita usando Fast Reroute, um mecanismo capaz de lidar com falhas em
tempos que rondam os 50 ms.
Palavras-Chave
IPTV, Núcleo, Arborescências de Custo Mínimo, Arborescências Resilientes, Fast Reroute, MPLS
-
viii
-
ix
Table of Contents
Table of Contents
Acknowledgements ................................................................................ iii
Abstract ................................................................................................... v
Resumo ................................................................................................ vii
Table of Contents ................................................................................... ix
List of Figures ........................................................................................ xi
List of Tables ......................................................................................... xiii
List of Algorithms .................................................................................. xv
List of Acronyms .................................................................................. xvii
1 Introduction .................................................................................. 1
1.1 Problem Statement .................................................................................. 2
1.2 Contents .................................................................................................. 3
1.3 Contributions ........................................................................................... 3
1.4 State of the Art ......................................................................................... 4
1.5 IPTV Architecture .................................................................................... 5
2 The Unicast Routing Problem ....................................................... 9
2.1 Network Service Models ........................................................................ 10
2.2 Routing Protocols .................................................................................. 11
2.3 MPLS ..................................................................................................... 13
2.4 Fast Reroute .......................................................................................... 20
3 The Multicast Routing Problem .................................................. 29
3.1 From Unicast to Multicast ...................................................................... 29
3.2 Protocol Independent Multicast (PIM) .................................................... 31
3.3 Point-to-Multipoint LSPs (P2MP LSPs) ................................................. 35
3.4 Protocol Architecture ............................................................................. 37
3.5 Failure Recovering Process................................................................... 39
-
x
4 Designing Multicast Out-branchings ........................................... 43
4.1 Definitions .............................................................................................. 44
4.2 Minimum Cost Out-Branchings .............................................................. 46
4.3 Connectivity Theorems .......................................................................... 51
4.4 Link-Disjoint Out-branchings Algorithms ................................................ 56
4.5 Comparison of the Out-branchings ........................................................ 70
4.6 Protection Paths .................................................................................... 73
5 Conclusions ................................................................................ 75
References............................................................................................ 79
-
xi
List of Figures
List of Figures Fig. 1 – IPTV architecture ......................................................................................................................... 5
Fig. 2 – Structure of the backbone of an IPTV network ........................................................................... 6
Fig. 3 – Changes in case of link failure. The link represented in bold is introduced to substitute the link that failed (represented with a cross) ................................................................. 6
Fig. 4 – Verizon’s continental US backbone network (2011) [31] ............................................................ 7
Fig. 5 – T-Systems (aka Deutsche Telekom) germany50 [26] backbone network (2006) ....................... 8
Fig. 6 – Layer model ...............................................................................................................................10
Fig. 7 – Various Virtual Circuits established over a Physical Link: one per packet stream ...................11
Fig. 8 – MPLS layer model .....................................................................................................................13
Fig. 9 – Label Switching in MPLS. A packet with label 50 arrives at LSR at interface B. The LSR looks at its label switching table a sends the packet with label 40 through interface C. ....................................................................................................................14
Fig. 10 – MPLS’s header structure .........................................................................................................15
Fig. 11 – MPLS tunnel ............................................................................................................................15
Fig. 12 – LDP session establishment .....................................................................................................17
Fig. 13 – Path signalling using RSVP .....................................................................................................19
Fig. 14 – Constrain based routing example that uses bandwidth as a routing metric ...........................20
Fig. 15 – Network Failure Anatomy ........................................................................................................21
Fig. 16 – Local Protection .......................................................................................................................22
Fig. 17 – End-to-end protection ..............................................................................................................23
Fig. 18 – Setting up a facility backup ......................................................................................................24
Fig. 19 – Forwarding traffic using the facility backup .............................................................................24
Fig. 20 – Setting up a one-to-one backup ..............................................................................................25
Fig. 21 – Forwarding traffic over a one-to-one backup ..........................................................................26
Fig. 22 – Packet distribution in unicast. A packet stream is generated for each receiving node. ..........30
Fig. 23 – Packet distribution in multicast. Packet streams are not replicated at the source. .................30
Fig. 24 – Receiver “join” and source “register” in PIM-SM .....................................................................32
Fig. 25 – Source “join” and traffic flow in PIM-SM ..................................................................................32
Fig. 26 – Shortest path reconfiguration in PIM-SM ................................................................................33
Fig. 27 – Shortest Path Tree in PIM-SM after reconfiguration ...............................................................33
Fig. 28 – Source “join” in PIM-SSM. “Join” messages for different group are sent to its respective video source. ...............................................................................................34
Fig. 29 – Data flow in PIM-SSM. Video packets are sent to the reverse packet of the respective “join” message. .............................................................................................................34
Fig. 30 – P2MP LSPs example. The forwarding table of P1 shows the different labels using for the different interfaces. .................................................................................................35
Fig. 31 – P2MP LSP setup .....................................................................................................................36
Fig. 32 – Cross layer architecture ..........................................................................................................38
Fig. 33 – Setup of calculated out-branchings .........................................................................................39
Fig. 34 – Link failure scenario ................................................................................................................40
Fig. 35 – Minimum cost out-branching (bold red arrows) .......................................................................46
Fig. 36 - Algorithm to find MOBs: Step I (link cost update) ....................................................................47
-
xii
Fig. 37 - Algorithm to find MOBs: Step II (0 cost cycle is agglomerated in a supernode) .....................48
Fig. 38 - Algorithm to find MOBs: Step III ...............................................................................................48
Fig. 39 - Algorithm to find MOBs: Step IV (0 cost links are selected as part of the MOB) .....................49
Fig. 40 - Algorithm to find MOBs: Step V (the links of the cycle except link are added to the MOB) .............................................................................................................................49
Fig. 41 – Link-disjoint example part I ......................................................................................................51
Fig. 42 – Link-disjoint example part II .....................................................................................................51
Fig. 43 – Two symmetric cycles. There are two disjoint paths between 0 and 1. ..................................52
Fig. 44 – Closed Path using the 2 links of .................................................................................53
Fig. 45 – Closed path using the 3 links of .................................................................................53
Fig. 46 – Simple path between sets and ......................................................................................54
Fig. 47 – 2-connected directed graph .....................................................................................................55
Fig. 48 – Graph with 2 link-disjoint out-branchings. One represented in normal red lines, the other in dashed blue lines .............................................................................................55
Fig. 49 – Edmond’s theorem application example. Set and have 3 and 2 entering links represented in bold, respectively. .................................................................................56
Fig. 50 – First iteration of algorithm 2: Part I (path is added to out-branching ) ...............................57
Fig. 51– First iteration of algorithm 2: Part II (path is added to out-branching ) ..............................57
Fig. 52 – First iteration of algorithm 2: Part III (the reverse edges of path are added to ) ...............58
Fig. 53 – First iteration of algorithm 2: Part IV (the reverse edges of path are added to ) ...............58
Fig. 54 – Final result of the application of algorithm 2 ............................................................................59
Fig. 55 – Algorithm 2 iteration: Part I (there are always two paths available from the two out-branchings to node ) .............................................................................................61
Fig. 56 – Algorithm 2 iteration: Part II ........................................................................................61
Fig. 57 – Choosing paths in algorithm 2: Part I (path - - is added to out-branching ) .................62
Fig. 58 – Choosing paths in algorithm 2: Part II (path - - - is added to out-branching ) ...........63
Fig. 59 – Choosing paths in algorithm 2: Part III (the reverse edges of the paths are added to the out-branchings) .......................................................................................................63
Fig. 60 – Choosing paths in algorithm 2: Part IV (the result of not choosing the shortest paths for out-branching ) ...........................................................................................................64
Fig. 61 – Graph and the maximum number of disjoint out-branchings ............................................66
Fig. 62 – First iteration of algorithm 4: Part I ..........................................................................................66
Fig. 63 – First iteration of algorithm 4: Part II .........................................................................................67
Fig. 64 – First iteration of algorithm 4: Part III ........................................................................................67
Fig. 65 – Final result of the application of algorithm 4 ............................................................................68
Fig. 66 – 3 out-branchings rooted at vertex 0 generated by our implementation of algorithm 4 ............70
Fig. 67 – 10 node link-disjoint trees generated using algorithm 4. (Main tree in Red) ...........................72
Fig. 68 – 10 node link-disjoint trees generated using algorithm 2 optimized for the main tree (Red) .............................................................................................................................72
-
xiii
List of Tables
List of Tables
Table 1 – Router S IGP routing table .....................................................................................................40
Table 2 – Algorithm comparison .............................................................................................................71
Table 3 – Protection path cost comparison ............................................................................................74
-
xiv
-
xv
List of
List of Algorithms
Algorithm 1 - Finding a MOB rooted at in a given multigraph ........................................................49
Algorithm 2: Finding 2 link disjoint out-branchings rooted at in a 2-connected symmetric graph .........................................................................................................................59
Algorithm 3: Finding the maximum number of link-disjoint paths in from to . ..............................65
Algorithm 4: Finding a maximal set of link disjoint out-branchings in rooted at .............................68
-
xvi
-
xvii
List of
List of Acronyms
AS Autonomous System
ATM Asynchronous Transfer Mode
BBC British Broadcasting Corporation
BFD Bidirectional Forwarding Detection
BFS Breadth First Search
CBS Columbia Broadcasting System
CNN Cable News Network
CPU Central Processing Unit
DSL Digital Subscriber Line
DV Distance Vector
ERO Explicit Route Object
FEC Forward Equivalence Classes
FRR Fast Reroute
IGP Interior Gateway Protocols
IP Internet Protocol
IPTV Internet Protocol Television
IPv4 Internet Protocol version 4
IPv6 Internet Protocol version 6
IS-IS Intermediate Systems to Intermediate Systems
ISP Internet Service Providers
LDP Label Distribution Protocol
LER Label Edge Router
LIB Label Information Base
LS Link State
LSA Link State Advertisements
LSP Label Switched Path
LSR Label Switched Routers
MOB Minimum Out-Branching
MP Merge Point
MPLS Multi Protocol Label Switching
MPLS-TE Multi Protocol Label Switching-Traffic Engineering
MST minimum spanning tree
OSPF Open Shortest Path First
P2MP Point to Multipoint
-
xviii
PIM Protocol Independent Multicast
PIM-DM Protocol Independent Multicast-Dense Mode
PIM-SM Protocol Independent Multicast-Sparse Mode
PIM-SSM Protocol Independent Multicast-Source Specific Mode
PLR Point of Local Repair
PON Passive Optical Networks
QoS Quality of Service
RIP Routing Information Protocol
RP Rendezvous Point
RRO Record Route Object
RSVP Resource Reservation Protocol
RSVP-TE Resource Reservation Protocol-Traffic Engineering
SHO Super Hub Office
SPT Shortest Path Tree
TCP Transmission Control Protocol
TE Traffic Engineering
TTL Time to Live
TV Television
UDP User Datagram Protocol
UK United Kingdom
US United States
VC Virtual Circuit
VHO Video Hub Offices
VOD Video On Demand
VSO Video Serving Office
-
1
Chapter 1
Introduction
1 Introduction
-
2
1.1 Problem Statement
In the past few years, we have seen the telecommunication carriers do a global effort in the
rapid implementation of Internet Protocol Television (IPTV). IPTV encodes live TV streams in a flow of
IP packets and delivers them to users through broadband networks. The key reasons for using IP
networks for the transport of TV are:
Convergence: The services offered by the telecommunications companies are converging on
IP, including the transmission of digital voice and data. The addition of IPTV allows
telecommunication carriers to strengthen their competitiveness by offering triple play services
(digital voice, data and TV);
Cost: Having a uniform platform reduces the cost of maintaining separate networks;
Flexibility: IP has the ability to support diverse applications; this offers flexibility opening up
opportunities for the introduction of new services like interactive TV and Video on Demand
(VOD).
telecommunication carriers use their IP networks to distribute the TV streams; this presents
challenges because the networks were not designed for this type of service and IPTV requires:
Minimal Packet Loss: IP packet loss can represent various levels of image damage, ranging
from a single pixelated block in an image, to a long period of degraded, pixelated or
unavailable sequence of images. In order to offer the user a satisfactory Quality of Service
(QoS) packet loss needs to be minimal.
Steady High Bandwidth: The total amount of video-stream data that can be sent is ultimately
limited by the bandwidth provisioned over the network. Any increase in bandwidth demand
that is above the maximum capacity of the link, will result in video packets being lost.
Low Packet Delay: Live TV streams are encoded and sent in IP packets. At the reception, an
initial delay is introduced and the first packets are stored in a buffer. After the initial delay,
packets are decoded and transmitted in the user’s TV. This mechanism helps in coping with
short transmission delays and jitter. However, if a packet fails to arrive at the reception at the
time it is supposed to be decoded, it is like it was lost since it will not be used.
Sub-second Failure Restoration: In the case of link failure, restoration needs to be done in
sub-second time to avoid packet loss, which could cause image damage or even service
interruption.
-
3
1.2 Contents
In this dissertation we will evaluate network designs that can cope with the challenging
requirements of IPTV. We will start with an overview of the IPTV backbone architecture and its
components.
In chapter 2: The Unicast Routing Problem, we will discuss the network service models and
protocols used to forward traffic sent from a single source to a single location in the network. We will
also address the resource reservation problem and refer to a link-failure protection mechanism called
Fast Reroute, which is capable of providing sub-second restoration.
The emergence of applications like IPTV led to introduction of routing protocols capable of
forwarding traffic to a group of recipients. The changes in routing, from a source to multiple
destinations, as opposed to routing from a source to a single destination, are addressed in chapter 3:
The Multicast Routing Problem. We will see the out-trees that are currently being used for forwarding
and analyse mechanisms to calculate them. An out-tree is a directed tree rooted at a specific source
vertex. If the out-tree spans all the vertices of a given graph is called out-branching.
In the end of chapter 3, we will also discuss possible protocol suites that can cope with the
requirements of IPTV and simultaneously accommodate custom out-branchings. We will then describe
a methodology to handle link failure. The purpose of this methodology is to avoid packet loss when
using the Fast Reroute mechanisms.
In chapter 4: Designing Out-branchings, we will present the custom out-branching designs. We
will present algorithms to calculate optimal out-branchings and resilient out-branchings. In the end of
the chapter we will compare the designs and evaluate its pros and cons. Finally, in chapter 5:
Conclusion, we will do a short summary of the topics covered by this thesis and discuss future work to
be done in this subject.
1.3 Contributions
The objective of this thesis is to compute and implement different out-branching designs,
understanding the advantages and disadvantages of these implementations. Given a certain out-
branching we purpose a protocol suite that can use this out-branching to transport IPTV packets and,
at the same time, is capable of handling the requirements of IPTV. We have studied the interaction
between the involved protocols and based on previous work ([1], [2], [3] and [4]), developed a strategy
to operate them.
We have also studied different out-branching designs:
Shortest path out-branchings, where the cost of the paths from the root to the each vertex is
minimum;
-
4
Minimum cost out-branchings, where the sum of the costs of the links in the out-branching is
minimum;
Resilient out-branchings, where each link of the out-branching can be protected by a path
that is disjoint from the out-branching. If the protection paths shared links with the main tree,
these links would transport more than packet stream replicas, this could cause link overload.
We have implemented algorithms that generate the out-branchings above. Regarding the
resilient out-branchings we will present 2 algorithms. Given a certain graph, one of them generates 2
disjoint out-branchings and the other generates the maximum number of link-disjoint out-branchings
possible. For the first, we have come up with 2 heuristics that improve the resultant out-branchings.
The purpose of the heuristics is the following:
Improve the main out-branching: where the objective is to reduce the link cost of the paths
between the root and the vertices in the main out-branching;
Improve for the backup paths: where the objective is to reduce the total link cost of paths
used for backup.
As far as the protection paths are concerned, we also purpose a way of calculating them.
1.4 State of the Art
A group of authors ([1], [2], [3] and [4]), suggested a protocol suite capable of accommodate
custom out-branching designs, as well as a failure handling method used to avoid packet loss during
the Fast Reroute recovering process . The authors have also suggested an algorithm to calculate out-
branchings, and respective disjoint protection paths. This algorithm took care of link overload problem
in case failure, however, the authors were not preoccupied with the cost of the out-branchings. They
also did not had in consideration the cost of the protection paths. Other authors came up with
algorithms [5] that resulted in similar out-trees, also disregarding the cost optimization aspect.
In [6] and [7] simple schemes are proposed to solve single link failures in sub-second time with
low packet loss in the process, these schemes introduce minor changes in routing and are easy to
implement. However, these schemes disregard the link overload problem during the recovery. Like the
others the proposed solutions do not account for the costs of the out-branchings.
Another different solution is proposed in [8], this scheme uses fast layer 3 failure detection, this
type of schemes is often disregarded by the telecommunications carriers because they require short
timers to detect the failures, this can produce false positive cases and as we will see further ahead
can compromise the best routing options.
Finally in [9], 3 protocol architectures are discussed with the objective of distributing IPTV
streams, the advantages and disadvantages of the architectures are briefly explained.
-
5
1.5 IPTV Architecture
IPTV networks are usually built on top of existent IP networks and are separated in three areas:
access, metro and backbone. These areas have different components and philosophies. Fig. 1 shows
a typical IPTV architecture.
SHO VHO VSOPON
Access
Cable
Access
DSL
Access
Backbone Metro Acess
Super Hub
Office
Video Hub
Office
Video Serving
Office
National Channel
Servers
Local Channel
Servers
Fig. 1 – IPTV architecture
In the backbone, there are two essential components: the Super Hub Office (SHO) and the
Video Hub Offices (VHOs). The SHO gathers the TV streams of the national channels (channels that
are not specific to a region or state, like CNN in the US, or BBC One in the UK) and distributes them to
a large set of receiving locations, the VHOs. Each VHO gathers the TV streams of the local channels
(channels that are only reproduced in a specific region like WCBS the CBS channel of New York) and
feeds both national and local streams to a metro area. IP routers are used to transport the IPTV
content from the SHO to the VHOs. In addition, both SHOs and VHOs are enabled with powerful
buffer servers to enable retransmissions and some other Video-On-Demand (VOD) and interactive
services. Most of them have a satellite used to receive local/national contents.
Is at the VHO, that traffic leaves the core network. It distributes the contents to Video Serving
Offices (VSOs) via the routers in the metro area. The VSOs establish the border between metro and
access networks, than can either be Cable, Digital Line Subscriber (DSL), or Passive Optical
Networks (PON).
The focus of this thesis is on the backbone of the IPTV networks, which are supported by the
telecommunication carriers’ already existent IP networks. Fig. 2 shows an example of the backbone of
an IP network that supports IPTV. The bold green circle represent de SHO and dotted and dashed
green circles represent de VHOs. Both SHOs and VHOs incorporate two routers so that in case of
router failure, the network is connected in a way, which grants that there is always one router available
to feed the VSOs. The black lines are undirected edges; every edge has an upstream and
-
6
downstream link. The normal black lines represent edges that have a link used in the out-branching.
This out-branching, rooted at the SHO, is depicted by the red arrows that represent links. The black
dashed lines are unused edges. They are used to substitute a link of the out-branching in case of
failure.
SHO
VHO
VHO
VHO
VHO
VHO
VHO
SHO
VHO
Router
Used Edge
Unused Edge
Link of the
broadcast out-tree
Fig. 2 – Structure of the backbone of an IPTV network
SHO
VHO
VHO
VHO
VHO
VHO
VHO
SHO
VHO
Router
Used Edge
Unused Edge
Link of the
broadcast out-tree
Fig. 3 – Changes in case of link failure. The link represented in bold is introduced to substitute
the link that failed (represented with a cross)
Most developed countries have their own IP backbone networks. In general, there are few big
telecommunication carriers per country, each one supporting its own national backbone network. The
-
7
main difference between these networks is size.
When compared to the US’s, the European networks are generally smaller. Nevertheless major
IPTV carriers in US and in Deutschland, Verizon and T-Systems, respectively, support IP backbone
networks that are about the same size. These networks are represented in Fig. 4 and Fig. 5.
Fig. 4 – Verizon’s continental US backbone network (2011) [10]
-
8
Fig. 5 – T-Systems (aka Deutsche Telekom) germany50 [11] backbone network (2006)
As we can see Verizon’s network has 47 nodes and T-Systems’ has 50 nodes. We will focus our
work in this kind of networks: big IPTV backbone networks.
Instead of analysing European countries individually, we could also picture it as a whole, but
mind that in Europe, carriers distribute IPTV contents in and for its own country. This explains the
existence of an IPTV backbone area per country per telecommunication carrier, to where video
contents should be derived and then distributed.
-
9
Chapter 2
The Unicast Routing Problem
2 The Unicast Routing Problem
-
10
This chapter overviews the network service models and routing protocols used in unicast routing.
Unicast is the term that designates the sending of messages from a single source to a single
destination in the network. We will also talk about Multi Protocol Label Switching and Fast Reroute.
2.1 Network Service Models
Perhaps the most important abstraction provided by the network layer to the upper layers is
whether it uses virtual circuits (VCs) or datagrams [12].
Layer 2
Data-link
Layer 1
Physical
...
Layer 3
Network
Layer 4
Transport
Layer 2
Data-link
Layer 1
Physical
Layer 3
Network
Layer 4
Transport
...
Sender Receiver
Fig. 6 – Layer model
VCs provide a connection oriented service through the establishment and teardown of pseudo
(virtual) circuits. This mechanism enables the reservation of link resources, providing steady high
bandwidth, low delay and low packet loss. Nevertheless, VCs require state information to be recorded
in the routers for each packet stream.
-
11
Physical
Link
Virtual Circuit
Ph
ysic
al
Lin
k
Ph
ysic
al
Lin
k
Fig. 7 – Various Virtual Circuits established over a Physical Link: one per packet stream
In contrast, datagrams do not require state information to be recorded in the routers for each flow
of data, since data transmission is not preceded by circuit establishment. IP uses datagrams; it has
low complexity and therefore handles heterogeneous networks best. IP uses resources evenly through
any number of users, giving them no guarantees whatsoever. Due to its lack of guarantees, IP service
is often called a best-effort service.
As far as IPTV is concerned, we can see that VCs would be suitable for video transmission,
however IP uses datagrams. Multi Protocol Label Switching is an answer to this dichotomy. MPLS is a
label switching mechanism that allows the traffic to be divided into forwarding classes which may be
forwarded the same way. The use of MPLS together with the Resource Reservation Protocol (RSVP)
enables the reservation of resources to a specific data flow type (eg. Video), keeping only state
information relative to the flow type, not the flow itself. MPLS is fully compatible with IP and in fact
RSVP uses the IP routing protocol information to do the reservation of resources. We will talk about
MPLS and RSVP later, for now, we will explore the IP routing protocols, from which RSVP depends
on.
2.2 Routing Protocols
In this dissertation we are particularly interested on the best way of using the resources of a
particular operator. Given so, we are then only interested in routing within a given autonomous system
(AS). An intra-AS routing protocol is used to maintain the forwarding tables in the routers of an AS.
Once the routing tables are configured, datagrams are routed within the AS. Intra-AS routing protocols
-
12
are also known as Interior Gateway Protocols (IGP). IGP can be divided into two categories: Distance-
Vector (DV) protocols and Link-State (LS) protocols.
The name distance-vector is derived from the fact that routes are advertised as vectors of
(distance, direction), where distance is defined in terms of link cost and direction is defined in terms of
the next-hop router. Each router learns routes from its neighbouring routers and then advertises the
paths from its own perspective. A typical distance vector routing protocol is a routing algorithm in
which routers periodically send routing updates to all neighbours by broadcasting their entire routing
tables. The algorithm used to calculate the shortest paths is distributed and because of that routing
loops may occur when using a DV protocol [13].
In a LS protocol, the network topology and all link costs are known in each router. This is
accomplished by having each node broadcast the identities and costs of its attached links to all other
routers in the network. This link state broadcast can be accomplished without the routers having to
initially know the topology of the network. A node only needs to know the identities and costs to its
directly-attached neighbours; it will then learn about the topology of the rest of the network by
receiving link state broadcast messages from the other nodes. The result of the nodes' broadcast is
that all nodes have an identical and complete view of the network. Each router can then run a Dijkstra
algorithm to calculate the shortest paths to the other routers. Any link cost changes or failure
occurrences are communicated through the network using Link State Advertisements (LSAs).
The key differences between distance vector and link state routing algorithms are the following:
Convergence time: the computation time to calculate the shortest paths in DV protocols take
usually take longer than in LS protocols. The distributed computations can result in cycles
that can extend the computation time;
State information: LS protocols require the routers to store state information of the whole
network, in other words routers need to know the complete topology of the network. DV
protocols only need to store information related to their neighbours.
The fact that in LS routing protocols, routers have complete knowledge of the network topology
is useful if we want to design custom out-branchings. If the routers know the topology they can use it
to calculate the out-branchings independently, reaching the same results without having the necessity
of exchanging messages between the routers. The most popular LS protocols are Open Shortest Path
First (OSPF) and Intermediate Systems to Intermediate Systems (IS-IS). They are similar; the
operator’s choice is usually based on convenience issues, not technical aspects.
Another advantage of using LS protocols is its low convergence time. This is especially useful
upon failure, in order to recalculate the shortest paths in the new topology (after failure). IS-IS and
OSPF use a message exchange mechanism to detect failures. They keep HELLO messages between
nodes to evaluate their condition; they use timers to detect the non-reception of a scheduled HELLO
packet. When a failure is detected, routers distribute this info through the network using LSAs. Note
that the choice of the timers is a tricky question, short timers may cause false positive failure
detection, on the other hand, long timers may cause slow failure detection and in consequence,
-
13
packet loss. The failure detection times depend on the timers, but in common implementations range
from 1 to 10s. Late extensions of OSPF and IS-IS offer a Fast-Hello mechanism that is able to detect
failure in less than 1s.
After the application of the LS protocol the resultant shortest path may be used to forward data
packets. However if we use MPLS we can choose alternate paths, other than the shortest.
2.3 MPLS
Multi-Protocol Label Switching (MPLS) [14] began in the mid-1990s and appeared mainly to
solve problems on IP over Asynchronous Transfer Mode (ATM) networks, which used VCs to forward
the IP datagrams. MPLS has generally four main objectives:
Improving scalability using labels to aggregate state information and to do routing
hierarchies;
Improving routing flexibility, using the labels to identify traffic with specific necessities to
construct personalised paths;
Optimizing network performance if used globally;
Simplifying router integration with switching technology by forcing switches to act like
routers, reporting physical topology information to the network layer and by unifying the
routing, control and addressing of layers 2 and 3.
Multi protocol label switching operates at a layer that is generally considered to lie between
traditional definitions of layer 2 (data link layer) and layer 3 (network layer), and thus is often referred
to as a "layer 2.5" protocol. It was designed to provide a unified data-carrying service for both circuit-
based clients and packet-switching clients which provide a datagram service model. It can be used to
carry many different kinds of traffic, including IP packets.
IP
ATM / Frame Relay / Ethernet/
...
DSL / 10BASE-T / OTN/ ...
Layer 3 - Network Layer
Layer 2 - Data-link layer
Layer 1 - Physical Layer
...
MPLS Layer “2,5" - MPLS
Fig. 8 – MPLS layer model
This label switching mechanism brought by MPLS consists on the addition of a short extra label
-
14
in each IP packet. In a MPLS network, like the one represented in Fig. 9, a packet enters a MPLS
domain by a Label Edge Router (LER), which introduces a MPLS header in the packet and sends it to
the core of the network. This header contains the label that has forwarding information to be used by
the core routers named Label Switched Routers (LSR). When the packet reaches the end of a path (to
an exit LER), the label is removed. This path is called Label Switched Path (LSP). The labels are
stored in a label switching table, where each entry contains a pair incoming label, incoming interface
and the corresponding pair containing the out-coming label and its destination interface.
LSR
DLER
(Ingress)
LER
(Egress)
LSP
A
BC
B 50 C 40
IN
Int
IN
Label
OUT
IntOUTLabel
D50
IP
40 IP
Fig. 9 – Label Switching in MPLS. A packet with label 50 arrives at LSR at interface B. The LSR looks
at its label switching table a sends the packet with label 40 through interface C.
Labels have a local character and can be distributed using the Label Switching Protocol (LDP) or
the Resource Reservation Protocol (RSVP), which we will refer to later in this chapter.
Fig. 10 indicates the structure of MPLS’s header. It is formed by:
Label – Packets are expedited based on this field. This field will be used to index MPLS’s
expedition table, the Label Information Base (LIB);
EXP – originally meant to be experimental, are now used to classify packets into service
classes;
S – Labels can be stacked. S is a bit used to identify the bottom of the stack.
TTL – Time to Live field is used to avoid cycles and infinite packet redistribution.
-
15
Data-link Layer Header MPLS Header IP Header
Label20 bit
EXP3 bit
S1 bit
TTL8 bit
Fig. 10 – MPLS’s header structure
Since it is not possible to have a “personalized” treatment to each packet, they need to be
grouped in classes. These are called Forward Equivalence Classes (FEC). Each class contains
packets with same transport requirements. This concept is inherited from IP. However, in MPLS, it is
possible to have much more traffic classes. The most common factors used to classify the packet into
FECs are the packet stream destination, packet stream origin and content type (e.g. video). All
packets of a given FEC may be bound to the same MPLS label, which are expedited the same way. In
contrast with IP expedition, the attribution of FEC to a packet is only done once (you can change a
packets FEC by adding a second label to the packet).
Labels are the essential element for packet expedition. A LSR just needs to examine the label to
know which will be the next node, to where the packet needs to be expedited to. Labels are local, this
means they only identify univocally connections between LSRs.
A. Label Stacking
It is possible to insert more than one label per IP packet. This enables the existence of
hierarchies in the LSPs.
R3 R4 R5
R1
R2 R7
R66
617
1317
13
621
1321
Fig. 11 – MPLS tunnel
-
16
Fig. 11 shows an example of this hierarchy. In this example R1 sends a packet with label 6 to
R3, which is the entry router of an MPLS tunnel. R3 receives the packet and instead of switching label
6, it adds another label: label 17. R3 then sends the packet to R4 that only evaluates the first label of
the label stack, 17. R4 switches the label 17 with label 21 and forwards the packet to R5. This is an
egress LSR, it removes label 21 from the stack and evaluates the inner labels. R5 then sends the
packet without MPLS labels to the respective router, according to its forwarding table.
With label stacking, R1 does not need to know the configuration of the MPLS tunnel. It just
needs to know the entry and exit routers. On the other hand, R4 does not need to know the
configuration outside the MPLS tunnel.
There are two protocols that usually used to attribute labels to the packet streams: the Label
Distribution Protocol (LDP) and the Resource Reservation Protocol (RSVP).
B. LDP: Label Distribution Protocol
LDP is the result of the MPLS Working Group in the IETF. LDP was made for MPLS unlike
RSVP, which existed before and was extended to handle label distribution. Nevertheless, LDP is much
used, especially when optimizing the network is not a vital issue. Typically it is used in metro or local
networks. In other cases RSVP is the one to go with.
LDP has the advantage of not requiring any manual configuration; nodes interact between each
other in order to discover new LSPs automatically. Each LSR establishes a TCP session with its
neighbours to run the protocol, this grants reliable message delivery. With this setup we can see that
each router has only information from its neighbours, this limits the labelling options but enhances
scalability.
The operation of LDP is driven by message exchanges between peers. Potential peers, also
known as neighbours, are automatically discovered via hello messages multicast to a well-known UDP
port. The protocol enables the discovery of remote peers using targeted hello messages. Once a
potential peer is discovered, a TCP connection is established to it and an LDP session is set up. At
session initialization time, the peers exchange information regarding the features and mode of
operation they support. After session setup, the peers exchange information regarding the binding
between labels and FECs over the TCP connection. As we have referred, the use of TCP ensures
reliable delivery of the information, but it also allows the use of incremental updates, rather than
periodic refreshes. LDP uses the regular receipt of protocol messages to monitor the health of the
session. In the absence of any new information that needs to be communicated between the peers,
keep-alive messages are sent. Fig. 12 shows LDP session establishment.
-
17
Fig. 12 – LDP session establishment
LDP does its routing based on the information it gets from the running IGP protocol. Having an
IGP running below LDP is therefore essential. The association between an FEC and a label is
advertised via label messages: label mapping messages for advertising new labels, label withdraw
messages for withdrawing previously advertised labels. The fundamental LDP rule states that LSR A
that receives a mapping for label L for FEC F from its LDP peer LSR B will use label L for forwarding if
and only if B is on the IGP shortest path for destination F from A’s point of view. This means that LSPs
setup via LDP always follows the IGP shortest path and that LDP uses the IGP to avoid loops.
Given our necessities, LDP by its own, does not seem to be the way to go. It is a highly scalable
and simple protocol but it fails when we try to get optimization out of it. It is not to make full path
reservations or have a good traffic engineering process, since it does its routing automatically and
based on IGP information. Therefore we shall evaluate our other choice, RSVP, which gives us better
tools to treat video flows.
C. RSVP
Another scheme for distributing labels for transport LSPs is based on the Resource Reservation
Protocol (RSVP). RSVP was invented before MPLS. It was originally developed as a scheme to create
bandwidth reservations for individual traffic flows in networks (e.g. a video telephony session between
a particular pair of hosts) in an attempt to bring QoS to IP. RSVP includes mechanisms for reserving
bandwidth along each hop of a network for an end-to-end session. However, the original QoS
application of RSVP has fallen out of favor because of concerns about its scalability: the number of
end-to-end host sessions passing across a service provider network would be extremely large, and it
would not be desirable for the routers within the network to have to create, maintain and tear down
state as sessions come and go.
In the context of MPLS, however, RSVP has been extended to allow it to be used for the
creation and maintenance of LSPs and to create associated bandwidth reservations. When used in
this context, the number of RSVP sessions in the network is much smaller because of the way in
-
18
which traffic is aggregated into an LSP. A single LSP requires only one RSVP session, yet can carry
all the traffic between two LER, containing many end-to-end packet streams. MPLS also allows the
traffic to be aggregated into FECs, through the simple label switching mechanism, reducing the
number of existent packet streams.
In contrast to LDP, a RSVP-signalled LSP may not necessarily follow the path dictated by the
IGP and, in its extended form, it is possible for the ingress router to specify the entire end-to-end path
that the LSP must follow, having full control over the path. This property is very important in the
context of traffic protection schemes such as Fast Reroute, discussed in detail lately in this chapter.
RSVP requests resources for a traffic stream in only one direction from sender to one or more
receivers and maintains soft state (the reservation at each node needs a periodic refresh) of the host
and routers' resource reservations, hence supporting dynamic automatic adaptation to network
changes.
Paths may be computed online by the router or offline using a path computation tool. In the case
of online computation, typically only the ingress router needs to be aware of any constraints to be
applied to the LSP. Moreover, use of the explicit routes eliminates the need for all the routers along
the path to have a consistent routing information database and a consistent route calculation
algorithm.
The creation of an RSVP-signalled LSP is initiated by the ingress LER. The ingress LER sends
an RSVP Path message. The destination address of the Path message is the egress LER. The Path
message includes the IP address of the previous node and some data objects, the most relevant are:
Label Request Object - Requests a MPLS label for the path. As a consequence, the egress
and transit routers allocate a label for their section of the LSP.
Explicit Route Object (ERO) - The ERO contains the addresses of nodes through which the
LSP must pass. If required, the ERO can contain the entire path that the LSP must follow from
the ingress to the egress.
Record Route Object (RRO) - RRO requests that the path followed by the Path message (and
hence by the LSP itself once it is created) be recorded. Each router through which the Path
message passes adds its address to the list within the RRO. A router can detect routing loops
if it sees its own address in the RRO.
Sender TSpec - TSpec enables the ingress router to request a bandwidth reservation for the
LSP in question.
In response to the Path message, the egress router sends a Resv message. Note that the
egress router addresses the message to the adjacent router upstream, rather than addressing it
directly to the source. This triggers the upstream router to send a Resv message to its upstream
neighbour and so on. As far as each router in the path is concerned, the upstream neighbour is the
router from which it received the Path message. This scheme ensures that the Resv message follows
the exact reverse path of the Path message.
Here are some of the objects contained in a Resv message:
-
19
Label Object - Contains the label to be used for that section of the LSP.
Record Route Object (RRO) - Records the path taken by the Resv message, in a similar way
to the RRO carried by the Path message. Again, a router can detect routing loops if it sees its
own address in the Record Route Object.
As can be seen, RSVP Path and Resv messages need to travel hop-by-hop because they need
to establish the state at each node they cross, e.g. bandwidth reservations and label setup.
R3 R4
R5R1
R2
Path messages
Resv messages
200
300 100
Fig. 13 – Path signalling using RSVP
Fig. 13 shows an example of the establishment of a path. R1 needs to setup a data packet
stream with specific QoS to R5. To accomplish it, R1 sends a Path message to R3, asking for a label
for a given data flow. R3 then redirects the Path message to router R4 and R4 redirects it to R5. R1
needs to resend this Path message every 30 seconds to refresh the state.
When the message reaches its destination, router R5, it allocates a label for that flow locally and
responds with a Resv message to R4. In this case label 100 was chosen. As soon as the Resv
message reaches R4, the router must first make a reservation based on the request parameters and
then forward the request upstream (in the direction of the sender). This process is repeated in the
upstream routers.
The routers store the characteristics of the packet stream, and also police it to grant the
specified QoS. This is all done in soft state, so if nothing is heard for a certain length of time, then the
reservation will be cancelled. This solves the problem if either the sender or the receiver crash or are
shut down incorrectly without first cancelling the reservation.
D. Traffic Engineering
Traffic engineering is the process of routing data traffic to balance the traffic load on the various
links, routers, and switches in the network and is most applicable in networks where multiple parallel
or alternate paths are available. Fundamentally, the objective of traffic engineering is to ensure
sufficient capacity exists to handle the forecast demand from the different service classes while
-
20
meeting their respective QoS objectives.
Along the years many protocols got Traffic Engineering (TE) extensions to better cope with
nowadays traffic demands, MPLS was no different.
MPLS-TE brought with it the concept of constrain based routing, a change in shortest path
concept. If all the traffic is forwarded through the same “shortest paths”, paths may become
congested. Perhaps we can get better results distribute the traffic wisely between the paths available.
Constraint-based routing is a mechanism used to meet TE requirements, which can take in to
account more than one metric to define its paths. Besides link cost, it can use bandwidth, delay and
jitter.
1
R2
R6 R7
R1
R3 1
R5
40Mbit/s traffic
70Mbit/s traffic
40Mbit/s bandwidth
100Mbit/s bandwidth
1 1
1
R4
Fig. 14 – Constrain based routing example that uses bandwidth as a routing metric
Fig. 14 shows an example of constrain based routing that uses bandwidth as a routing metric. If
short path first routing was used in this case, both traffic flows from R1 to R5 would pass through path
R1-R2-R3-R4-R5, which has the lowest total combined cost. This is the worst path in terms of
bandwidth and will have difficulties in handling the traffic.
2.4 Fast Reroute
Fast Reroute (FRR) [14] is a technology that offers resiliency to MPLS networks. With FRR it is
possible to quickly recover from link failure. The medium time down time when a single fails is around
50 ms, which is far better than convergence times of the IGP protocols.
FRR uses pre-programmed backup paths to detour the packets from the affected links, offering a
safe path that can quickly be introduce once failure is detected. These procedures are transparent to
the IGP protocols and therefore will leave the layer 3 routing tables intact.
-
21
R1 R6R5R2
R3 R4
PLR MP
Fig. 15 – Network Failure Anatomy
Fig. 15 shows the anatomy of a failure in a given network. If connection R2-R5 fails, traffic will be
redirected through path R2-R3-R4-R5. The point that is upstream of the failure is called Point of Local
Repair (PLR); on the other hand, the point downstream of the failure is called Merge Point (MP).
FRR gives us essentially two schemes for protecting a given LSP. We can protect a full path
(end-to-end) or just do a local protection detour (hop-by-hop). Both have its advantages and
disadvantages and although local protection is the most popular, they can be used complementarily.
A. Local Protection
The idea of local protection is simple. Instead of protecting the full path, the traffic is only
detoured at the PLR, protecting only one link. This is pretty similar to what happens when a freeway
between two cities closes down somewhere between exits A and B. Instead of redirecting the traffic
through other national routes, vehicles are redirected to a detour at exit A, which leads them to point B
where they enter the freeway once again. Following the freeway’s philosophy, it is necessary to create
what we call detour path as we can see in Fig. 16.
-
22
R8
R1
R3
R2
R4
R9
S
R5
D
LSP from S to D
Detour
Fig. 16 – Local Protection: If link R1R2 fails traffic is detoured at the PLR (R1), returning to the LSP at
the MP (R2)
B. End-to end Protection
End-to end protection consists on having a secondary LSP protecting a primary LSP (Fig. 17).
This LSP is an end-to-end path that redirects traffic between two nodes that are typically a few hops
apart in the network. This is usually done offline given that it is normally necessary that the involved
nodes have full network knowledge, which may not be available to all routers (which may be in
different Link State areas or just not running any Link State protocol at all).
On the other hand, this scheme of protection offers advantages as well. Since the backup paths
normally are callibrated by the network manager, it is possible to have better QoS guarantees.
Another advantage is that no matter what link(s) fail in the primary LSP, it is always possible to protect
it. Path signaling is usually done by RSVP.
-
23
LSP2
secondary
LSP1 primary
R1
R2 R3
DS
R4
Fig. 17 – End-to-end protection
Both in local and end-to-end protection, the protection paths must be computed and signalled
before a failure happens. We have seen in MPLS that flag S indicated the end of a stack of labels. We
stack an extra label on top of another usually to prevent the following router to access the inner label.
This encapsulation mechanism forms a LSP tunnel; these tunnels can be used for protection. There
are two methods for doing protection LSPs, which differ in the label with which the traffic arrives at the
MP. This in turn influences the number of LSPs that can be protected by a single backup tunnel,
yielding either N:1 (facility backup) or 1:1 (one-to-one backup).
C. Facility Backup
In facility backup, a single protection LSP is used to protect many primary LSPs. Traffic arrives
over the backup tunnel with the same label as it would if it arrived over the failed link. The only
difference from the point of view of forwarding is that, when arriving to the MP, it may come from
different interfaces. Normally, traffic comes from the protected LSP, but in failure cases, traffic may
travel through the protection tunnel. In both cases, traffic reaches the MP with the same label.
To ensure that traffic arrives at the MP with the correct label, all that needs to be done is to
tunnel it into the backup by pushing the backup tunnel label on top of the protected LSP label at the
PLR (label stacking) and remove the first label on the stack at the penultimate node before reaching
the MP (penultimate hop popping). The ability for several LSPs to share the same protection path is
an important scaling property of facility backup.
-
24
C
A
D
X Y
Protected LSP from X to Y
Protection
tunnel for
link A-B
B101
200
102 100
201 Pop
Push 102 Swap 102→101 Swap 101→100 Pop 100
Swap 201→200 Pop 200
Fig. 18 – Setting up a facility backup
Fig. 18 shows the setup of the backup tunnel before the failure and the forwarding state that is
installed at every hop in the path. No new forwarding state is installed at the MP (B). At the PLR (A),
the forwarding state must be set in place to push the label of the backup path (label 201 in the
example). Any number of LSPs crossing link A–B can be protected by the backup shown in the figure.
There is no extra forwarding state for each LSP protected either at the MP or at any of the routers in
the path.
The label that is advertised by the MP is an implicit null label this triggers penultimate hop
popping to be performed for the backup tunnel. Thus, traffic arrives at the MP with the same label with
which it would have arrived over the main LSP.
C
A
D
X YB
Push 102 Swap 102→101 Swap 101→100 Pop 100
Swap 201→200 Pop 200
Push 201
IP102
IP101200
IP1
01
20
1
IP1
01
IP100
Fig. 19 – Forwarding traffic using the facility backup
Fig. 19 depicts an example of forwarding IP traffic using facility backup. To better understand the
process of recovering from a link failure, let us analyse each step of the fixing procedure:
-
25
Router X is not aware of the failure of link A-B, it pushes the attributed 102 label as usually;
Router A has already detected that link A-B is down, it swaps label 102 to label to label 101,
issued by router B, and pushes label 201 issued by router C to redirect the packet to the
protection tunnel;
Router C acts like if it was a normal LSP and swaps label 201 to label 200 issued by router D.
Label 101 remains untouched;
Router B advertised an implicit null label to router D. Therefore D knows that it needs to pop
the last label for B, to know the origin of that packet. D then pops label 200 and sends the
packet, with the untouched label 101, to router B.
D. One-to-one Backup
In One-to-one backup traffic arrives at the MP with a different label than the one used by the
main (protected) LSP. Fig. 20 shows the setup of a one-to-one backup for the LSP from the previous
example and Fig. 21 shows forwarding over the backup following a failure. Traffic arrives at the MP
with label 300, the backup tunnel label and is forwarded using label 100, the protected LSP label.
Thus, the MP must maintain the forwarding state that associates the backup tunnel label with the
correct label of the protected LSP. If a second LSP were to be protected in this figure, a separate
backup tunnel would be required for it, and a separate forwarding state would be installed at the MP.
C
A
D
X Y
Protected LSP from X to Y
Protection
tunnel for
link A-B
B101
200
102 100
302300
Push 102 Swap 102→101 Swap 101→100 Pop 100
Swap 302→301 Swap 301→300
Swap 300→100
Fig. 20 – Setting up a one-to-one backup
Similar to facility backup, the forwarding state must be set up to map traffic from the protected
LSP into the backup. For example, traffic arriving with label 102, the label of the protected LSP, is
forwarded over the backup using label 302, the backup tunnel label. Note that, using this approach,
the depth of the label stack does not increase when packets are forwarded over the backup path,
because the top label is simply swapped to the backup tunnel label.
-
26
C
A
D
X YB
Push 102 Swap 102→302 Swap 300→100 Pop 100
Swap 201→200 Pop 200
IP102
IP3
00
IP100
IP3
02
IP300
Fig. 21 – Forwarding traffic over a one-to-one backup
Let us once again go through the link failure recovering steps, but now using one-to-one backup:
Router X unaware of the failure pushes label 102 to an IP packet;
Router A had already detected the failure in link A-B and swaps label 102 to label 302,
previously advertised by router C;
Router C and D act like if it was a normal LSP and remove the arriving label. After that they
add the label advertised by the downstream router;
Finally the packet reaches router B with a label that has been negotiated between B and D,
instead of being negotiated between B and A.
E. Failure Detection
In order to accomplish the restoration goal of 50 ms, we need a way of quickly detecting a
failure, which is out of FRR scope.
Some transmission media provide hardware indications of connectivity loss. One example is
SDH/SONET, which is, as we have seen in the first chapter, widely used in the network backbones.
These technologies have the capability of detecting a failure in the physical layer within seconds.
Some gigabit Ethernet routers have also that capability they use pulse detection to evaluate if the link
is up.
When failure detection is not provided in the hardware, this task can be accomplished by an
entity at a higher layer in the network. As we have seen IGP protocols have a hello exchange
mechanism to detect link failure. The original versions of OSPF and IS-IS had architectural time limits
in the detecting link failure. The minimum time interval used to them was 3 seconds for OSPF and 1
second for IS-IS. Recent versions of the protocols included a fast-hello mechanism that that can be
used to do sub-second failure detection. Nevertheless it is still CPU consuming and causes some
overhead.
-
27
Based on this realization, the BFD protocol was developed jointly by Juniper and Cisco, having
rapidly gained acceptance. It is a simple hello protocol designed to do rapid failure detection. Its goal
is to provide a low-overhead mechanism that can quickly detect faults in the bidirectional path
between two forwarding engines, whether they are due to problems with the physical interfaces, with
the forwarding engines themselves or with any other component.
-
28
-
29
Chapter 3
The Multicast Routing Problem
1 The Multicast Routing Problem
-
30
3.1 From Unicast to Multicast
If unicast transmissions were used to distribute TV streams, copies of the same video packets
would pass through the links (Fig. 22). This would cause congestion.
Packet
Stream
Fig. 22 – Packet distribution in unicast. A packet stream is generated for each receiving node.
The problem of having copies of the same packets passing in a single link is solved with the
introduction of multicast. Multicast [15] is the delivery of information to a group of destinations
simultaneously in a transmission from a given source. In a multicast transmission links do not carry
copies of the same packet streams, instead, packets are replicated only when they are needed in
order to reach all the destinations (Fig. 23).
Packet
Stream
Fig. 23 – Packet distribution in multicast.
In this chapter we will talk about some novelties brought by multicast. The most common
-
31
concept of an IP address is in unicast addressing, available in both IPv4 and IPv6. It normally refers to
a single sender or a single receiver, and can be used for both sending and receiving.
A multicast address is associated with a group of interested receivers. In IPv4, addresses
224.0.0.0 through 239.255.255.255 (the former Class D addresses) are designated as multicast
addresses. IPv6 uses the address block with the prefix ff00::/8 for multicast applications. In either
case, the sender sends a single datagram from its unicast address to the multicast group address and
the intermediary routers take care of making copies and sending them to all receivers that have joined
the corresponding multicast group.
3.2 Protocol Independent Multicast (PIM)
Routing had also changes in multicast. Protocol-Independent Multicast (PIM) [16] is a family of
multicast routing protocols for networks that provide one-to-many and many-to-many distribution of
data over the Internet. PIM routes multicast packets to multicast groups, and is designed to efficiently
establish distribution out-trees. It is termed protocol-independent because PIM does not include its
own topology discovery mechanism, much like RSVP and LDP. Instead it uses the unicast routing
information supplied by other traditional routing protocols, such as IS-IS or OSPF, to perform the
multicast forwarding. Since it relies only on the network layer routing protocols, PIM does not depend
on the IP transport protocols, like TCP and UDP. Nevertheless IP datagrams are still used in data
transmission. There are many PIM variants, the most popular are PIM Dense Mode (PIM-DM), PIM
Sparse Mode (PIM-SM) and PIM Source-Specific Multicast (PIM-SSM).
As far as PIM-DM is concerned, it builds source out-trees by flooding multicast traffic domain
wide, then pruning back branches of the out-tree where no receivers are present. PIM-DM is
straightforward to implement but generally has poor scaling properties and that is why we will not
explore it in this dissertation, since it is not really used for IPTV purposes. We will then be focused on
PIM-SM and PIM-SSM.
A. PIM Sparse Mode (PIM-SM)
In sparse-mode PIM, a router in a group is designated as a rendezvous point (RP). Its choice
can be done statically or automatically. The RP collects information about multicast senders and
makes that information available to potential receivers. The purpose of RP is to allow a receiver to find
out the IP address of the source for a particular group.
Multicast traffic flows from the sender back down the path created by the PIM messages. All
received multicast traffic is checked to verify if the incoming multicast traffic is being received via the
interface, on which PIM request was sent. This prevents loops and duplicate packets.
Each router forms a neighbour relationship with adjacent PIM routers in a group using PIM
-
32
“hello” messages. When a router in a group wants to receive a multicast stream, it sends a PIM “join”
message towards the RP. The source sends a PIM “register” message to the RP with multicast traffic
already encapsulated on it, as we can see on Fig. 24.
10
10 10
20
A
B
RP192.168.0.1
Multicast group address
234.1.1.1
PIM – “join
234.1.1.1”
PIM – “register
234.1.1.1”
Encapsulated
multicast traffic
Video source
10.0.1.1
C
DPIM – “join
234.1.1.1”R
Fig. 24 – Receiver “join” and source “register” in PIM-SM
The RP then sends a “join” message to the source. As soon as the message is received, the
multicast traffic flows from the source to the receiver (Fig. 25).
10
10 10
20
A
B
RP192.168.0.1
Multicast group address
234.1.1.1
PIM – “join
234.1.1.1”
Video source
10.0.1.1
C
DR1
Fig. 25 – Source “join” and traffic flow in PIM-SM
When a router wants to stop receiving a multicast stream, it sends a PIM “prune” message
towards the IP address of the multicast source.
Multicast traffic initially travels from the sender to the receiver via the RP. Once the downstream
router starts receiving the multicast traffic (and knows the senders IP), it is possible to build path
directly back to the sender. This is triggered by the receiver that sends join messages directly to the
-
33
source. In the example traffic would then flow through links D-C and C-A, that are part of the shortest
path. After this reconfiguration the path between D and A will be the shortest path (Fig. 26).
10
10 10
15
A
B
RP192.168.0.1
Multicast group address
234.1.1.1
Video source
10.0.1.1
C
DR1
Fig. 26 – Shortest path reconfiguration in PIM-SM
If another router wanted to join the multicast group, a path would be added, forming an out-tree.
The added path would also be the shortest path between the source and the referred router. An out-
tree formed by concatenation of the shortest paths between the root and the nodes in the out-tree is
designated Shortest Path Tree (SPT) (Fig. 27).
10
10 10
15
A
B
RP192.168.0.1
Multicast group address
234.1.1.1
Video source
10.0.1.1
C
DR1
R2
10
Fig. 27 – Shortest Path Tree in PIM-SM after reconfiguration
If the receiver knew the sender’s address, the RP would not be necessary.
-
34
B. PIM Source Specific Multicast (PIM-SSM)
PIM-SSM [17] requires the edge router to know the address of the multicast source for each
group, avoiding the need of an RP. In PIM-SSM the edge router sends a PIM “join” message directly
towards the sender using the unicast routing table. This “join” request will be forwarded along the
shortest path (in terms of the IGP) to the multicast source until it reaches a router which is already
aware of the multicast group.
10
1010
20
A
B
Video source 2
192.168.1.2
PIM – “join
234.1.1.2”
Group
234.1.1.1
Video source 1
192.168.1.1
Group
234.1.1.2
PIM – “join
234.1.1.1”
C
D
PIM – “join
234.1.1.2”
PIM – “join
234.1.1.1”
Fig. 28 – Source “join” in PIM-SSM. “Join” messages for different group are sent to its
respective video source.
10
1010
20
A
B
Video source 2
192.168.1.2
Group
234.1.1.1
Video source 1
192.168.1.1
Group
234.1.1.2
C
D
Fig. 29 – Data flow in PIM-SSM. Video packets are sent to the reverse packet of the
respective “join” message.
Fig. 28 and Fig. 29 show PIM-SSM in operation, as we can see there is no shared out-tree
sourced at a RP, as in PIM-SM, and there is also no need for any source discovering mechanism. This
is an advantage of PIM-SSM over PIM-SM; the complexity of the multicast routing for PIM-SSM is
-
35
lower than PIM-SM and therefore easier to configure and maintain. PIM-SSM has also a more efficient
network usage, since traffic is not routed temporarily via the RP and the most direct path from source
to receiver is always used.
Despite the wide use of PIM-SM in IPTV networks, the advantages of the fairly new PIM-SSM
make it the one to be chosen in IP backbone, especially when the range of potential multicast group
addresses is not much large, which is typically the case.
3.3 Point-to-Multipoint LSPs (P2MP LSPs)
The LSPs generated in MPLS for unicast are also different in the multicast case [18]. So far we
have seen how MPLS is used to establish point-to-point LSPs, now we will see the tools that MPLS
has to offer in order to handle multicast connections. Independently of the signalling protocol that is
used, the forwarding mechanisms are similar.
L2 L4
L3
PE2 PE3
P1 B
A
L1
L5
L7
L6
Data
Source
PE1
P2
P3
PE5
PE4
C
Forwarding table on P1:
IN
Label
IN
Interface
OUT
Label
OUT
Interface
L1 C L2 A
L1 C L3 B
Fig. 30 – P2MP LSPs example. The forwarding table of P1 shows the different labels using for
the different interfaces.
Fig. 30 shows an MPLS multicast scenario, as we can see these LSPs have only one ingress
router but several egress routers. PE1, the ingress router, replicates the packets and sends them to
the downstream nodes, always with different labels. The downstream routers act in a similar way; they
-
36
also replicate the packet and send the copies to the next downstream routers, again, always with
different labels. These routers are called branch nodes. P3 is however a special case, since instead of
replicating the packets, it just transmits them with a new MPLS label. This forwarding mechanism is
quite useful, since it is far more efficient than using unicast LSPs.
RSVP was not prepared to P2MP LSPs as well. However, RSVP got a traffic engineering
(RSVP-TE) extension, allowing it to handle this new type of LSPs. Nevertheless it still uses routing
information of layer 3 protocols; therefore it is still out of the RSVP ambit to construct the out-
branchings.
L2 L4
L3
PE2 PE3
P1
L1
L5
L6
L7
PE1
P2
P3
PE5
PE4
L1
Path messages
Resv messages
L5
Fig. 31 – P2MP LSP setup
Fig. 31 shows the setup of multicast LSPs using RSVP-TE. We can see that like in unicast,
routers running RSVP-TE use Path messages to communicate downstream, and Resv messages to
communicate upstream. In the point of view of MPLS’s control plan a P2MP LSP is just a sum of point-
to-point LSPs.
Path messages have an Explicit Route Object (ERO) which determines the route followed by the
LSP. On the other hand, Resv messages incorporate, for each step, the label that will be used for
forwarding in that step. In the case P2MP networks, each sub-LSP is signalled using its own Path and
Resv messages. For this to happen both type of messages contain a new P2MP Session Object that
identifies the sub-LSP. This mechanism grants the replication of the packets in forwarding.
Let us see how this works in the example network shown in Fig. 31. The represented P2MP LSP
has four egress routers, it is composed of four sub-LSPs, one from PE1 to PE2, another from PE1 to
PE3, and so on. Because each sub-LSP has its own associated Path and Resv messages, on some
links multiple Path and Resv messages are exchanged. For example, the link from PE1 to P1 has
-
37
Path messages corresponding to the sub-LSPs to PE2 and to PE3. Let us examine in more detail how
the P2MP LSP in Fig. 31 is signalled; looking at sub-LSP PE1 to PE3 whose egress is PE3:
A Path message is sent by PE1, the ingress router, containing the ERO {PE1, P1, P3, PE3}.
This can contain a bandwidth reservation for the P2MP LSP if required.
PE3 responds with a Resv message that contains the label value, L4, that P3 should use
when forwarding packets to PE3.Similarly, the Resv message sent on by P3 to P1 contains
the label value, L3, that P1 should use when forwarding packets to P3.
In a similar way, for the sub-LSP whose egress is PE2, P1 receives a Resv message from
PE2 containing the label value, L2, that P1 should use when forwarding packets to PE2. P1
knows that the Resv messages from PE2 and P3 refer to the same P2MP LSP, as a
consequence of the P2MP Session Object contained in each.
P1 sends a separate Resv message to PE1 corresponding to each of the two sub-LSPs, but
deliberately uses the same label value for each, L1, because the two sub-LSPs belong to the
same P2MP LSP.
P1 installs an entry in its forwarding table such that when a packet arrives with label L1, one
copy is sent on the link to PE2 with label L2 and another copy on the link to P3 with label L3.
This ensures that when the Resv messages are sent from P1 to PE1 corresponding to the two
sub-LSPs, no double-counting occurs of the bandwidth reservation.
PE1, knowing that the two Resv messages received from P1 refer to the same P2MP LSP, a
consequence of the P2MP session object contained in each, forwards only one copy of each
packet in the flow to P1, with the label value L1 that had been dictated by PE1 in those two
Resv messages.
3.4 Protocol Architecture
In order to implement the custom out-branchings presented in chapter 4, we will present a set of
protocols that enable custom designs, can cope with resource reservation and handle fast reroute
restoration. This protocol scheme is based on [3], and uses PIM-SSM combined with P2MP LSPs.
-
38
Multicast coordination
PIM-SSM
Route Signaling
RSVP-TE
IGP routing
IS-IS/ OSPF
FRR support
MPLS
Routing
info
changes
Link
restoration
Fig. 32 – Cross layer architecture
PIM-SSM is used to coordinate the multicast groups and generate the multicast out-branchings.
IGP routing is assured by either IS-IS or OSPF. P2MP LSPs are established using RSVP to enable
resource reservation and FRR is used for rapid link restoration. Routes are established following the
next steps:
1. The network topology and its current link costs are fed to an out-branching calculation
algorithm that will be presented in the next chapter. The routers can run the programmed
algorithm since they are running an LS protocol which provides them the complete network
topology. The out-branching calculation can also be done by the network operator;
2. After finding the out-branching, we need to re-set the IGP link costs. The cost of the physical
links in the main out-branching remains the same. The cost of the physical links that are not
in the main out-branching are set to (Fig. 33) The new IGP costs are then downloaded to
the routers;
3. This way PIM-SSM will generate the same out-branchings that we have calculated;
4. RSVP signals the out-branchings generated by PIM-SSM, making the proper resource
reservations, forming a P2MP LSP.
5. The backup paths for each link of out-branching, can be calculated by the network operator
and downloaded to the routers. Alternatively, RSVP can setup them automatically using the
IGP routing information. To do such, IGP costs need to be re-set again. The cost of the
physical links of the main out-branching are set to and the costs of the other physical links
are set to its original value. Doing such grants that the backup paths that are calculated do
not share any link with main out-branching.
6. If the backup paths were calculated using RSVP, the IGP costs need to be set to the values
of step 2;
-
39
3.5 Failure Recovering Process
In point 5.1 we have described a procedure to assure the setup of the pre calculated out-
branchings. We have also presented a way of calculating and setting up protection paths. Fig. 33
shows a network that has already been set.
R2
5
1
3
4
6
3
2
R4
S