张国强 - Divakth.diva-portal.org/smash/get/diva2:345510/FULLTEXT01.pdf · experimentally that...

Thesis for the degree of Doctor of Philosophy

Robust Multimedia Communications over

Packet Networks

Guoqiang Zhang

张国强

Sound and Image Processing LaboratorySchool of Electrical Engineering

KTH (Royal Institute of Technology)

Stockholm 2010

Zhang, GuoqiangRobust Multimedia Communications over Packet Networks

Copyright c©2010 Guoqiang Zhang except whereotherwise stated. All rights reserved.

ISBN 978-91-7415-705-5TRITA-EE 2010:036ISSN 1653-5146

Sound and Image Processing LaboratorySchool of Electrical EngineeringKTH (Royal Institute of Technology)SE-100 44 Stockholm, SwedenTelephone + 46 (0)8-790 8819

Abstract

Multimedia communications over packet networks, and in particular the voice over IP(VoIP) application, have become an integral part of society. However, the unreliableand heterogeneous nature of packet networks has led to a best-effort delivery of services.Delay, limitation of bandwidth, and packet-loss rate all affect the quality of service (QoS).In this thesis, we address two important network impairments in the design of robustmultimedia communication systems: packet delay-variation and packet-loss.

Paper A considers the mitigation of the effect of packet delay-variation for audiocommunications by introducing a buffer at the receiver side. A new adaptive playoutscheduling approach is proposed to control the buffering length, or, equivalently, thepacket playout deadlines, in response to varying network conditions. A Wiener processis used to model the fluctuation of the buffering length without any playout adjustment.The playout scheduling problem is then reformulated as a stochastic impulse controlproblem by taking the playout adjustment as the control signal. The proposed approachis shown to be the optimal solution to the new control problem. It is demonstratedexperimentally that the proposed approach provides improved perceived conversionalquality.

Papers B, C and D address the packet-loss issue. Paper B focuses on the design of alow-complexity packet-loss concealment (PLC) method that is compatible with existingspeech codecs for VoIP application. The new method is rigorously motivated based onthe autoregressive (AR) speech model and the minimum mean squared error (MMSE)criterion. The effect of model estimation error on the prediction of the missing speechsegment is also considered and an upper bound for the prediction error is derived. Boththe theoretical and experimental results provide insight in the performance of the heuris-tically designed PLC methods. On the other hand, Paper C and D consider an activepacket-loss-resilient coding scheme, namely multiple description coding (MDC). In gen-eral, MDC can be used for the transmission of any media data. Paper C derives a simpleand accurate approximation of the rate-distortion lower bound of a particular multiple-description scenario and then demonstrates that the performance loss of some practicalMD systems can be evaluated easily with the new approximation. Paper D studies theperformance limit of a vector Gaussian multiple description scenario. An outer boundto the rate-distortion region is derived, and the outer bound is tight when the problemspecializes to the scalar Gaussian case.

Keywords: playout scheduling, VoIP, autoregressive model, packet-loss conceal-ment, multiple description coding, vector quantization, rate-distortion region.

i

List of Papers

The thesis is based on the following papers:

[A] G. Zhang, G. Dan, W. Bastiaan Kleijn and H. F. Lundin, “Adap-tive Playout Scheduling for Voice over IP: Event-Triggered Con-trol Policy,” IEEE Trans. Multimedia, submitted, June 2010.

[B] G. Zhang and W. B. Kleijn, “Autoregressive Model-BasedSpeech Packet-Loss Concealment,” In Proc. ICASSP, pp. 4797–4800, Mar. 2008.

[C] G. Zhang, J. Østergaard, J. Klejsa and W. B. Kleijn, “High-Rate Analysis of Symmetric L-Channel Multiple DescriptionCoding,” IEEE Trans. Communications, submitted, May 2010.

[D] G. Zhang, W. Bastiaan Kleijn and J. Østergaard, “Bounding theRate Region of Vector Gaussian Multiple Descriptions with In-dividual and Central Receivers,” IEEE Trans. Inform. Theory,submitted, May 2010.

iii

In addition to papers A-D, the following papers have also beenproduced in part by the author of the thesis:

[1] H. Li, G. Zhang, and W. B. Kleijn, “Playout Scheduling forVoIP Using k-Erlang Distribution,” in Proceedings of EuropeanSignal Processing Conference, 2010.

[2] G. Zhang, W. B. Kleijn and J. Østergaard, “Bounding the RateRegion of Vector Gaussian Multiple Descriptions with Individualand Central Receivers,” in IEEE Data Compression Conference,pp. 13-19, 2010.

[3] G. Zhang, J. Klejsa, and W. B. Kleijn, “Analysis of K-ChannelMultiple Description Quantization,” in IEEE Data CompressionConference, pp. 53-62, 2009.

[4] G. Zhang, J. Klejsa, and W. B. Kleijn, “Asymptotic Analysis ofMultiple Description Lattice Vector Quantization,” in Workshopon Information Theoretic Methods in Science and Engineering,August 2008, Tampere, Finland.

[5] G. Zhang, H. Lundin, and W. B. Kleijn, “Band Control Pol-icy of Playout Scheduling for Voice over IP,” in Proceedings ofEuropean Signal Processing Conference, August 2008.

iv

Acknowledgements

During my Ph.D. studies I received a lot of help from many people. First Iwould like to take this opportunity to thank my supervisor Prof. BastiaanKleijn. You have always been supportive to my work and have done yourutmost to guide me, which I deeply appreciate. Your creativity, dedicationand professionalism inspired me in both work and life.

I devote special thanks to Prof. Gyorgy Dan, Dr. Jan Østergaard,Haopeng Li, Janusz Klejsa and Henrik Lundin for work resulting in jointpublications. It is an enjoyable experience to work with you all.

I would also like to thank all my colleges at the Sound and Image Pro-cessing lab. Thank you for creating a pleasant working environment. Inparticular, I am thankful to: Anders Ekman for helping me in preparing tu-torials in Information theory and Source Coding course; Prof. Markus Flierlfor valuable research discussions; Prof. Arne Leijon for proof-reading mythesis; my office-mate Janusz Klejsa for so many interesting conversationsabout culture, history, life, and so on; Jan Plasberg and Konrad Hofbauerfor teaching me to play table football.

Thanks to my Chinese friends at KTH, particularly David, Zhanyu,Zhongwei, Xi Zhang, Jing Fu, Minyue, Jingfen and Lei Bao. I am glad toget to know you. I will always remember the fun we have had together.

I would like to express my deep gratitude to my family, my parents foryour endless support and your sacrifice; my two sisters for the joy we have.Last but certainly not least, thank you Xiaoying for your understandingand encouragement.

Guoqiang Zhang, Stockholm, 2010

v

Contents

Abstract i

List of Papers iii

Acknowledgements v

Contents vii

Acronyms xi

I Introduction 11 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1 Background . . . . . . . . . . . . . . . . . . . . . . . 11.2 Research Challenges . . . . . . . . . . . . . . . . . . 3

2 Approaches Addressing Delay-Jitter . . . . . . . . . . . . . 62.1 Overview of Playout Scheduling for Audio Communi-

cation . . . . . . . . . . . . . . . . . . . . . . . . . . 62.2 Overview of Playout Scheduling for Video Communi-

cation . . . . . . . . . . . . . . . . . . . . . . . . . . 83 Approaches Addressing Packet-Loss . . . . . . . . . . . . . 8

3.1 Packet-Loss Concealment . . . . . . . . . . . . . . . 93.2 Multiple Description Coding . . . . . . . . . . . . . 10

4 Summary of Contributions . . . . . . . . . . . . . . . . . . . 24References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

II Included papers 39

A Adaptive Playout Scheduling for Voice over IP: Event-Triggered Control Policy A11 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . A12 Dejitter Buffer Modelling . . . . . . . . . . . . . . . . . . . A3

vii

2.1 Wiener Process Modelling . . . . . . . . . . . . . . . A4

2.2 Impulse Control Adjustment . . . . . . . . . . . . . A4

3 Impulse Control of the Dejitter Buffer . . . . . . . . . . . . A6

3.1 Problem Formulation . . . . . . . . . . . . . . . . . A6

3.2 A Band Control Policy and its Performance . . . . . A8

4 Optimal Control Policy . . . . . . . . . . . . . . . . . . . . A11

4.1 Lower Bound over All Impulse Control Policies . . . A11

4.2 Optimality of a Band Control Policy . . . . . . . . . A11

5 Implementing the Band Control Policy . . . . . . . . . . . . A13

5.1 Negative Jump of the Buffer Length . . . . . . . . . A14

5.2 Positive Jump of the Buffer Length . . . . . . . . . . A14

5.3 Packet loss and out-of-order arrivals . . . . . . . . . A15

5.4 Summarizing The Band-Control Algorithm . . . . . A16

5.5 Selection of the Upper Bound . . . . . . . . . . . . . A17

6 Experimental Results . . . . . . . . . . . . . . . . . . . . . . A18

6.1 Performance Evaluation using Synthetic Delay Traces A20

6.2 Performance Evaluation using Measured Delay Traces A22

7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . A25

Appendix I. Proof of Proposition 2 . . . . . . . . . . . . . . . . . A25

Appendix II. Proof of Proposition 3 . . . . . . . . . . . . . . . . A26

Appendix III. Proof of Theorem 1 . . . . . . . . . . . . . . . . . A26

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A27

B Autoregressive Model-Based Speech Packet-Loss Conceal-ment B1

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . B1

2 Signal Model . . . . . . . . . . . . . . . . . . . . . . . . . . B3

2.1 Basic Signal Model . . . . . . . . . . . . . . . . . . . B3

2.2 Extension to Long Term Correlations . . . . . . . . B4

3 MMSE Missing Segment Prediction . . . . . . . . . . . . . . B4

3.1 MMSE Prediction . . . . . . . . . . . . . . . . . . . B4

3.2 Effect of Model Estimation Error on Prediction . . . B5

3.3 Why perceptual weighting does not affect Prediction B7

4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B8

4.1 Experimental MSE Bound Verification . . . . . . . . B8

4.2 Subjective Quality Comparison . . . . . . . . . . . . B8

5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . B10

Appendix I. Proof of Proposition 1 . . . . . . . . . . . . . . . . . B10

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B15

viii

C High-Rate Analysis of Symmetric L-Channel Multiple De-scription Coding C11 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . C12 Analysis of Multiple Description Lower Bound . . . . . . . C4

2.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . C52.2 Duality of the Inner-Tight and Outer-Tight expressions C62.3 Asymptotic Tight Lower Bound to the Rate-

Distortion Function . . . . . . . . . . . . . . . . . . C103 Evaluation of MDLVQ Systems . . . . . . . . . . . . . . . . C15

3.1 System Settings . . . . . . . . . . . . . . . . . . . . . C163.2 Index Assignment of a MDLVQ . . . . . . . . . . . . C183.3 The Geometry of L-tuples . . . . . . . . . . . . . . . C203.4 Asymptotic Analytic Performance . . . . . . . . . . C22

4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . C27Appendix I. Matrix Lemmas . . . . . . . . . . . . . . . . . . . . C28Appendix II. Proof of Lemma 2 . . . . . . . . . . . . . . . . . . . C29References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C29

D Bounding the Rate Region of Vector Gaussian MultipleDescriptions with Individual and Central Receivers D11 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . D12 Problem Formulation and Main Results . . . . . . . . . . . D3

2.1 Problem Formulation . . . . . . . . . . . . . . . . . D32.2 Gaussian Description Scheme . . . . . . . . . . . . . D42.3 Outer Bound to the Rate Region . . . . . . . . . . . D62.4 Optimal Weighted Sum Rate . . . . . . . . . . . . . D72.5 Rate Region for the Scalar Gaussian Source . . . . . D10

3 Proof of Theorem 1 . . . . . . . . . . . . . . . . . . . . . . . D114 Proof of Theorem 2 . . . . . . . . . . . . . . . . . . . . . . . D135 Rate region for Scalar Gaussian Source . . . . . . . . . . . . D176 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . D22Appendix I. Useful Matrix Lemmas . . . . . . . . . . . . . . . . . D22Appendix II. Proof of Lemma 4 . . . . . . . . . . . . . . . . . . . D23Appendix III. Proof of Lemma 2 . . . . . . . . . . . . . . . . . . D25Appendix IV. Proof of Lemma 5 . . . . . . . . . . . . . . . . . . D27Appendix V. Proof of Lemma 3 . . . . . . . . . . . . . . . . . . . D31References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D33

ix

Acronyms

AR Auto-Regressive

ARQ Automatic Repeat reQuest

CDF Cumulative Distribution Function

DCR Degradation Category Rating

FEC Forward Error Correction

FTP File Transfer Protocol

GSM Global System for Mobile Communications

HMM Hidden Markov Model

i.i.d. independent and identically-distributed

IEEE Institute of Electrical and Electronics Engineers

IP Internet Protocol

ISDN Integrated Services Digital Network

LP Linear Prediction

LPC Linear Prediction Coefficient

MDC Multiple Description Coding

MDSQ Multiple Description Scalar Quantization

MDLVQ Multiple Description Lattice Vector Quantization

ML Maximum Likelihood

MMSE Minimum Mean Square Error

MOS Mean Opinion Score

MSE Mean Square Error

xi

ODE Ordinary Differential Equation

PCM Pulse-Code Modulation

PDF Probability Density Function

PESQ Perceptual Evaluation of Speech Quality

PLC Packet Loss Concealment

PSTN Public Switched Telephone Network

QoS Quality of Service

RTP Real-time Transport Protocol

SCN Switched Communication Network

UDP User Datagram Protocol

VoIP Voice over Internet Protocol

VQ Vector Quantizer

WGN White Gaussian Noise

TCP Transmission Control Protocol

xii

Part I

Introduction

Introduction

1 Motivation

1.1 Background

With the expansion of broadband infrastructure, multimedia communica-tions over packet-switched networks such as the Internet have become in-creasingly popular. Multimedia communications can be either one-way fromone transmitter to one or more receivers (e.g., audio/video streaming), orinteractive multi-way (including two-way) between multiple communicat-ing parties (e.g., VoIP, video conferencing). Fig. 1 shows a demonstra-tion of multimedia communication over IP networks. The Switched Com-munication Network (SCN) can be a wired or wireless network, such asPSTN, ISDN or GSM. Some particular applications experience a phenome-nal growth of popularity, and are likely to replace traditional communicationsystems such as landline phones. One example is that the new generationof mobile phones (e.g., the iphone) is built with VoIP compatible systems.A driving impetus is the user-demand for easily accessible and inexpensiveservices with a satisfactory quality. However, the transmission bandwidthis limited while the traffic load and user-demand for communication qualityare increasing. This is a fundamental challenge in multimedia communica-tions.

IP Network

SCN

MultimediaServer

Figure 1: Multimedia communication environment.

2 Introduction

Conceptually, a multimedia communication link consists of an encodingsystem, a network and a decoding system, as shown in Fig. 2. The mediadata is first compressed and packetized by the encoding system, then deliv-ered over the network, and finally estimated by the decoding system basedon what was received from the network. There are two causes of degradationof the communication quality. The first is the coding distortion and the sec-ond is damaged or lost data due to network impairments. The coding distor-tion is determined by the coding system given the available bandwidth. Onthe other hand, the technological challenges stemming from the network im-pairments are a major barrier for providing a guaranteed performance withrespect to packet-loss, delay and bandwidth [95], [104], [131], [48], [135], [21].Thus, it is of great interest to understand the characteristics of packet-switched networks and thereafter design robust multimedia communicationsystems.

Encoder Network Decoderdata estimated

data

Figure 2: Basic block diagram of one-direction data communication.

The unreliability and heterogeneity of packet-switched networks resultin a best-effort service. The packets transmitted over packet-switched net-works suffer from network delay, network packet-loss, network delay varia-tion (also known as delay-jitter) and various connection speeds (or band-widths). We provide an overview of these impairments in more detail in thefollowing.

Network packet-loss occurs due to the congestion in network nodes(wired) and/or the corrupted packet checksum (wireless). The effect ofpacket-loss is a major issue for multimedia communications. When mediapackets are lost, the playout is interrupted, leading to a poor user expe-rience. The automatic repeat request (ARQ) technique is in general notsuitable for multimedia communications due to a low-latency service de-mand. Special treatment is required to mitigate the effect of packet-loss.Besides the packets being dropped or corrupted by the network, packet-loss could be a result of the compensation of delay-jitter, since the packetsreceived after the intended playout time are not useful.

Delay-jitter is caused by dynamic data traffic, or possibly the congestionof the network. Delay-jitter can be compensated at the cost of increasedcommunication-delay. The additional delay introduced by combating delay-jitter is irrelevant for most one-way transmissions but is crucial for interac-tive communications [87].

1 Motivation 3

This quality of service (QoS) limitation poses a challenge for multime-dia communications. Robust multimedia communications generally demandlow delay, high bandwidth and low packet-loss rates in order to provide ac-ceptable quality for the end users. The research goal of this thesis is todevelop methods that address network impairments, leading to robust mul-timedia communications.

1.2 Research Challenges

This thesis focuses on two network impairments: delay-jitter and packet-loss. These impairments are recognized to be central to the degradation ofthe communication quality. This subsection introduces and identifies theresearch questions that must be answered to resolve the problems resultingfrom the delay-jitter and the packet-loss.

Delay-Jitter Impairment

Packet delay-jitter refers to the variation in packet network delay. Due tothe variable nature of dynamic data traffic over a network, the statisticsof delay-jitter in general change over time. An example of this behavior isthe existence of delay spikes [92]. A delay spike represents the phenomenathat a set of consecutive packets arrives in a very short time. A segment ofnetwork delay trace is shown in Fig. 3.

0 5 10 15 20140

160

180

200

220

240

260

Packet generation time [s]

Pack

etnet

work

del

ay[m

s]

a delayspike

Figure 3: A segment of network delay trace (Each symbol · represents aparticular packet).

In general, delay-jitter has a large effect on the perceived quality of theinteractive communication and a relatively small effect on the perceived

4 Introduction

quality of the one-way transmission. We now discuss the effect of delay-jitter in detail. To compensate for delay-jitter, a typical multimedia com-munication system introduces a buffer (known as dejitter buffer) at thereceiver side. The dejitter buffer temporally stores the incoming packetsbefore playing them out, allowing slower packets to arrive in time. Thebasic idea behind the operation is to introduce additional buffering-delay toeven out the delay-jitter. The packet end-to-end playout delay is thus thesummation of the network delay and the buffering-delay. The packets thatmiss the scheduled playout deadline are considered lost. The total packet-loss consists of the packets being dropped or corrupted by the network andthe packets missing their delivery deadline. The length of the introducedbuffering-delay determines the packet-loss rate due to late arrival. As thebuffering-delay increases, more media packets arrive before their scheduledplayout time, resulting in a decreasing packet-loss rate.

Generally, for one-way multimedia transmissions, a long buffering-delaycan be introduced since the transmission involves little interactivity betweenthe communicating parties. Thus, the delay-jitter affects the quality littlein these applications. On the other hand, interactive communications aresensitive to delay. In this case, a short buffering-delay is desirable to en-hance the communication interactivity. Another special application whichalso requires low delay is live steaming (e.g., transmission of a realtime foot-ball match over a network). The end users generally expect the immediateplayout of media data after transmission [52]. Compared with interactivecommunications, live streaming is relatively flexible with regard to delay. Tobriefly summarize, in designing a dejitter buffer for interactive communica-tions and live streaming an inevitable trade-off must be made between cor-recting the effect of delay-jitter and maintaining a desirable low-delay. Theresearch goal is to design an efficient buffer control method that attemptsat achieving the best possible trade-off between delay and packet-loss.

Packet-Loss Impairment

Packet-loss causes information loss, which results in, for instance, non-fluency in audio or freezing frames in video. The philosophy for combatingpacket-loss in the scenario of multimedia communications is that end-usersare more willing to accept a degraded quality than a long service delay.This philosophy is different from that in the scenario of traditional datatransmissions (e.g., FTP, email), where no distortion or error is allowed.Packet-delay is generally not critical in traditional data transmissions, andpacket-retransmission can efficiently cope with packet-losses.

One strategy to combat packet-loss is to apply the techniques of packet-loss concealment at the receiver side. In general, the media content is tem-porarily (e.g., audio and video) and/or spatially (e.g., image and video)correlated. As a result, neighboring packets share redundant information.

1 Motivation 5

This facilitates the partial recovery of information in the lost packets by ex-ploiting the redundant information that is embedded in neighboring pack-ets. Packet-loss concealment methods are designed to be compatible withexisting coding systems. Specifically, a module of packet-loss concealment(PLC) is built and integrated with a decoding system, facilitating reason-able performance of the coding system in an environment with packet-loss.The research goal is to design an effective and compatible PLC method tooptimize the perceived quality of media presentation.

The other strategy is to design packet-loss-resilient coding systems. Thisstrategy requires the encoder to play a primary role while in the first strat-egy only the decoder is involved. To address the packet-loss, the encoderproduces a higher bit-rate than required for reconstructing the media data.In other words, the encoder introduces redundant information to the mediadata with the intention to mitigate the effect of packet-loss. This strat-egy is able to provide stronger robustness than the first strategy towardspacket-loss for multimedia communications on packet switched networks.

One natural scheme is to adopt the forward error correction (FEC) acrossneighboring packets (see [10], [96], [22] for audio transmission and [34], [69],[105] for video transmission), referred to as FEC scheme. The recovery oflost packets is performed at the cost of high-latency and high bandwidth.

A more advanced scheme is to generate a set of packets of a media dataand then send them separately through a network. By applying this scheme,the probability of losing all the information of the data is low, leading to arelatively reliable transmission. The set of packets can be delivered througheither one transmission link via time-shifting or different transmission linksin a parallel manner. Conceptually, this scheme exploits path-diversity inthe design of packet-loss-resilient coding systems. The transmission of eachpacket can be virtually viewed as that the packet is delivered through aseparate path (or channel).

A simple way to achieve robustness is to duplicate every media packetand then send them though a network. If at least one packet is received,the media quality is maintained. If no packet is successfully delivered, thequality degradation is the same as that of the one-packet transmission. Inthis situation, the robustness via diversity is achieved at the cost of doubledbandwidth. A natural question one may ask is if the media quality can beimproved upon receiving two packets through advanced coding. This is asource coding problem. Formally stated, the idea of compression efficiencyw.r.t. bit-rate and quality between a number of packets (or descriptions)of a source is referred to as the multiple-description (MD) problem. Mul-tiple description coding (MDC) allows a graceful quality degradation withdecreasing number of received descriptions, which is highly desirable formultimedia communications.

Recent studies showed that the MDC scheme performs better than theFEC scheme through some particular applications [22], [20], [56]. In this

6 Introduction

thesis we will focus on multiple description coding.

2 Approaches Addressing Delay-Jitter

The dejitter buffer plays an important role in multimedia communications.It reduces the number of playout interruptions due to delay-jitter. In partic-ular, the dejitter buffer reduces the sensitivity of the communication systemsto fluctuations in the packet network-delay by waiting for packets that havelong network delays. The protection of the playout continuity comes at thecost of prolonged latency which may negatively effect the performance ofthe applications with strict delay-constraints. The technical challenge sitsin the adaptation of the buffer length in response to varying network con-ditions so as to find the best compromise between playout interruption andthe communication delay for the given scenario, referred to as adaptive play-out scheduling. In the following, we provide a literature review of adaptiveplayout scheduling approaches for both audio and video communications,respectively.

2.1 Overview of Playout Scheduling for Audio Com-

munication

Adaptive playout scheduling for audio is the subject of a significant ongo-ing research effort because of the popularity of the IP telephony. Manymethods have been proposed in the last two decades. The basic principle ofthe reported methods is to estimate the statistics of recent packet-networkdelays and then determine the buffer-length or equivalently the packet play-out deadline for the next communication unit (e.g., a talkspurt or a blockof packets). The research objective is to minimize the packet-loss rate dueto late-arrivals while keeping a short buffering delay.

Depending on the particular statistics being estimated, the methodscan be classified into different groups. Some of the methods estimate themean and the variation in the network delay to calculate the playout dead-lines [92], [80]. Some methods estimate the cumulative distribution function(CDF) of the packet network-delay from a recent history in the calculationof the deadlines, which are either histogram-based [73], [78], [88], [100] orparametric-PDF based [9], [31], [103]. In particular, the Weibull distribu-tion [103], the Pareto distribution [31] and the Exponential distribution [9]are three widely exploited distributions in parametric-PDF modelling. How-ever, the above methods fail to model network delay statistics when packetdelay-spikes happen [103]. Special treatments are usually designed to dealwith the delay-spikes to make the methods reliable. Some methods ex-ploit stochastic processes to model the packet-arrivals. Specifically, in [70]the packet-arrivals are modelled as a combination of several Poisson pro-

2 Approaches Addressing Delay-Jitter 7

cesses. However, the modelling accuracy becomes low when the delay-jitterincreases.

Given a statistical model of the network packet-delay, different ap-proaches for playout adjustment have been proposed in the literature. A firstapproach imposes the buffer-size adjustment without regarding the mediasignal (e.g., the talk spurt character of human speech) [75]. This approachmay result in unnaturalness of the output audio due to, for example, thedrop of consecutive voiced packets. A second approach adjusts the playoutdeadline at the beginning every talk spurt ( [92], [103], [72]). The approachartificially elongates or shortens the silence duration before active speech.In general, this approach does not affect the listening quality, but it haslimited effectiveness when talkspurts are long. A third approach exploitsthe pitch correlations of active speech. It stretches or compresses the ac-tive speech segment on a per-packet basis (e.g., [76], [73]). The differencebetween the processed speech segment duration and the original segmentduration is constrained to be an integer multiple of the pitch period. Thisapproach in general introduces degradation of the listening quality, but itgains flexibility in playout adjustment.

Informally, every adaptive playout scheduling method can produce atrade-off curve w.r.t. the total packet-loss rate and the packet playout-delay.From a QoS perspective, the choice of the optimal point on the trade-offcurve should be determined by the perceived communication quality. Thisposes another question as how to estimate the communication quality.

Recently many studies have been conducted on measuring the overallcommunication quality between two parties [103], [79], [49] [72], [3]. In par-ticular, a so-called E-model of the overall communication quality was pro-posed by the ITU-T union [49]. The model (in terms of a quality rating fac-tor R) captures various impairments in estimating the speech-transmissionquality in a form of linear combination of individual impairments. The Rfactor ranges from 100 down to 0, where 100 is excellent and 0 is poor, ofwhich the expression is given as

R = R0 − Is − Id − Ie + A. (1)

In the above expression, R0 represents the basic signal-to-noise ratio. Is rep-resents a combination of impairments that occur simultaneously with thevoice signal, which could be the error in the loudness level, a non-optimumsidetone, and distortion resulting from quantization. Id represents the im-pairments caused by delay. Ie, the effective equipment factor, representsthe impairments caused by low bit-rate codecs and packet-loss. Finally, theadvantage factor A compensates for impairment factors when access advan-tages are available to the end user. The R factor can be converted to aMOS score of conversational quality [103].

To apply the E-model in voice playout scheduling, one only needs tofocus on the two factors Id and Ie. The other factors are unaffected by

8 Introduction

the dejitter buffer. It is worth noting that the two factors Id and Ie arenonlinear functions of delay and packet-loss, respectively (e.g., [103]). Inprinciple, if a method can provide a trade-off curve lower than another oneproduced by another method, then the method should outperform the othermethod also in terms of the overall quality criteria.

2.2 Overview of Playout Scheduling for Video Commu-

nication

Playout scheduling for video communication is required not only for inter-active communication but also for live streaming. For the application of livestreaming, the end users generally expect the immediate playout of mediadata after transmission [52]. Thus, similar to interactive communication,live streaming also requires low playout latency. As no interactivity is es-sentially present in live streaming, the playout delay constraint is not asstrict as that of interactive communication.

A number of playout scheduling methods have been developed to com-pensate for jitter in video communications. Some methods adjust theplayout time by either slowing down or speeding up the playout speed[53, 54, 67, 68, 101]. The control of playout speed is realized by scaling theduration that each video frame is shown. Recently a new method has beenproposed, which performs content-aware playout adjustment [71]. The ideais to vary the playout speed of frames, based on the frame content. Forinstance, frames with low or no motion might be less affected by playoutadjustment than frames with high motion.

In the application of video communication, the audio/video synchroniza-tion is another important technique for providing high perceived quality byend users. If the synchronization is not properly addressed, the playout ofthe media data can be annoying, leading to a poor end-user experience. Sev-eral methods on synchronizing audio and video content have been proposedin the literature [25, 33, 63, 74, 132].

3 Approaches Addressing Packet-Loss

Packet-loss is another major issue in the design of robust multimedia com-munications. In the past years considerable attention has been devotedto mitigate the effect of packet-loss. Some approaches focus on recoveringmissing packets only at the receiver side. They can be viewed as passiveapproaches since only the receiver is involved. We refer to them as packet-loss concealment (PLC) approaches. Other approaches attempt to designpacket-loss-resilient systems that involve both the transmitter and the re-ceiver. Accordingly, they can be viewed as active approaches as comparedto the receiver-oriented approaches.

3 Approaches Addressing Packet-Loss 9

As is mentioned in subsection 1.2, there are two popular schemes in thedesign of packet-loss-resilient systems: one is the FEC scheme and the otheris the MDC scheme. The MDC scheme exploits path-diversity to achieverobust transmission. Our attention will be restricted to the MDC schemeas our work on the design of packet-loss-resilient systems in the thesis fallsinside this category.

3.1 Packet-Loss Concealment

PLC for Audio Communication

With the widespread expansion of the Internet, voice over IP has become in-creasingly popular. To combat the unreliable delivery of voice packets overthe internet, various packet-loss concealment (PLC) approaches have beenproposed. PLC approaches are designed to work with existing decodersand with the corresponding encoders untouched for combating packet-loss.With properly designed PLC approaches, some traditional audio codecs canbe directly utilized in an unreliable communication environment. The ba-sic principle behind PLC is to take advantage of the neighboring receivedpackets in the reconstruction of the missing signal segment by exploiting theredundant information. Many of the approaches on PLC are formulated ina heuristic manner. However, the procedures can roughly be divided intomethods that can be motivated with a maximum likelihood (ML) based cri-terion and methods that can be motivated with the minimum mean squarederror-based (MMSE) criterion.

A large number of PLC procedures replace the missing segment witha signal that is similar to that of the neighboring received signal segment.These approaches can be interpreted as ML methods. The typical set the-orem from information theory states that asymptotically with increasinglength any sequence generated by the model is equally likely [16]. In the lit-erature, some approaches intend to reconstruct missing packets in the signaldomain. The work in [35], [127] simply repeats the signal from the packetsprior to the lost one to cover the gap due to packet-loss. A more advancedPLC approach using time-scale modification is described in [97,102]. Com-pared with the waveform substitution approach, the second approach keepsthe periodicity of the audio signal, resulting in an improved perceived qual-ity. Other approaches intend to reconstruct missing packets in the transformdomain. Specifically, the audio or speech signal is assumed to be generatedby feeding an excitation signal into a statistical model. When a packet islost, an excitation signal is constructed and then fed to the model in re-construction. A commonly used audio or speech model is the autoregressive(AR) model, which is estimated using linear prediction (LP). Gunduzhanand Momtahan proposed to construct the excitation signal for the autore-gressive model of the missing segment by repeating the excitation signal of

10 Introduction

the previous received speech with a periodicity that equals the pitch pe-riod [43]. In [120], Wong et al. propose to classify the missing segmentinto voiced, unvoiced or partially voiced types and construct the excitationsignal correspondingly.

The literature on usage of the MMSE criterion in the PLC context isrelatively limited. In [94], Rødbro et al. employ a hidden Markov model(HMM) to track the evolution of speech features such as the pitch fre-quency. Benefiting from the sophisticated HMM model, the feature vectorof a missing packet is estimated using the MMSE criterion and the harmonicsinusoidal parameters for synthesizing the speech are then constructed fromthe feature vector. In [59], Kondo and Nakagawa propose the use of ahigh-order AR model to capture both the short-term and long-term speechcorrelations. The missing speech samples are then predicted recursively byrunning the model with zero input.

PLC for Video Communication

Video communication typically requires higher bandwidth than audio com-munication. The coding efficiency has received considerable attention inthe past. The video compression is usually a combination of temporal mo-tion compensation to reduce temporal redundancy and spatial image com-pression to reduce spatial redundancy. The spatial image compression isin general achieved by transform coding on a block basis. The resultingparameters are then entropy-coded to produce the compressed bitstreamwhich is further packetized. In the coding process, some frames are codedindependently (referred to as intraframe compression) and other frames arecoded by using earlier and/or later frames in a sequence (referred to asinterframe compression). If the intraframe compression is dominant, thecommunication is much reliable as packet-loss would not result in seriouserror propagation.

A survey on approaches to recover the missing areas based on adjacentinformation is provided in [125]. Depending on the domain where the sig-nal recovery is performed, spatial-domain scheme, temporal-domain schemecan distinguished. A spatial-domain scheme recovers impaired macroblocksbased on neighboring ones (see [1, 45, 61, 126]). A temporal-domain schemeestimates the missing information by making use of temporal correlationsembedded in adjacent frames. Examples include motion-compensated in-terpolation [2] [15], and motion vector recovery [44], [81], [62].

3.2 Multiple Description Coding

Multiple description (MD) coding deals with lossy coding of an informationsource for transmission with L unreliable channels connecting the sourceand the destination. Each channel is assumed to provide an independent


means of transmission, facilitating the design of MD systems. In practice,the L channels could be created by using one transmission link via time-shifting. To make use of all the available channels, an encoder generatesL descriptions of the source and then sends them over the L channels, re-spectively. Each channel either delivers the description successfully to thereceiver or loses it during transmission, resulting in 2L possible outcomes ofreceived descriptions. The receiver reconstructs the best possible approxi-mation of the source based on the received descriptions. The status of theL channels are known to the receiver but not to the sender. In the designof a MD system, the problem is to minimize the 2L − 1 distortions due toreconstruction using information from any subset of the L descriptions forgiven channel rates.

For the special case that L = 1, i.e., only one channel is availablefor communication, the MD problem reduces to a single-description lossysource-coding problem. In this case, the design of a coding system concernswith only one distortion and one transmission rate. This single-descriptionproblem has received a lot of attention since the pioneering work by Shan-non in his two landmark papers [98], [99]. For an overview of the liter-ature to the problem, one can refer to the papers [7], [55], [42] and thebooks [6], [17], [16]. Single-description coding provides a basis for under-standing the performance limits of multi-description cases.

The progress on the general MD problem in the literature is partly re-lated to the theoretical MD performance limits (e.g., [85], [32]) and partlyrelated to the design of efficient practical MD schemes that provide nearoptimal performance (e.g., [13, 30, 37, 38]). Practical MD schemes are ingeneral content-independent. Specifically, the MD schemes can carry differ-ent types of media data for transmission, e.g., speech, image, video and soon.

Roughly speaking, two groups of practical MD schemes have beenproposed in the literature: quantization-based and transform-based.Quantization-based schemes can be further classified into scalar MD quanti-zation [5,8,106,108,114,115], trellis coded MD quantization [51,121,130,138]and vector MD quantization [11, 13, 18, 19, 28, 29, 84, 107, 110]. Transformbased schemes include correlation transforms [36, 38, 39, 82, 122–124], over-complete expansions and filterbanks [4, 14, 23, 40, 41, 60].

Our work on MDC in this thesis is devoted to both the MD theoreti-cal performance limits and practical MD schemes. For the practical MDschemes, we consider the vector MD quantization. In the literature re-view below, we will restrict our attention to those MD quantization-basedschemes instead of transform-based schemes.

In the following, we first present a background for the MD problem.Then we provide an overview of the information theoretical results for single-description, two-description and multi-description scenarios, respectively.Finally we provide an overview of MD quantization-based schemes.

12 Introduction

Background

Suppose Xn = {X [i]}, i = 1, . . . , n, is a sequence of source data thatoriginates for instance from sampling a continuous signal. Let the Lencoding function be fn

l (Xn), l = 1, . . . , L, in the generation of L de-scriptions, e.g., the l’th description is fn

l (Xn). In principle, there are2L − 1 (the case that no description is received is trivial and often ig-nored) reconstruction functions gS(·), each corresponds to a particular sub-set S ⊆ {1, . . . , L} of the L descriptions. We denote the reconstructionoutput as Xn

S = gS ({fnl (Xn), l ∈ S}) for a subset S ⊆ {1, . . . , L}. Each

reconstruction output introduces a quality degradation, or equivalently, dis-tortion. We refer to the distortion resulting from Xn

{1,...,L} as the central dis-tortion and to distortions resulting from other reconstruction outputs as sidedistortions. In particular, the distortions resulting from Xn

{l}, l = 1, . . . , L

are referred as individual side distortions. Without ambiguity, Xn{l} can

also be denoted as Xnl . Fig. 4 shows the transmission architecture of an

L-description system.

Encoder Decoder

Description 1

Description 2

Description L

..

....

R1

R2

RL

ErasureChannelXn

Xn

Figure 4: General L-description MD system. Description l is encodedwith transmission rate Rl.

In the design of a MD system, a fidelity criterion ρn(Xn, Xn) has to bedetermined to measure the quality degradation of a reconstruction outputXn with regard to Xn. If ρn(Xn, Xn) takes the form

ρn(Xn, Xn) =1

n

n∑

i=1

ρ(X [i], X[i]), (2)

ρn(·, ·) is referred to as a single-letter criterion. Distortion criteria of theform ρ(X [i]−X[i]) are called difference distortion criteria. Once ρn(Xn, Xn)is specified, the average distortion is given by

d = E

[

ρn(Xn, Xn)]

, (3)


where E[·] represents the statistical expectation operation. We use sub-scripts to distinguish between the distortions resulting from different de-coding functions, i.e., dS is due to the reconstruction Xn

S , S ⊆ {1, . . . , L}.We denote the central distortion by dc for simplicity.

In principle, the distortion criterion should capture the perceived qualitydegradation by the end users. See [26] for a number of distortion criteria ap-plied to gray scale image coding in the evaluation of their relations with theperceived quality degradation. In practice the mean squared error (MSE)criterion is widely applied in the design of source coding systems. Thedominance of MSE criterion stems from the fact that system performancew.r.t. the criterion is often traceable and closed form expressions can bederived. The MSE criterion is a difference distortion criterion. The mea-surement ρ(X [i] − X[i]) takes the form: ρ(X [i] − X[i]) = (X [i] − X[i])2.Correspondingly, the MSE due to the reconstruction Xn is

d =1

n

n∑

i=1

E

[

(X [i] − X[i])2]

. (4)

In the multiple description scenario, the mean squared error with regard toXn

S takes the form:

dS =1

n

n∑

i=1

E

[

(X [i] − XS [i])2]

, (5)

where S ⊆ {1, . . . , L}.The derivation of the MD rate-distortion region deals with the deter-

mination of a set of simultaneously achievable rates satisfying a prescribedfidelity criterion (e.g., a set of distortion constraints). Next we introduce afew definitions that help in describing the known MD information theoret-ical results.

Definition 1 (e.g., [83]) An inner bound to the MD problem is a set ofachievable rate-distortion points for a specific source and fidelity criterion.

Definition 2 (e.g., [83]) An outer bound to the MD problem is a set ofrate-distortion points, for a specific source and fidelity criterion, for whichit is known that no points outside this bound can be reached.

For a given source and fidelity criterion, the MD rate-distortion region isnot always tractable and closed form expressions may even be unavailable.In this situation, it is of great interest to find reasonable inner and/or outerbounds. An inner and an outer bound provide information about the posi-tion of the MD rate-distortion region. If the inner bound and outer boundcoincide they are called tight. In this case, the MD rate-distortion region isfully specified by the inner or outer bound.

14 Introduction

Single-Description Theoretical Results

Single-description coding can be viewed as a special case of the MD prob-lem by letting L = 1. Since the single-description coding only involves onetransmission rate and one distortion, the determination of the correspond-ing rate-distortion region is equivalent to specifying the optimal trade-offbetween the transmission rate and the distortion, which is referred to asthe rate-distortion function. The rate-distortion function specifies a lowestrate (expressed in bits per sample) for a given distortion constraint. Thedistortion criterion is usually limited to be a single-letter criterion.

Formally stated, the rate-distortion function R(d) for a stationary source{X [i]}∞i=1 with memory is defined as (e.g., [6])

R(d) = limn→∞

Rn(d), (6)

where the order-n rate-distortion function Rn(d) takes the form

Rn(d) = inf

{

1

nI(Xn; Xn) : Eρn(Xn, Xn) ≤ d

}

. (7)

The term I(Xn; Xn) denotes the mutual information between the two vec-tors (e.g., [6]). The infimum in (7) is over all conditional distributionsfXn|Xn(xn|xn) satisfying the distortion constraint. The function R(d) spec-

ifies the minimum achievable rate (on a per sample basis) for transmittingan infinite sequence {Xn, n → ∞} with a distortion constraint d.

In general, it is difficult to obtain an analytic expression for the rate-distortion function for an arbitrary source. However, for a memorylessGaussian source with variance σ2

X and MSE criterion, the rate-distortionfunction takes a simple form [6]

R(d) =1

2log

(

σ2x

d

)

, (8)

where d ≤ σ2x. Fig. 5 shows the R(d) function for a unit-variance Gaussian

source.With the R(d) expression in (8) as a basis, the rate-distortion function

of an arbitrary memoryless scalar source and MSE criterion can be upperand lower bounded. Before presenting the two bounds for encoding a gen-eral source, we first introduce two measurements: entropy and differentialentropy. Suppose Y is a discrete random variable with a probability massfunction PY (y). The entropy is defined as (e.g., [57])

H(Y ) = −E [log(PY (y))]

= −∑

y∈YPY (y) log(PY (y)). (9)


0 1 2 3 4 5−30

−25

−20

−15

−10

−5

0

R [bit/dimensionality]

d[d

B]

Figure 5: Rate-distortion function for a unit-variance Gaussian sourceand the mean squared-error distortion criterion.

The entropy acts as a lower bound on the transmission rate in lossless codingof a discrete source [47, 64–66, 93, 129]. However, for a continuous randomvariable, the entropy is infinite, which is not useful. Instead, the differentialentropy is introduced for a continuous random variable. Compared withthe entropy, the differential entropy is abstract and has no physical mean-ing. Suppose X is a continuous random variable with a probability densityfunction fX(x). The differential entropy is defined as [6]

h(X) = −E [log(fX(x))] = −

∫

fX(x) log (fX(x)) dx. (10)

For a Gaussian random variable with variance σ2X , its differential entropy

is 12 log

(

2πeσ2X

)

.Upon introducing the differential entropy, we present a well-known lower

and upper bound of the rate-distortion function for a general memorylessscalar source and MSE criterion. Suppose a memoryless source X has vari-ance σ2

X and finite differential entropy h(X). The rate-distortion functionR(d) satisfies the following inequality [6]:

1

2log

(

σ2X

d

)

≥ R(d) ≥1

2log

(

PX

d

)

, (11)

where PX = (2πe)−1e2h(X) is the entropy power which is the variance of aGaussian density that has the same differential entropy as X . The lowerbound of R(d) in (11) is often referred to as the Shannon lower bound. Theinequality (11) shows that, of all sources, the Gaussian source is the mostdifficult to compress.

16 Introduction

The rate-distortion function can also be extended to stationary vectorsources. Similar to the MSE criterion for scalar sources, the covariance dis-tortion measure is often exploited for vector sources in various source codingproblems (e.g., [77, 86, 91, 118, 119]). Suppose the information source is amemoryless vector source Xn = {X[i]}, i = 1, . . . , n, with a N×N marginal

covariance matrix Kx. The covariance distortion of a reconstruction Xn

with regard to Xn is defined as

ρn(Xn, Xn) =1

n

n∑

i=1

E

[

(X[i] − X[i])(X [i] − X[i])t]

. (12)

In (12), the quantity ρn(Xn, Xn) is an N × N semi-definite matrix. Therate-distortion function for a memoryless vector Gaussian source and co-variance distortion measure is known. Suppose the covariance distortion isD that satisfies 0 ≺ D � Kx. The rate-distortion function for the vectorGaussian source takes the form (e.g., [118]):

R(D) =1

2log

(

|Kx|

|D|

)

. (13)

If instead the MSE criterion is used for encoding a memoryless vector Gaus-

sian source, i.e., tr[

ρn(Xn, Xn)]

is considered, the well-known reverse wa-

terfilling technique comes in, which we will not present here.

Two-Description Theoretical Results

Two-description coding consists of two encoding functions and three de-coding functions. Fig. 6 shows the two-description transmission diagram.The total sum rate RT is split between the two descriptions: that isRT = R1 + R2. When only one channel works, either Decoder 1 or De-coder 2 is triggered, which produces Xn

1 or Xn2 . The output results in a

side distortion d1 or d2. When both channels work, Decoder c is triggered,which produces an output Xn

{1,2}. Correspondingly, the output results acentral distortion dc.

The MD performance limit for the two-description case has been ex-plored extensively over three decades (e.g. [32,85,133]). Formally, the two-description performance limit concerns the determination of smallest possi-ble transmission rates (R1, R2) with the distortions satisfying the distortionconstraints (d1, d2, dc) respectively for encoding an infinite source sequence.

The pioneering work on the multiple description problem is the two-description achievable rate region by El Gamal and Cover [32] (known asthe EGC region). Ozarow showed that the EGC region is tight for a mem-oryless scalar Gaussian source under the MSE criterion [85]. On the otherhand, Zhang and Berger found that the EGC region was not tight in gen-eral when there is excess rate (redundancy) among the descriptions [136].


replacemen

Encoder

Decoder 1

Decoder c

Decoder 2

R1

R2

Xn

Xn1

Xn{1,2}

Xn2

Figure 6: The two-description MD system.

Later, Wang and Viswanath showed that the EGC region is also tight for amemoryless vector Gaussian source and covariance distortion measure [118].Vaishampayan and Batllo in [112] further investigated the EGC region fora memoryless scalar Gaussian source under the MSE criterion. The authorsconsidered a symmetric description scenario that is all the transmission rates(i.e., R1 = R2) are equal and the distortion depends only upon the numberof received descriptions (i.e., d1 = d2). They provide a useful high-rateresult that the product of the central distortion and the side distortion isapproximately a function of the rate R1. The advantage of this propertyis that it gives an asymptotically accurate approximation of the theoreticallower bound with high transmission rate.

In general, it is difficult to obtain the MD rate-distortion region forarbitrary sources and distortion criteria. Instead, the research focus hasbeen on deriving the rate-distortion inner or outer bounds. An MD innerbound and an MD outer bound for general memoryless sources with finitedifferential entropy and MSE criterion have been derived by Zamir [133,134].The derivation of the bounds was inspired by the single-description bound(11). Later an inner bound for arbitrary memoryless sources and MSEcriterion larger than that derived in [133] was obtained by Feng and Effros[27]. Outer bounds for the binary symmetric source and Hamming distortionhave been obtained by Wolf et al. [50], Witsenhausen [128] and Zhang andBerger [137].

Next we briefly review some of the two-description theoretical results. Aswas mentioned above, the two-description rate-distortion region is known fora memoryless Gaussian source and MSE criterion [85, 118]. For a memory-less scalar Gaussian source with marginal variance σ2

X , the region consistsof the convex hull (which can be achieved by time-sharing) of the set of

18 Introduction

R1

R2

R(d1)

R(d2)

B1

B2

Rate Region

Figure 7: Rate region for two-description problem with the distortiontriplet (d1, d2, dc). R(di), i = 1, 2, is the single-descriptionrate-distortion function. The line segment B1-B2 is deter-mined by d1, d2, and dc.

achievable tuples (R1, R2, d1, d2, dc) of which the rates satisfy [13, 85]

R1 ≥ R(d1) =1

2log(

σ2X

d1) (14)

R2 ≥ R(d2) =1

2log(

σ2X

d2) (15)

R1 + R2 ≥ R(dc) +1

2δ(d1, d2, dc)

=1

2log(

σ2X

dc) +

1

2δ(d1, d2, dc), (16)

where δ(·) is given by

δ(d1, d2, dc) =

1, dc < d1 + d2 − σ2X

σ2X dc

d1d2, dc >

(

1d1

+ 1d2

− 1σ2

X

)−1

(σ2X−dc)2

(σ2X

−dc)2−(√

(σ2X

−D1)(σ2X

−d2)−√

(d1−dc)(d2−dc))2 , otherwise

.

(17)

(14)-(17) describes all the achievable rates given the distortion constraints.For simplicity, we denote the MD rate distortion region (14)-(17) by


R∗(σ2X , dc, d1, d2). Fig. 7 shows the achievable rate region given a distor-

tion triplet (d1, d2, dc), where dc is not loose (i.e., dc ≤ 1/( 1d1

+ 1d2

− 1σ2

X

)).

Equivalently, the rate-distortion region (14)-(17) can also be represented interms of the achievable distortions given the rates:

d1 ≥ σ2Xe−2R1 (18)

d2 ≥ σ2Xe−2R2 (19)

dc ≥

{

σX2e−2(R1+R2) 1

1−(√

Π−√

∆)2if Π ≥ ∆

σX2e−2(R1+R2) otherwise

, (20)

where

Π = (1 − d1/σX2)(1 − d2/σ2

X)

∆ = d1d2/σ4X − e−2(R1+R2).

10-5

10-4

10-3

10-2

10-1

100

10-10

10-9

10-8

10-7

10-6

10-5

10-4

10-3

10-2

10-1

dc

d1

exactapprox.

R = 2

R = 4

R = 6

R = 8

Figure 8: Graphic depiction of the relationship of dc and d1 for differentrates. The per-channel rate R is measured in bits.

Vaishampayan et al. studied the expressions (14)-(17) for a symmetricdescriptions scenario (i.e., d1 = d2 and R1 = R2.) at high resolution [112,113]. They showed that if the side distortions satisfy

d1 = σ2Xbe−2R1(1−a), (21)

for 0 < a < 1 and b ≥ 1, the distortion product dcd1 is lower-bounded by asimple bound:

dcd1 ≥σ4

X

4e−4R1 . (22)

20 Introduction

The lower bound in (22) serves as a simple means of relating the performanceof practical MD schemes to the real theoretical bound (14)-(17) (or equiv-alently (18)-(20)). The asymptotic behavior of the distortion product hasbeen utilized widely as a tool to assess the efficiency of practical two-channelMD systems [106, 107, 110]. It was shown that an optimal two-descriptionscheme behaves similarly to the lower bound approximation (22) at highresolution and when dc ≪ d1. Fig. 8 shows the real rate-distortion boundand the approximation (22) for different transmission rates. It is seen theapproximation is quite close to the real lower bound when the transmissionrate increases. A more general but less used lower bound approximationwas also obtained by Vashampayan [113]:

dcd1 =σ4

X

4

1

1 − dc/d1e−4R1 , (23)

which meets the approximation (22) if dc/d1 → 0. Although the approxi-mation (22) is less accurate than (23), it has a simple form, which leads toa wide usage of the expression (22).

Based on (11) and the derivation procedure in [85], Zamir derived aninner and an outer bound of the MD rate distortion region for a generalmemoryless source and MSE criterion. Suppose a memoryless source hasa variance σ2

X and finite differential entropy h(X). Its MD rate distortionregion RX(dc, d1, dc) is upper and lower bounded by [133]

R∗X(σ2

X , dc, d1, d2) ⊆ RX(dc, d1, dc) ⊆ R∗X(PX , dc, d1, d2), (24)

where PX takes the form as that in (11). R∗X(A, dc, d1, d2) is the rate

distortion region of a memoryless Gaussian source with variance A. Thetwo bounds (24) parallel Shannon’s lower and upper bounds (11) for thesingle description coding.

In [118], Wang and Viswanath studied the performance limit of an L-description vector Gaussian scenario. The covariance distortion measurewas considered. For the special two-description case, the authors obtainedthe rate distortion region. Suppose the data source is an i.i.d. vector Gaus-sian source with a marginal covariance matrix Kx. The covariance distor-tion constraints are (Dc, D1, D2), where 0 ≺ Dc ≺ Dl ≺ Kx, l = 1, 2. Therate distortion region is given as [118]

R∗ (Kx, Dc, D1, D2)

=

(R1, R2) :

Rl ≥12 log |Kx|

|Dl| , l = 1, 2

R1 + R2 ≥ supKz≻0

12 log |Kx||Kx+Kz||Dc+Kz|

|Dc||D1+Kz||D2+Kz|

. (25)

When the source is specialized to a scalar Gaussian source, (25) coincideswith (14)-(17).


Generally speaking, opening problems for two-description coding wouldbe to consider performance limits for general sources and distortion criteria.One can focus on deriving better inner or outer MD rate distortion bounds.

L-Description Theoretical Results

L-description coding consists of L encoding functions and 2L − 1 decodingfunctions. This suggests that in the derivation of theoretical results one hasto consider the trade-off between L transmission rates and 2L−1 distortions.Due to the problem complexity, the full characterization of the MD rate-distortion region for a source and a general distortion criterion is still anopen problem.

Significant effort has been spent on deriving inner and/or outer rate dis-tortion bounds for L-descriptions coding. Venkaramani et al. first obtainedan achievable L-description rate-distortion region [116, 117] for arbitrarymemoryless sources and a single-letter distortion criterion. In general theregion is described by a complicated expression but for the quadratic Gaus-sian case the region ends up with a simple form. Later, the work in [89], [90],provided an enlarged achievable rate region by using the random binningideas from distributed source coding. The recent work by Tian [109] pro-vided both an inner and an outer rate distortion bounds for a scalar Gaus-sian source under a symmetric MSE distortion constraint.

Other research has been devoted to characterizing the optimal sum rateor deriving the rate-distortion region for some special multiple descriptionscenarios. Wang and Viswanath obtained the optimal sum rate when onlya subset of distortion constraints are of concern [118, 119]. In particularin [119], the authors studied the symmetric descriptions scenario with twolevels of receivers. Each of the first-level receivers obtains κ of the L de-scriptions, (κ < L). The second-level receiver obtains all the L descriptions.For the considered MD scenario, the optimal sum rate has been derived fora memoryless vector Gaussian source and the quadratic distortion criterion.The work by Chen in [12] characterized the rate-distortion region when onlythe central distortion and the L individual side distortion constraints areimposed. However, the work is limited to a memoryless scalar Gaussiansource and MSE criterion.

As Paper D in the thesis also studies the MD problem addressed in [12]but for a vector Gaussian source, we briefly present the result of [12] in thefollowing for reference. Suppose a scalar Gaussian source has variance σ2

X .The central and individual side distortion constraints are d{1,...,L} (or dc)and dl, l = 1, . . . , L, respectively. The rate distortion region for this MDscenario is described by characterizing every optimal weighted sum ratein [12]. Let R∗(d1, . . . , dL, d{1,...,L}) denote the convex closure of the set ofall achievable rate vectors satisfying the distortion constraints. The optimalweighted sum rate for a weighting vector (α1, . . . , αL), α1 ≥ . . . ≥ αL > 0,

22 Introduction

is given as

Rwei = min(R1,...,RL)∈R∗(d1,...,dL,d{1,...,L})

L∑

i=1

αiRi. (26)

In the calculation of Rwei in (26), the author first derived the optimalweighted sum rate with respect to individual distortion constraints dl,l = 1, . . . , L, auxiliary distortion constraints d′{1,...,i}, i = 2, . . . , L − 1, and

central distortion constraint d{1,...,L}. The new optimal weighted sum rateη(d1, . . . , dL, d′{1,2}, . . . , d

′{1,...,L−1}, d{1,...,L}) takes the form

η(d1, . . . , dL, d′{1,2}, . . . , d

′{1,...,L−1}, d{1,...,L})

= mind{1,...,i}∈[0,d′

{1,...,i}]

i=2,...,L−1

maxσ2

j∈[0,σ2

X]

j=1,...,L−1

ψ(d1, . . . , dL, d{1,2}, . . . , d{1,...,L}, σ21 , . . . , σ

2L−1),(27)

where

ψ(d1, . . . , dL, d{1,2}, . . . , d{1,...,L}, σ21 , . . . , σ

2L−1)

=L−1∑

i=1

(

αi+1

2log

(

σ4X(σ2

Xd{1,...,i+1} − d{1,...,i+1}σ2i + σ2

Xσ2i )

((σ2X − σ2

i )d{i,...,i} + σ2Xσ

2i )((σ2

X − σ2i )di+1 + σ2

Xσ2i )

)

+αi − αi+1

2log

(

σ2X

d{1,...,i}

)

)

+αL

2log

(

σ2X

d{1,...,L}

)

.

The optimal weighted sum rate Rwei is obtained by varying the auxiliarydistortion constraints d′{1,...,i}, i = 2, . . . , L − 1, in (27) and searching for

the minimum value. For fixed auxiliary distortion constraints, (27) involvesa min-max optimization.

Multiple Description Quantization

After introducing the MD theoretical results above, we now focus on prac-tical MD schemes that are discussed in the literature. In particular, werestrict our attention to the multiple description quantization schemes.

Since the pioneering work by Vaishampayan [111] where a practicalmultiple-description scalar quantization (MDSQ) system was first proposed,many researches have focused on the design of efficient quantization-basedMD schemes. The design of MDSQ is essentially converted to constructgood index assignment matrices for two description case (see [106], [110]and [111]) or index assignment arrangements for multi-description case(see [8]). Fig. 9 shows the linear index assignment proposed in [111] asan example. The two-description MDSQ design problem is particularly wellunderstood [58], [112]. Suppose a symmetric multiple-description encodersends information over each channel at a rate of R bits per sample. Theperformance of the system is measured by a three-tuple (R, d1, dc) where d1


is the side distortion and dc is the central distortion. Suppose the sourceto be transmitted has a finite differential entropy h(X). The work of [112]showed that for an entropy-constrained multiple-description quantizer, thedistortions satisfy

d1dc = limR→∞

1

4

(

e2h(X)

12

)2

e−4R, (28)

which has the same structure as (22). To make a fair comparison with (22),we consider a scalar Gaussian source with variance σ2

X . In this case, wehave

d1dc = limR→∞

σ4X

4

(

2πe

12

)2

e−4R. (29)

The gap between (29) and (22) is a constant, which is (2πe/12)2. Theperformance gap is mainly due to the low dimensionality of quantizationspace.

1 3 6

2 5 8 11

4 7 10 12 14

9 13 15 17 19

16 18 20 22 25

21 24 27 30

23 26 29 32 33

28 31 34 36 38

35 37 39 41

40 42 43

Figure 9: Linear index assignment with five diagonals.

It is well known that vector quantization has a space filling advantageover scalar quantization [57]. This is because one has the freedom to con-struct cell shapes that are more ”spherical” than a hypercube in higher di-mensional space. Specifically, for the scenario of single-description entropy-coded at R bits per sample, when using an n−dimensional lattice Λ as acodebook the distortion dV (R) related with the distortion of dS(R) of scalarquantizer by

limR→∞

dV (R)

dS(R)=

G(Λ)

1/12,

24 Introduction

where G(Λ) is the normalized second moment of a Voronoi cell of the latticeΛ. It has been shown that good lattices exist that satisfy G(Λ) → 1

2πe asn → ∞ [24]. Lattices are commonly used in the design of MD quantizationsystems to gain quantization efficiency over MDSQ [110]. The MD quan-tizers that exploits lattice structure are referred to as multiple-descriptionlattice vector quantization (MDLVQ) systems. As in MDSQ, the maindesign task in MDLVQ is to construct a good index assignment. The per-formance of a MDLVQ system for the two-description case was analyzedin [110], culminating in the relation

d1dc = limR→∞

1

4G(Λ)G(Sn)e4h(X)e−4R, (30)

where G(Sn) is the normalized second moment of a sphere in n dimensions.As compared to (28), the distortion product exhibits a reduction due tothe use of lattice structure. As n goes to infinity, (30) approaches the lowerbound approximation (22) for a Gaussian source by exploiting good lattices.

The design of practical L-channel MDLVQ systems was addressed in[46, 84]. The index assignment was recognized to play a critical role in thesystem design. However no simple form of the distortion product like theexpression (30) for the two-description case has yet been developed.

4 Summary of Contributions

The focus of this thesis is on robust multimedia communications over packetswitched networks. The main contributions of the thesis can be summarizedas follows:

• it proposes a more efficient adaptive playout scheduling approach forVoIP application,

• it proposes a low-complexity and effective packet-loss concealment ap-proach for VoIP application and

• it provides insight in the performance limits for some scenarios ofmultiple description coding.

The thesis consists of four research papers. I formulated the approachesin the papers and did all of the fundamental mathematics: the co-authorsmainly provided advise and experimental data. Short summaries of thepapers are presented below.

Paper A: Adaptive Playout Scheduling for VoIP: Event TriggeredControl Policy

In paper A, we propose a new adaptive-playout scheduling approach for theapplication of Voice over IP. Differently from the literature, we use Wiener

4 Summary of Contributions 25

process to model the fluctuation of the buffer length in the absence of anyadjustment. The advantage of the Wiener process modelling is that it canmodel various packet-network delay statistics reliably, including the situa-tion when delay-spikes happen. The problem of adaptive-playout schedulingis then formulated as a stochastic impulse control problem. The adjustmentof the buffer length is taken to be the control signal in the new control prob-lem. Specifically, the control signal consists of length units that correspondto inserting or dropping a pitch cycle from the buffer. The control problempenalizes both the buffer length and the control signal (or the adjustment)in terms of a Lagrange cost function. A so-called band control policy isrigorously shown to be optimal for this control problem. The band controlpolicy maintains the buffer length within a band region by imposing impulsecontrol (inserted or dropped pitch cycles) whenever the bounds of the bandare reached. Experiments performed on both synthetic and real network-delay traces show that the proposed playout scheduling scheme outperformstwo recent algorithms in most cases.

Paper B: Autoregressive Model-Based Speech Packet-Loss Con-cealment

In paper B, we study packet-loss concealment for speech based on autore-gressive modelling using a rigorous minimum mean square error (MMSE)approach. Addressing the fact that most reported methods are designedin a heuristic manner, the main aim of the paper is to consider the PLCproblem rigorously. We expect the obtained result to provide a foundationfor understanding the functionality of the heuristic methods. In particular,we investigate the effect of the model estimation error on predicting themissing segment and derive an upper bound on the mean square error ofthe missing segment on a sample basis. Our experiments show that theupper bound is tight when the estimation error is less than the signal vari-ance. We also consider the usage of perceptual weighting on prediction toimprove the perceived speech quality. A rigorous argument is presented,showing that perceptual weighting is not useful in this context. Subjec-tive quality comparison tests show that the proposed MMSE-based systemprovides state-of-the-art performance.

Paper C: High-Rate Analysis of Symmetric L-Channel MultipleDescription Coding

In paper C, we study the tight rate-distortion bound for symmetric L-description coding of a vector Gaussian source with two levels of receivers.Each of the first-level receivers receives κ (κ < L) out of the L descriptions.The second-level receiver obtains all L descriptions. The rate-distortionbound takes a complex form that, in general, is inconvenient as a refer-

26 Introduction

ence to calculate the performance loss of practical MD schemes. The mainaim of this work is to derive a simple but good approximation of the rate-distortion bound, which can serve as a useful tool to evaluate the efficiencyof practical L-description schemes. We find that when the theory is appliedto the scalar Gaussian source, the product of a function of the side distor-tions (corresponding to the first-level receivers) and the central distortion(corresponding to the second-level receiver) is asymptotically independentof the redundancy among the descriptions. Using this property, we assessthe performance loss of a practical multiple-description lattice vector quan-tizer (MDLVQ) as an example. Another contribution of the work is thatwe provide a new geometric analysis of the performance of the consideredMDLVQ, which results in an expression for the side distortions using thenormalized second moment of a sphere of higher dimensionality than thequantization space.

Paper D: Bounding the Rate Region of Vector Gaussian MultipleDescriptions with Individual and Central Receivers

In paper D, we study the rate region of the vector Gaussian multiple descrip-tion problem with individual and central quadratic distortion constraints.Although the rate region for the considered scenario is known for a scalarGaussian source, the expression involves a min-max optimization problem,which is complicated. For a vector Gaussian source, the rate region is un-known. In this work, we derive an outer bound to the rate region of thevector Gaussian L-description problem. The bound is obtained by lowerbounding a weighted sum rate for each supporting hyperplane of the rateregion, of which the expression only involves a maximization problem. Theouter bound is tight when the data source is a scalar Gaussian source. Inthis case, the optimal weighted sum rate for each supporting hyperplaneis obtained by solving a single maximization problem. This contrasts withexisting results, which require solving a min-max optimization problem.

References

[1] S. Aign and K. Fazel. Temporal and spatial error concealment tech-niques for hierarchical MPEG-2 video codec. In Proceedings IEEEInternational Conference on Communications, volume 3, pages 1778–83, 1995.

[2] R. Aravind, M. Civanlar, and A. Reibman. Packet Loss Resilience ofMPEG-2 Scalable Video Coding Algorithms. IEEE Trans. Circuitsand Systems for Video Technology, 6(5):426–435.

[3] L. Atzori, M. L. Lobina, and M. Corona. Playout Buffering of Speech

References 27

Packets Based on a Quality Maximization Approach. IEEE Trans.Multimedia, 8(2):420–426, 2006.

[4] R. Balan, I. Daubechies, and V. Vaishampayan. The Analysis andDesign of Windowed Fourier Frame Based Multiple Description Sourcecoding Schemes. IEEE Trans. Inform. Theory, pages 46(7):2491–2536,2000.

[5] J. C. Batllo and V. A. Vaishampayan. Asymptotic Performance ofMultiple Description Transform Codes. IEEE Trans. Inform. Theory,pages 43(2):703–707, 1997.

[6] T. Berger. Rate Distortion Theory, a Mathematical Basis for DataCompression. Prentice-Hall, 1971.

[7] T. Berger and J. D. Gibson. Lossy Source Coding. IEEE Trans.Inform. Theory, pages 44(6):2693–2723, October 1998.

[8] T. Y. Berger-Wolf and E. M. Reingold. Index Assignment for Mul-tichannel Communication Under Failure. IEEE Trans. Inform. Th.,48:2656–2668, 2002.

[9] J. C. Bolot. Characterizing End-To-End Packet Delay and Loss in theInternet. Journal of High-Speed Networks, 2:305–323, 1993.

[10] J.-C. Bolot, S. Fosse-Parisis, and D. Towsley. Adaptive FEC-BasedError Control for Internet Telephony. In Proceedings of IEEE INFO-COM, pages 1453–1460, 1999.

[11] J. Cardinal. Entropy-Constrained Index Assignments for MultipleDescription Quantizers. IEEE Trans. Signal Proc., pages 52(1):264–270, 2004.

[12] J. Chen. Rate Region of Gaussian Multiple Description Coding WithIndividual and Central Distortion Constraints. IEEE Trans. Inform.Theory, 55:3991–4005, 2009.

[13] J. Chen, C. Tian, T. Berger, and S. S. Hemami. Multiple Descrip-tion Quantization via Gram-schmidt Orthogonalization. IEEE Trans.Inform. Theory, 52(12):5197–5217, 2006.

[14] P. A. Chou, S. Mehrotra, and A. Wang. Multiple Description Decod-ing of Overcomplete Expansions using Projections onto Convex Sets.In Proc. Data Compression Conf., 1999.

[15] W.-J. Chu and J.-J. Leou. Detection and Concealment of Transmis-sion Errors in H.261 Images. IEEE Trans. Circuits and Systems forVideo Technology, 8(1):74–84, 1998.

28 Introduction

[16] T. M. Cover and J. A. Thomas. Elements of Information Thoery.Wiley-Interscience, 2nd edition, 2006.

[17] I. Csiszar and J. Korner. Information Theory: Coding Theoresms fordiscrete Memoryless Systems. Academic Press, New York, 1981.

[18] S. N. Diggavi, N. J. A. Sloane, and V. A. Vaishampayan. Design ofAsymmetric Multiple Description Lattice Vector Quantizers. In Proc.Data Compression Conf., pages 490–499, 2000.

[19] S. N. Diggavi, N. J. A. Sloane, and V. A. Vaishampayan. AsymmetricMultiple Description Lattice Vector Quantizers. IEEE Trans. Inform.Theory, pages 48(1):174–191, 2002.

[20] G. Dan, V. Fodor, and G. Karlsson. Are Multiple Descriptions Betterthan one? In Proc. of IFIP/TC6 Networking 2005, pages 684–696,May 2005.

[21] G. Dan, V. Fodor, and G. Karlsson. Packet Size Distribution: anAside? In Proc. of QoSIP 2005, pages 75–87, Feb. 2005.

[22] G. Dan, V. Fodor, and G. Karlsson. A Rate-distortion Based Com-parison of Media-dependent FEC and MDC for Real-time Audio. InProc. of IEEE ICC 2006, June 2006.

[23] P. L. Dragotti, J. Kovacevic, and V. K. Goyal. Quantized OversampledFilter Banks with Erasures. In Proc. Data Compression Conf., 2001.

[24] U. Erez, S. Litsyn, and R. Zamir. Latttices Which are Good for(Almost) Everything. IEEE. Trans. Inform. Theory, 51:3401–3416,2005.

[25] J. Escobar, C. Partridge, and D. Deutsch. Flow Synchronization Pro-tocol. IEEE/ACM Trans. Networking, 2(2):111 – 121, 1994.

[26] A. Eskicioglu and P. Fisher. Image Quality Measures and Their Per-formance. IEEE Trans. Inform. Theory, 43:2959–2965, 1995.

[27] H. Feng and M. Effros. On the Rate Loss of Multiple DescriptionSource Codes. IEEE Trans. Inform. Theory, pages 51(2):671–683,2005.

[28] M. Fleming and M. Effros. Generalized Multiple Description VectorQuantization. In Proc. Data Compression Conf., 1999.

[29] M. Fleming, Q. Zhao, and M. Effros. Network Vector Quantization.IEEE Trans. Inform. Theory, pages 50(8):1584–1604, 2004.

References 29

[30] Y. Frank-Dayan and R. Zamir. Dithered Lattice-Based Quantizers forMultiple Descriptions. IEEE Trans. on Inform. Theory, 48(1):192–204, 2002.

[31] K. Fujimoto, S. Ata, and M. Murata. Statistical Analysis of Packet de-lays in the Internet and its Application to Playout Control for Stream-ing Applications. IEICE Trans. Communications, E84-B:1504–1512,June 2001.

[32] A. A. E. Gamal and T. M. Cover. Achievable Rates for MultipleDescription Quantizers. IEEE Trans. Inform. Th., IT-28:851–857,1982.

[33] J. F. Gibbon and T. D. C. Little. The use of network delay estimationfor multimedia data retrieval. IEEE Journal on Selected Areas inCommunications, 14(7).

[34] B. Girod, K. S. uller, M. Link, and U. Horn. Packet Loss ResilientInternet Video Streaming. In Proceedings of Visual Communicationsand Image Processing, volume 3653, pages 833–844, 1999.

[35] D. J. Goodman, G. Lockhart, O. Wasem, and W.-C. Wong. WaveformSubstitution Techniques for Recovering Missing Speech Segments inPacket Voice Communications. IEEE Trans. Acoustics, Speech, andSignal Processing, 34(6):1440–1448.

[36] V. Goyal, J. Kovacevic, and J. Kelner. Quantized Frame Expensionswith Erasures. Journal of Appl. and Comput. Harmonic Analysis,pages 10(3):203–233, May 2001.

[37] V. K. Goyal. Multiple Description Coding: Compression meets thenetwork. IEEE Signal Processing Magazine, 18:74–93, 2001.

[38] V. K. Goyal and J. Kovacevic. Generalized Multiple DescriptionsCoding with Correlating Transforms. IEEE Trans. Inform. Theory,47:2199–2224, 2001.

[39] V. K. Goyal, J. Kovacevic, and M. Vetterli. Multiple Description Tans-form Coding: Robustness to Erasures using Tight Frame Expansions.In Proc. IEEE Int. Symp. Information Theory, 1998.

[40] V. K. Goyal, J. Kovacevic, and M. Vetterli. Quantized Frame Expan-sions as Source Channel Codes for Erasure channels. In Proc. dataCompression Conf., 1999.

[41] V. K. Goyal, M. Vetterli, and N. T. Thao. Quantized OvercompleteExpansions in R: Analysis, Synthesis and Algorithms. IEEE Trans.Inform. Theory, pages 44(1):16–31, 1998.

30 Introduction

[42] R. M. Gray and D. L. Neuhoff. Quantization. IEEE Trans. Inf.Theory, 44(6):2325–2383, 1998.

[43] E. Gunduzhan and K. Momtahan. A Linear Prediction Based PacketLoss Concealment Algorithm for PCM Coded Speech. IEEE Trans.Speech Audio Process., 9:778–785, November 2001.

[44] P. Haskell and D. Messerschmitt. Resynchronization of motion com-pensated video affected by ATM cell loss. In Proceedings IEEE In-ternational Conference on Acoustics, Speech and Signal Processing,volume 3, pages 545–8, 1992.

[45] S. S. Hemami and T. H.-Y. Meng. Transform Coded Image Recon-struction Exploiting Interblock Correlation. IEEE Trans. Image Pro-cessing, 4(7):1023–1027, 1995.

[46] X. Huang and X. Wu. Optimal Design of Multiple Description LatticeVector Quantizers. submitted to IEEE Trans. Inform. Theory, 2006.

[47] D. A. Huffman. A Method for the Construction of Mimimum Redun-dancy Codes. In In Proc. IRE, pages 40:1098–1101, 1952.

[48] E. T. S. Institute. Specification and measurement of speech trans-mission quality; part 1: Introduction to objective comparison mea-surement methods for one-way speech quality across networks. ETSIGuide, EG 201 377-1 V 1.1.1, April 1999.

[49] ITU-T Rec. G.107. The E-Model, a Computational Model for Use inTransmission Planning, 2000.

[50] J. Wolf and A. Wyner and J. Ziv. Source Codingfor Multiple De-scriptions. Bell system Technical Journal, pages 59(8): 1417–1426,1980.

[51] H. Jafarkhani and V. Tarokh. Multiple Description Trellis-CodedQuantization. IEEE Trans. Communications, pages 47(6):799–803,1999.

[52] M. Kalman, E. Steinbach, and B. Girod. Adaptive Playout for Real-Time Media Streaming. In IEEE International Symposium on Circuitsand Systems, volume 1, pages I–45–8, 2002.

[53] M. Kalman, E. Steinbach, and B. Girod. Rate-Distortion OptimizedVideo Streaming with Adaptive Playout. In IEEE ICIP, volume 3,pages 189–192, 2002.

References 31

[54] M. Kalman, E. Steinbach, and B. Girod. Adaptive Media Playoutfor Low Delay Video Streaming over Error-Prone Channels. IEEETransactions on Circuits and Systems for Video Technology, 14:841–851, 2004.

[55] J. C. Kieffer. A Survey o the Theory of Source Coding with a FidelityCriterion. IEEE Trans. Inf. Theory, pages 39(5):1473–1490, 1993.

[56] M. Y. Kim and W. B. Kleijn. Rate-distortion comparisons betweenFEC and MDC based on Gilbert channel model. In Proc. IEEE Int.Conf. Networks, pages 495–500, 2003.

[57] B. Kleijn. A Basis for Source Coding. 2007.

[58] J. Klejsa, M. Kuropatwinski, and W. B. Kleijn. Adaptive Resolution-Constrained Scalar Multiple-Description Coding. In Proc. ICASSP-2008, 2008.

[59] K. Kondo and K. Nakagawa. A Speech Packet Loss ConcealmentMethod Using Linear Prediction. IEICE Trans. Inf. Syst., E89-D(2):806–813, February 2006.

[60] J. Kovacevic, P. L. Dragotti, and V. K. Goyal. Filter Bank Frame Ex-pansions with Erasures. IEEE Trans. on Inform. Theory, 48(6):1439–1450, 2002.

[61] W.-M. Lam and A. R. Reibman. An Error Concealment Algorithmfor Images subject to Channel Errors. IEEE Trans. Image Processing,4(5):533–542, 1995.

[62] W. M. Lam, A. R. Reibman, and B. Liu. Recovery of lost or erro-neously received motion vectors. In Proceedings IEEE InternationalConference on Acoustics, Speech and Signal Processing, volume 5,pages 417–20, 1993.

[63] L. Lamont, L. Li, R. Brimont, and N. D. Georganas. Synchronizationof multimedia data for a multimedia news-on-demand application.IEEE Journal on Selected Areas in Communications, 14(1), 1996.

[64] G. Langdon and J. Rissanen. A Simple General Binary Source Code(corresp.). IEEE Trans. Inform. Theory, pages 28(5):800–803, 1982.

[65] G. Langdon and J. Rissanen. Correction to ’a Simple General BinarySource Code’ (sep 82 800-803). IEEE Trans. Inform. Theory, pages29(5):778–779, 1983.

[66] G. G. Langdon. An Introduction to Arithmetic Coding. IBM Journalof Research and development, pages 28:135–149, 1984.

32 Introduction

[67] N. Laoutaris, B. V. Houdt, and I. Stavrakakis. Optimization of aPacket Video Receiver Under Different Levels of Delay Jitter: AnAnalytical Approach. Performance Evaluation Journal, 55(3-4):251–275, 2004.

[68] N. Laoutaris and I. Stavrakakis. An Analytical Design of OptimalPlayout Schedulers for Packet Video Receivers. Computer Communi-cations, 26(4):294–303, 2003.

[69] S. Lee and P. Lee. Cell Loss and Error Recovery in Variable RateVideo. Journal of Visual Communication and Image Representation,4:39–45, 1993.

[70] H. Li, G. Zhang, and W. B. Kleijn. Playout Scheduling for VoIP Usingk-Erlang Distribution. In the European Signal Processing Conference,2010.

[71] Y. Li, A. Markopoulou, J. Apostolopoulos, and N. Bambos. Content-Aware Playout and Packet Scheduling for Video Streaming over Wire-less Links. IEEE Trans. Multimedia, August 2008.

[72] Z. Li, S. Zhao, X. Xie, and J. Kuang. An Improved Speech PlayoutBuffering Algorithm Based on a New Version of E-Model in VoIP. InProc. ChinaCom, 2008.

[73] Y. J. Liang, N. Farber, and B. Girod. Adaptive Playout Schedulingand Loss Concealment for Voice Communications over IP Networks.IEEE Trans. Multimedia, 5(4):532–543, 2003.

[74] C. Liu, Y. Xie, M. J. Lee, and T. N. Saadawi. Multipoint multimediateleconference system with adaptive synchronization. IEEE Journalon Selected Areas in Communications, 14(7), 1996.

[75] E. Liu, G. Shen, and et al. Self-adaptive jitter buffer adjustmentmethod for packet-switched network. US patent application publica-tion, US 2005/0058146A1, Mar. 2005.

[76] F. Liu, J. Kim, and C.-C. J. Kuo. Adaptive Delay Concealment ForInternet Voice Applications With Packet-Based Time-Scale Modifica-tion. In Proc. IEEE ICASSP, pages 1461–1464, 2001.

[77] T. Liu and P. Viswanath. An Extremal Inequality Motivated by Mul-titerminal Information-Theoretic Problems. IEEE Trans. on Inform.Theory, 53(5):1839–1851, 2007.

[78] S. B. Moon, J. Kurose, and D. Towsley. Packet Audio Playout DelayAdjustment: Performance Bounds and Algorithms. ACM/SpringerMultimedia Systems, 6:17–28, January 1998.

References 33

[79] M. Narbutt, A. Kelly, L. Murphy, and P. Perry. Adaptive VoIP Play-out Scheduling: Assessing User Satisfaction. IEEE Internet Comput-ing, 9:28–34, 2005.

[80] M. Narbutt and L. Murphy. VoIP Playout Buffer Adjustment UsingAdaptive Estimation of Network Delays. In in Proc. of InternationalTeletraffic Congress ITC-18, pages 1171–1180, 2003.

[81] A. Narula and J. Lim. Error concealment techniques for an all-digitalhigh-definition television system. In Proceedings of the SPIE, volume2094, pages 304–15, 1993.

[82] M. T. Orchard, Y. Wang, V. Vaishampayan, and A. R. Reibman.Redudancy Rate-Distortion Analysis of Multiple Description CodingUsing Pairwise Correlating Transforms. In Proc. IEEE Conf. on Im-age Proc., volume 1, pages 608–611, 1997.

[83] J. Østergaard. Multiple-Description Lattice Vector Quantization. PhDthesis, Delft University of Technology, 2007.

[84] J. Østergaard, J. Jensen, and R. Heusdens. n-Channel Entropy-Constrained Multiple-Description Lattice Vector Quantization. IEEETrans. Inform. Th., 52:1956–1973, 2006.

[85] L. Ozarow. On a Source-Coding Problem with Two Channels andThree Receivers. Bell system Technical Journal, 59:1909–1921, 1980.

[86] D. P. Palomar and S. Verdu. Gradient of Mutual Information in LinearVector Gaussian Channels. IEEE Trans. Inform. Theory, 52(1):141–154, 2006.

[87] A. Perkis, P. Svensson, O. I. Hillestad, S. Johnsen, J. Z. nd A. Sæbø,and O.Jetlund. Multimedia over IP Networks. Telektronikk, pages43–53, 2006.

[88] J. Pinto and K. Christensen. An algorithm for playout of packetvoice based on adaptive adjustment of talkspurt silence periods. InProceedings 24th Conference on Local Computer Networks, pages 224–231, 1999.

[89] S. S. Pradhan, R. Puri, and K. Ramchandran. n-Channel Symmet-ric Multiple Descriptions-part I: (n,k) Source-Channel Erasure Codes.IEEE Trans. Inform. Th., 50:47–61, 2004.

[90] R. Puri, S. S. Pradhan, and K. Ramchandran. n-Channel SymmetricMultiple Descriptions- part II: An Achievable Rate-Distortion Region.IEEE Trans. Inf. Theory, pages 51:(4):1377–1392, 2005.

34 Introduction

[91] M. S. Rahman and A. B. Wagner. Rate Region of the Gaussian Scalar-Help-Vector Source-Coding Problem. submitted to IEEE Trans. In-form. Theory, 2010.

[92] R. Ramjee, J. Kurose, D. Towsley, and H. Schulzrinne. AdaptivePlayout Mechanisms for Packetized Audio Applications in Wide-AreaNetworks. In Proc. of the IEEE Infocom, pages 680–688, 1994.

[93] J. Rissanen. Generalized Kraft Inequality and Arithmetic Coding.IBM Journal of Research and Development, page 20:198, 1976.

[94] C. A. Rødbro, M. N. Murthi, S. V. Andersen, and S. H. Jensen. Hid-den Markov Model-Based Packet Loss Concealment for Voice over IP.IEEE Trans. Audio Speech Language Process., 14:1096–1623, Septem-ber 2006.

[95] D. Sanghi, A. K. Agrawala, . Gudmundsson, and O. Gudmundsson.Experimental Assessment of End-to-end Behavior on Internet. InProc. IEEE INFOCOM, pages 867–874, 1993.

[96] H. Sanneck and N. T. L. Le. Speech Property-Based FEC for InternetTelephony Applications. In Proceedings of SPIE - The InternationalSociety for Optical Engineering, volume 3969, pages 38–51, 2000.

[97] H. Sanneck, A. Stenger, K. B. Younes, and B. Girod. A new techniquefor audio packet loss concealment. IEEE GLOBECOM, pages 48–52,1996.

[98] C. E. Shannon. A Mathematical Theory of Communication. Bell sys-tem Technical Journal, pages 27:379–423; 623–656, July and October1948.

[99] C. E. Shannon. Coding Theorems for a Discrete Source with a FidelityCriterion. IRE Conv., 7:142–163, 1959.

[100] C. J. Sreenan, J.-C. Chen, P. Agrawal, and B. Narendran. Delay Re-duction Techniques for Playout Buffering. IEEE Trans. Multimedia,2:88–100, 2000.

[101] E. Steinbach, N. Farber, and B. Girod. Adaptive Playout for Low-Latency Video Streaming. In IEEE International Conference on Im-age Processsing, 2001.

[102] A. Stenger, K. Younes, R. Reng, and B. Girod. A New Error Conceal-ment Technique for Audio Transmission with Packet Loss. In Proc.European Signal Processing Conference, volume 3, pages 1965–1968,1996.

References 35

[103] L. Sun and E. C. Ifeachor. Voice Quality Prediction Models and TheirApplication in VoIP Networks. IEEE Trans. Multimedia, 8:809–820,2006.

[104] L. Sun, G. Wade, B. Lines, and E. Ifeachor. Impact of packet losslocation on perceived speech quality. In Internet Telephony Workshop,2001.

[105] W. Tan and A. Zakhor. Video Multicast Using Layered FEC andScalable Compression. IEEE Transactions on Circuits and Systemsfor Video Technology, 11(3):373–87, 2001.

[106] C. Tian and S. Hemami. Universal Multiple Description Scalar Quan-tization: Analysis and Design. IEEE Trans. Inform. Theory, 50:2089–2102, 2004.

[107] C. Tian and S. S. Hemami. Optimality and Suboptimality of Multiple-Description Vector Quantization with a Lattice Codebook. IEEETrans. Inf. Theory, 50:2458–2468, 2004.

[108] C. Tian and S. S. Hemami. Sequential Design of Multiple DescriptionScalar Quantizers. In Data Compression Conference, 2004.

[109] C. Tian, S. Mohajer, and S. N. Diggavi. Approximating the Gaus-sian Multiple Description Rate Region Under Symmetric DistortionConstraints. submitted to IEEE Trans. Inf. Theory, 2008.

[110] V. Vaishampayan, N. Sloane, and S. Servetto. Multiple DescriptionVector Quantization with Lattice Codebooks: Design and Analysis.IEEE Trans. Inform. Theory, 47:1718–1734, 2001.

[111] V. A. Vaishampayan. Design of Multiple Description Scalar Quantiz-ers. IEEE Trans. Inform. Th., 39:821–834, 1993.

[112] V. A. Vaishampayan and J.-C. Batllo. Asymptotic Analysis of Multi-ple Description Quantizers. IEEE Trans. Inform. Theory, 44(1):278–284, 1998.

[113] V. A. Vaishampayan and J.-C. Batllo. On Reducing Granular Distor-tion in Multiple Description Quantization. In Proc. IEEE Int. Symp.Information Thoery, 1998.

[114] V. A. Vaishampayan and J. Domaszewicz. Design of Multiple Descrip-tion Scalar Quantizers. IEEE Trans. on Inf. Theory, 39(3):821–834,1993.

[115] V. A. Vaishampayan and J. Domaszewicz. Design of Entropy-Constrained Multiple-Description Scalar Quantizers. IEEE Trans.Inform. Theory, 40:245–250, 1994.

36 Introduction

[116] R. Venkataramani, G. Kramer, and V. K. Goyal. Bounds on theAchievable Region for Certain Multiple Description Coding Problems.In Proc. IEEE Int. Symp. Information Theory, page 148, June 2001.

[117] R. Venkataramani, G. Kramer, and V. K. Goyal. Multiple Descrip-tion Coding with Many Channels. IEEE Trans. Inform. Theory,49(9):2106–2114, 2003.

[118] H. Wang and P. Viswanath. Vector Gaussian Multiple DescriptionWith Individual and Central Receivers. IEEE Trans. Inform. Theory,53:2133–2153, 2007.

[119] H. Wang and P. Viswanath. Vector Gaussian Multiple Descriptionwith Two Levels of Receivers. IEEE Trans. Inform. Theory, 55:401–410, 2009.

[120] J.-F. Wang, J.-C. Wang, J.-F. Yang, and J.-J. Wang. A Voicing-DrivenPacket Loss Recovery Algorithm for Analysis-by-Synthesis PredictiveSpeech Coders over Internet. IEEE Trans. Multimedia, 3:98–107,March 2001.

[121] X. Wang and M. T. Orchard. Multiple Description Coding usingTrellis Coded Quantization. In IEEE Int. Conf. Image Processing,2000.

[122] Y. Wang, M. Orchard, V. Vaishampayan, and A. Reibman. MultipleDescription Coding Using Pairwise Correlation Transforms. IEEETrans. Image Processing, 10(3), 2001.

[123] Y. Wang, M. T. Orchard, and A. R. Reibman. Multiple DescriptionImage Coding for Noisy Channels by Pairing Transform Coefficients.In Proc. IEEE Workshop Multimedia Signal Processing, pages 419–424, 1997.

[124] Y. Wang, A. Reibman, M. Orchard, and H. Jafarkhani. An Improve-ment to Multiple Description Transform Coding. IEEE Trans. SignalProcessing, 50(11), 2002.

[125] Y. Wang and Q.-F. Zhu. Error control and concealment for videocommunication: a review. In Proceedings of the IEEE,, volume 86,pages 974–997, 1998.

[126] Y. Wang, Q.-F. Zhu, and L. Shaw. Maximally smooth image recov-ery in transform coding. IEEE Trans. Communications, 41(10):1544–1551, 1993.

References 37

[127] O. J. Wasem, D. J. Goodman, C. A. Dvorak, and H. G. Page. TheEffect of Waveform Substitution on the Quality of PCM Packet Com-munications. IEEE Trans. Acoustics, Speech, and Signal Processing,36(3):342–348, 1988.

[128] H. Witsenhausen. On Source Networks with Minimal BreakdownDegradation. Bell Systems Technical Journal, pages 59(6): 1083–1087, 1980.

[129] I. Witten, R. Neal, and J. Cleary. Arithmetic Coding for Data Com-pression. Communications of the ACM, pages 30(6):520–540, 1987.

[130] P. Yahampath. A multiple-description trellis quantizer for sourceswith memory. In IEEE Global Telecommunications Conference, De-cember 2006.

[131] L. Yamamoto and J. G. Beerends. Impact of network performanceparameters on the end-to-end perceived speech quality. In Proceedingsof Expert ATMTraffic Symposium, 1997.

[132] Z. Yang, X. Chen, J. Fu, and B. Heng1. The Audio and Video Syn-chronization Method Used in AVS System. Advances in MultimediaInformation Processing, 4810:71–77, 2007.

[133] R. Zamir. Gaussian Codes and Shannon Bounds for Multiple Descrip-tions. IEEE Trans. Inform. Theory, 45:2629–2636, 1999.

[134] R. Zamir. Shannon Type Bounds for Multiple Descriptions of a Sta-tionary Source. Journal of Combinatorics, Information and SystemSciences, pages 1–15, 2000.

[135] L. Zhang, L. Zheng, and K. S. Ngee. Effect of Delay and Delay Jit-ter on Voice/Video over IP. Computer Communications, 25:863–873,2002.

[136] Z. Zhang and T. Berger. New Results in Binary Multiple Descriptions.IEEE. Trans. Inf. Theory, IT-33(4):502–521, 1987.

[137] Z. Zhang and T. Berger. Multiple Description Source Coding with NoExcess Marginal Rate. IEEE Trans. Inform. Theory, pages 41(2):349–357, 1995.

[138] G. Zhou and Z. Zhang. Multiple Description Trellis-Coded VectorQuantization. In Proc. IEEE Globecom, pages 1414–1420, 2001.

张国强 - Divakth.diva-portal.org/smash/get/diva2:345510/FULLTEXT01.pdf · experimentally that...

Documents

Transcript of 张国强 - Divakth.diva-portal.org/smash/get/diva2:345510/FULLTEXT01.pdf · experimentally that...