IEEE 2013 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) - San Jose, CA,...

A PERCEPTUAL RATE-DISTORTION OPTIMIZATION APPROACH BASED ON PIECEWISE LINEAR APPROXIMATION FOR VIDEO CODING

Zheng Yuan∗, Dongqing Zhang†, Dapeng Wu∗ and Heather Yu†

∗ Dept. of ECE, University of Florida † Huawei Technologies Co. Ltd. USA

ABSTRACT

The core question in perceptual rate-distortion optimization (RDO) is how to find the Lagrange multiplier λ, which essentially requires constructing a rate-distortion (RD) model based on perceptual quality. A compelling RD model should reflect the best achievable rate-distortion trade-off and capture the RD behavior with high accuracy. To this end, we rescale the λ associated with MSE-RDO to match the dynamic range of the perceptual distortion, and then populate a family of RD samples by running perceptual RDO with Lagrange multipliers obtained by offsetting the rescaled λ. Since the envelope curve that encloses all the RD samples depicts the best achievable RD bound, we propose a piecewise linear approximation to represent it, because there is no guarantee that the RD model can be accurately fitted by a single function such as $D = ae^{bR}$ or $D = aR^b$. Each segment of the piecewise line is obtained with modest computational complexity: the RD samples sharing the same QP are fitted with a circle, and the common tangent line of two adjacent circles is found. Experiments show that the proposed approach reduces bitrate by 2-5% at the same perceptual distortion compared with the conventional approach.

Index Terms: Perceptual RDO, RD modeling, Lagrange multiplier, Video Coding

1. INTRODUCTION

The problem of finding the best trade-off between rate and distortion is known as rate-distortion optimization (RDO). The core tasks in RDO are finding the best encoding mode and choosing the quantization parameter (QP) for each block. In the current encoder configuration, the QP is given before the best mode is searched [1]. Based on the chosen mode, the QP can then be re-evaluated and possibly adjusted for a second encoding pass. Therefore, RDO can be simplified into the following mathematical form:

$$\widehat{\text{mode}} = \arg\min_{\text{mode}} D(\text{mode} \mid QP) \quad \text{s.t.} \quad R(\text{mode} \mid QP) \le R_c \tag{1}$$

where D(mode|QP) and R(mode|QP) are, respectively, the distortion of a block due to compression and the bit rate consumed for encoding it, and R_c is the total bitrate budget for this block. This constrained optimization problem is reformulated as an unconstrained one using the Lagrange method [2]:

$$\min_{\text{mode}} J = D(\text{mode} \mid QP) + \lambda R(\text{mode} \mid QP) \tag{2}$$

where $\nabla D(\text{mode} \mid QP) = -\lambda \nabla R(\text{mode} \mid QP)$ and $\lambda > 0$, so that

$$\lambda = -\frac{\partial D(QP)}{\partial R(QP)} \tag{3}$$
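Eq. 2 can be read as a simple selection rule. The sketch below assumes each candidate mode has already been encoded, so its distortion D and rate R are known, and keeps the mode with the smallest Lagrangian cost J for a given λ. The helper name `choose_mode`, the mode names, and the numbers are hypothetical placeholders, not values from this paper.

```python
from typing import Sequence, Tuple

# (mode name, distortion D, rate R in bits) for one block at a fixed QP
Candidate = Tuple[str, float, float]


def choose_mode(candidates: Sequence[Candidate], lam: float) -> str:
    """Return the mode minimizing the Lagrangian cost J = D + lam * R (Eq. 2)."""
    if lam <= 0:
        raise ValueError("the Lagrange multiplier must be positive")
    if not candidates:
        raise ValueError("no candidate modes")
    best = min(candidates, key=lambda cand: cand[1] + lam * cand[2])
    return best[0]


# Hypothetical measurements for one block.
modes = [("SKIP", 120.0, 2.0), ("INTER16x16", 40.0, 30.0), ("INTRA4x4", 25.0, 80.0)]
for lam in (0.1, 1.0, 10.0):
    print(lam, choose_mode(modes, lam))
```

A small λ favors low distortion (INTRA4x4 in this toy data) and a large λ favors low rate (SKIP), which is why the choice of λ governs the RD trade-off.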

As shown in Eq. 2, the core question in the optimization is choosing a suitable λ, which requires knowing the analytical form of D in terms of R. Across encoder generations, the mean squared error (MSE) has conventionally been used as the distortion measure because of its simplicity. Based on R-D models with D measured as MSE, the negative derivative of D with respect to R is assigned to λ, as in Eq. 3. Specifically, the H.264/AVC coding configuration suggests $\lambda = 0.85 \times 2^{(QP-12)/3}$.
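The formula above can be tabulated directly. This is a minimal sketch of the plain form quoted in the text; the function name `h264_lambda` is hypothetical, and any slice-type-specific adjustments in the reference software are not modeled here.

```python
def h264_lambda(qp: int) -> float:
    """MSE-RDO Lagrange multiplier as quoted in the text: 0.85 * 2^((QP - 12) / 3)."""
    return 0.85 * 2.0 ** ((qp - 12) / 3.0)


# Tabulate lambda over the QP range used in the experiments (20 to 36).
for qp in range(20, 37, 4):
    print(f"QP={qp:2d}  lambda={h264_lambda(qp):8.2f}")
```

Because the exponent is (QP - 12)/3, λ doubles every time QP increases by 3.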

Meanwhile, it has been reported that MSE does not necessarily correlate well with human visual characteristics (HVC). Thus, many perceptual quality based distortion metrics, such as SSIM [3], have been proposed, each of which claims to capture certain HVC and incorporate them into the metric. Therefore, if an encoder employs a perceptual quality metric instead of MSE in its RDO framework, a better RD trade-off is expected. The intuition is that a perceptual quality metric distinguishes the distortion aspects to which the human visual system is most sensitive, so RDO can allocate the bit budget more wisely to accommodate them, whereas a signal-level metric such as MSE does not consider these aspects. In this paper, we use SSIM in our RDO framework:

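$$\mathrm{SSIM}(x, y) = l(x, y) \times c(x, y) \times s(x, y) \tag{4}$$

where l(x, y), c(x, y), and s(x, y) are the luminance, contrast, and structure comparison terms between blocks x and y [3].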
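For reference, the three factors of Eq. 4 can be computed as below, using the standard constants C1 = (0.01L)², C2 = (0.03L)², and C3 = C2/2 from [3], with L the pixel dynamic range. This is a simplified sketch: it uses whole-block statistics rather than the Gaussian-windowed local statistics of [3], and the helper `ssim_distortion` assumes D_SSIM = 1 - SSIM, a common choice that the text does not state explicitly. Both function names are hypothetical.

```python
from math import fsum, sqrt
from typing import Sequence


def ssim_block(x: Sequence[float], y: Sequence[float], data_range: float = 255.0,
               k1: float = 0.01, k2: float = 0.03) -> float:
    """SSIM of two equally sized blocks as the product l * c * s (Eq. 4).

    Simplification: statistics are computed over the whole block instead of
    with the Gaussian sliding window of [3].
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("blocks must be non-empty and of equal size")
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    c3 = c2 / 2.0
    mu_x, mu_y = fsum(x) / n, fsum(y) / n
    var_x = fsum((a - mu_x) ** 2 for a in x) / n
    var_y = fsum((b - mu_y) ** 2 for b in y) / n
    cov_xy = fsum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    sd_x, sd_y = sqrt(var_x), sqrt(var_y)
    lum = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)   # l(x, y)
    con = (2 * sd_x * sd_y + c2) / (var_x + var_y + c2)          # c(x, y)
    struct = (cov_xy + c3) / (sd_x * sd_y + c3)                  # s(x, y)
    return lum * con * struct


def ssim_distortion(x: Sequence[float], y: Sequence[float]) -> float:
    """Assumed perceptual distortion D_SSIM = 1 - SSIM."""
    return 1.0 - ssim_block(x, y)
```

Identical blocks give SSIM = 1 (zero distortion); a pure brightness shift lowers only the luminance term.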

When a perceptual quality distortion metric is used in RDO, it is intuitive to replace the MSE-based D with the perceptual quality metric of Eq. 4. However, the challenge is that the Lagrange multiplier λ must change accordingly. Two types of methods have been proposed in the literature to predict the perceptual RDO λ. The first is the experiment-based approach [4][5]. The authors of [4] first perform many encoding runs with MSE-based and perceptual quality based RDO, respectively. Based on the visualization of the resulting RD samples, they assume that the RD curve of perceptual quality based RDO is parallel to that of MSE-based RDO, and adopt the λ predicted by MSE RDO in perceptual quality based RDO. However, this assumption does not hold in general, as shown in Section 2.1. The other is the information-theory based approach [6][7]. Its authors express the perceptual quality (SSIM) in terms of QP using the physics of the perceptual quality metric, together with the rate in terms of QP, and eventually derive the relation between D (in SSIM) and rate. However, because many perceptual quality metrics are non-parametric [8], it is unrealistic to relate them to QP analytically.

This paper proposes a framework for perceptual quality metric based RDO in video encoding. The proposed method, a type of experiment-based approach, produces an RD model associated with a perceptual quality distortion metric. The RD model is based on the best achievable RD trade-off and thus can give the best RD performance. We first collect RD samples by running RDO with a set of Lagrange multipliers λ, enumerated so that the best value (yet to be found) is included. Then, based on the visualization of the RD samples on the RD plane, the envelope curve that encloses all the RD samples is recognized as the best achievable RD curve for the perceptual quality metric. Finally, the envelope, i.e., the desired RD model, is fitted using piecewise lines. The framework consists of the RD sampling, local RD curve fitting, and piecewise envelope generation modules, as shown in Fig. 1.

Fig. 1. Block diagram of the perceptual RDO system

In the RDO framework, a video frame is first input into the perceptual RD modeling unit for training. Once the RD model is learned, the best λ is derived as the negative derivative of D with respect to R. Then the RDO processes for the following frames are initiated. Since video characteristics tend to be highly correlated over a period of time, the following frames can use the learned RD model to choose their own λ without additional computational load.
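Once the envelope is available as a piecewise linear function, deriving λ reduces to reading off the negative slope of the segment that contains the operating rate. The sketch below assumes the model is stored as a list of (R, D) breakpoints and that a target rate for the next frame is known; the storage format and the helper name `lambda_at_rate` are illustrative assumptions, since the paper does not specify how the operating point is selected.

```python
from bisect import bisect_right
from typing import Sequence, Tuple


def lambda_at_rate(breakpoints: Sequence[Tuple[float, float]], rate: float) -> float:
    """Return lambda = -dD/dR of the piecewise linear RD model at `rate` (Eq. 3).

    breakpoints: (R, D) vertices of the piecewise linear envelope. Rates outside
    the modeled range use the slope of the nearest end segment.
    """
    pts = sorted(breakpoints)
    if len(pts) < 2:
        raise ValueError("need at least two breakpoints")
    rates = [r for r, _ in pts]
    i = bisect_right(rates, rate) - 1
    i = min(max(i, 0), len(pts) - 2)  # clamp to a valid segment index
    (r0, d0), (r1, d1) = pts[i], pts[i + 1]
    if r1 == r0:
        raise ValueError("breakpoints must have distinct rates")
    return -(d1 - d0) / (r1 - r0)


# Hypothetical learned model: (rate, SSIM distortion) breakpoints.
model = [(100.0, 0.10), (200.0, 0.06), (400.0, 0.04)]
print(lambda_at_rate(model, 150.0), lambda_at_rate(model, 300.0))
```

Because the envelope is convex and decreasing, λ shrinks as the target rate grows, mirroring the behavior of the MSE-based λ.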

For the perceptual metric SSIM, this paper gives an example RD modeling solution. The solution can be extended to other non-parametric blockwise metrics as well, since it does not require knowledge of how the metric is calculated. As shown in Eq. 2, running the RDO process for an encoding unit depends on a given QP. Therefore, RD sample points for the same QP but different λ_SSIM represent the local RD behavior (a local RD curve) under a particular QP. In the local RD curve fitting module, each local RD curve is fitted with a quadratic curve (a circle for SSIM) by least-squares regression. The family of local RD curves over different QPs spans an envelope that closely encloses them, which is the RD bound of the perceptual quality metric. For every two adjacent local RD curves, a common tangent line is derived in the envelope generation module to capture the gradient of the envelope at that location. Together, the tangent line segments form a piecewise approximation of the envelope.

2. PERCEPTUAL RDO FRAMEWORK BY PIECEWISE LINEAR APPROXIMATION

2.1. RD Sampling

Notice that the conventional MSE-based RDO is able to produce visually pleasant compression (blue line in Fig. 2). Thus its RDO framework can be used as a starting point for finding the best λ_SSIM for perceptual quality based RDO. We propose a λ rescaling method that builds on MSE-based RDO: for SSIM-based RDO, we rescale the λ of MSE RDO to a value that matches the dynamic range of D_SSIM. Eq. 5 shows the two RDO frameworks. After encoding blocks with the existing MSE-RDO, we collect the average MSE distortion and the average SSIM distortion per block. Their ratio is used to scale the SSIM-RDO Lagrange multiplier.

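$$\begin{aligned} &\min_{\text{mode}}\; D_{MSE}(\text{mode} \mid QP) + \lambda_{MSE} R(\text{mode} \mid QP) \\ &\min_{\text{mode}}\; D_{SSIM}(\text{mode} \mid QP) + \lambda_{SSIM} R(\text{mode} \mid QP) \end{aligned} \tag{5}$$

where $\lambda_{SSIM} = \dfrac{\bar{D}_{SSIM}}{\bar{D}_{MSE}} \times \lambda_{MSE}$.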

In this configuration, each mode yields a similar RD trade-off under the two RDO frameworks, so similar modes tend to be chosen in both. Thus the λ rescaling method has RD performance comparable to MSE-based RDO and also produces visually pleasant compression (purple line in Fig. 2). This suggests that the best λ_SSIM for SSIM-based RDO lies in the neighborhood of the rescaled λ_MSE.
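A minimal sketch of the rescaling in Eq. 5, assuming per-block D_SSIM and D_MSE values have already been collected from an MSE-RDO pass. The function name `rescale_lambda` and the sample statistics below are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def rescale_lambda(lambda_mse: float, d_ssim: Sequence[float], d_mse: Sequence[float]) -> float:
    """lambda_SSIM = (mean D_SSIM / mean D_MSE) * lambda_MSE (Eq. 5).

    d_ssim, d_mse: per-block distortions measured after an MSE-RDO encoding pass.
    """
    mean_mse = fmean(d_mse)
    if mean_mse <= 0:
        raise ValueError("mean MSE distortion must be positive")
    return fmean(d_ssim) / mean_mse * lambda_mse


# Hypothetical statistics from two blocks encoded with lambda_MSE = 13.6 (QP 24).
print(rescale_lambda(13.6, d_ssim=[0.02, 0.04], d_mse=[30.0, 50.0]))
```

Only the ratio of the two averages matters, so the rescaled λ inherits the QP dependence of λ_MSE while matching the much smaller numerical range of the SSIM distortion.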

To include the best λ_SSIM, we vary λ_SSIM in Eq. 5 with offsets ranging from -30% to +200% and perform perceptual quality based RDO. For each QP, multiple RD sample points are generated, each corresponding to one offset λ_SSIM (black sample points in Fig. 2). Together they form a local RD curve that describes the RD behavior at a given QP. Over different QPs, a family of local RD curves is generated that reflects the global RD behavior. Its lower-left envelope (red line in Fig. 2) describes the best achievable RD behavior, since every RD point in the interior region is worse than at least two RD points on the boundary: its horizontal projection (the same distortion at a lower rate) and its vertical projection (the same rate at a lower distortion). Therefore, the envelope corresponds to the desired RD model. As shown in the figure, the best RD curve of perceptual quality metric based RDO is not necessarily parallel to that of MSE-based RDO, contrary to the assumption in [4].
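The sampling step and the dominance argument can be sketched as follows. The offset grid is an assumption (the text gives only the -30% to +200% range, not the step), and the hypothetical helper `non_dominated` keeps the samples that no other sample beats in both rate and distortion, which is the property used above to argue that the lower-left boundary is the best achievable RD behavior.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]  # (rate R, distortion D)


def offset_lambdas(lambda_ssim: float, offsets: Iterable[float]) -> List[float]:
    """Lagrange multipliers lambda_SSIM * (1 + offset), e.g. offsets in [-0.3, 2.0]."""
    return [lambda_ssim * (1.0 + o) for o in offsets]


def non_dominated(samples: Sequence[Point]) -> List[Point]:
    """Keep the RD samples that no other sample beats in both rate and distortion."""
    front: List[Point] = []
    best_d = float("inf")
    for r, d in sorted(set(samples)):  # ascending rate, then ascending distortion
        if d < best_d:  # strictly lower distortion than every cheaper sample
            front.append((r, d))
            best_d = d
    return front


# Hypothetical offset grid and RD samples pooled over several QPs.
print(offset_lambdas(0.01, [-0.3, 0.0, 0.5, 1.0, 2.0]))
samples = [(100, 0.10), (110, 0.12), (120, 0.09), (150, 0.09), (200, 0.05), (210, 0.06)]
print(non_dominated(samples))
```

The non-dominated set may still contain points lying above the convex lower-left envelope; the circle fitting and common tangents described next are what turn the samples into a continuous RD bound.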

Fig. 2. RD samples obtained by varying λ for video bus. Black markers of the same shape: samples for a given QP with varying λ. Black markers of different shapes: samples for different QPs. Blue line: RD curve from MSE-based RDO. Purple line: RD curve from perceptual RDO with the rescaled λ of MSE-RDO. Red line: RD bound of the best achievable perceptual RDO, enclosing all RD samples.

Fig. 3. RD samples over different QPs and varying λ for videos bus and mobile

2.2. Local RD Curve Fitting

Fig. 3 shows the RD samples obtained by running SSIM-based RDO multiple times on the video sequences bus and mobile. Sample points with the same marker belong to the same QP but varying λ_SSIM; sample points with different markers correspond to different QPs. Based on the visualization of the RD samples, we use a quadratic model (a circle being the simplest) to fit each local curve (RD samples with the same marker). The functional form of the circle is given in Eq. 6, where (R, D) are the available RD samples and c, d, and e are the coefficients of the quadratic curve to be fitted. We solve for the coefficients by least-squares regression, as in Eq. 7.

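$$f_c(R, D \mid c, d, e) = R^2 + D^2 + cR + dD + e \tag{6}$$

Requiring $f_c(R_i, D_i) = 0$ for every sample $(R_i, D_i)$ yields the overdetermined linear system

$$\begin{bmatrix} \vdots & \vdots & \vdots \\ R_i & D_i & 1 \\ \vdots & \vdots & \vdots \end{bmatrix} \begin{bmatrix} c \\ d \\ e \end{bmatrix} = \begin{bmatrix} \vdots \\ -R_i^2 - D_i^2 \\ \vdots \end{bmatrix} \iff Ax = b \iff x = (A^{\top}A)^{-1}A^{\top}b \tag{7}$$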

Fig. 4 shows the circle (blue line) fitted to the local RD samples (black markers).

Fig. 4. Circle fitted to local RD samples (same QP, varying λ) for video bus, QP = 23 and 26
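Eq. 7 is an ordinary linear least-squares problem. The sketch below builds the normal equations A'A x = A'b directly and solves the 3x3 system by Gaussian elimination, so it needs only the Python standard library; it then converts (c, d, e) into the circle's center (-c/2, -d/2) and radius sqrt(c²/4 + d²/4 - e). The function names are hypothetical.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def _solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [v[i]] for i, row in enumerate(m)]  # augmented matrix
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("singular system: samples may be collinear")
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= f * a[col][k]
    x = [0.0] * 3
    for r in range(2, -1, -1):  # back substitution
        x[r] = (a[r][3] - sum(a[r][k] * x[k] for k in range(r + 1, 3))) / a[r][r]
    return x


def fit_circle(samples: Sequence[Tuple[float, float]]):
    """Least-squares fit of R^2 + D^2 + c*R + d*D + e = 0 via x = (A'A)^-1 A'b (Eq. 7).

    Returns ((center_R, center_D), radius).
    """
    if len(samples) < 3:
        raise ValueError("need at least three RD samples")
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for r, d in samples:
        row = (r, d, 1.0)        # one row of A
        rhs = -(r * r + d * d)   # matching entry of b
        for i in range(3):
            atb[i] += row[i] * rhs
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    c, d_coef, e = _solve3(ata, atb)
    cx, cy = -c / 2.0, -d_coef / 2.0
    r2 = cx * cx + cy * cy - e
    if r2 <= 0:
        raise ValueError("fit does not describe a real circle")
    return (cx, cy), sqrt(r2)
```

Forming A'A squares the condition number; a library least-squares solver such as `numpy.linalg.lstsq` avoids this and would be the more robust choice when NumPy is available.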

2.3. Piecewise Envelope Generation

We propose a piecewise approximation method to fit the global RD envelope. The idea is that the global RD envelope can be approximated by a family of piecewise line segments (blue line in Fig. 5), each of which lies on the common tangent line of two neighboring local RD curves.

Fig. 5. Piecewise approximation of the RD envelope by tangent line segments

Since each local RD curve belongs to a circle, we use the following procedure to find the common tangent line of two circles. Suppose the two circles have the following form:

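$$(x - x_1)^2 + (y - y_1)^2 = r_1^2, \qquad (x - x_2)^2 + (y - y_2)^2 = r_2^2 \tag{8}$$

where $(x_1, y_1)$ and $(x_2, y_2)$ are the centers of the two circles and $r_1$ and $r_2$ are their radii (here x is the rate axis and y the distortion axis). Suppose their common tangent line is $ax + by + c = 0$, normalized so that $a^2 + b^2 = 1$. Then

$$a = RX - kY\sqrt{1 - R^2}, \qquad b = RY + kX\sqrt{1 - R^2}, \qquad c = r_1 - (a x_1 + b y_1),$$

where

$$X = \frac{x_2 - x_1}{\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}}, \quad Y = \frac{y_2 - y_1}{\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}}, \quad R = \frac{r_2 - r_1}{\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}},$$

and $k = \pm 1$ selects one of the two external tangent lines. Note that R here is a normalized radius difference, not the rate.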
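The tangent formulas translate directly into code. With the normalization a² + b² = 1, the signed distance a·x_i + b·y_i + c equals the radius r_i for both centers, which is what makes the line tangent to both circles. The default k = +1 is an assumption: when the second circle lies to the lower right of the first (higher rate, lower distortion), it typically selects the tangent below and to the left of both circles, which is the side the envelope lies on. The function name is hypothetical.

```python
from math import hypot, sqrt
from typing import Tuple

Line = Tuple[float, float, float]  # (a, b, c) of a*x + b*y + c = 0


def common_tangent(center1: Tuple[float, float], r1: float,
                   center2: Tuple[float, float], r2: float, k: int = 1) -> Line:
    """External common tangent of two circles, normalized so that a^2 + b^2 = 1.

    k = +1 or -1 picks one of the two external tangents. x is the rate axis and
    y the distortion axis; the ratio R of the text is called `ratio` here.
    """
    if k not in (1, -1):
        raise ValueError("k must be +1 or -1")
    (x1, y1), (x2, y2) = center1, center2
    dist = hypot(x2 - x1, y2 - y1)
    if dist == 0:
        raise ValueError("circles must have distinct centers")
    X, Y = (x2 - x1) / dist, (y2 - y1) / dist
    ratio = (r2 - r1) / dist
    if abs(ratio) > 1:
        raise ValueError("one circle lies inside the other; no external tangent")
    s = sqrt(1.0 - ratio * ratio)
    a = ratio * X - k * Y * s
    b = ratio * Y + k * X * s
    c = r1 - (a * x1 + b * y1)
    return a, b, c


# Two hypothetical local RD circles, the second at higher rate and lower distortion.
print(common_tangent((0.0, 3.0), 1.0, (3.0, 0.0), 1.5))
```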

Based on the local curves fitted in Section 2.2 and the procedure above, we obtain the common tangent line for every two neighboring local RD curves. Fig. 6 shows the tangent line over two local RD curves. The tangent line indicates the gradient of the global RD envelope at the QPs of the two neighboring local RD curves. Therefore, the family of tangent lines forms an approximation of the global RD envelope. The intersections of every two neighboring tangent lines lie on the piecewise envelope. As shown in Fig. 7, the red markers are such intersection points, and they are well positioned on the envelope that encloses all RD samples (black markers).

Fig. 6. Common tangent line of two local RD curves (circle segments): left QP = 26, right QP = 28

Fig. 7. Piecewise linear approximation of the RD envelope. Left: bus; right: coastguard
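The last step can be sketched in a few lines: neighboring tangent lines are intersected to obtain the vertices of the piecewise envelope, and since each line a·R + b·D + c = 0 has slope dD/dR = -a/b, the λ of Eq. 3 on that segment is a/b. The sketch assumes the lines are ordered along the rate axis (for example by QP); the helper names and the example lines are hypothetical.

```python
from typing import List, Sequence, Tuple

Line = Tuple[float, float, float]  # (a, b, c) of a*R + b*D + c = 0


def intersect(l1: Line, l2: Line) -> Tuple[float, float]:
    """Intersection point (R, D) of two lines, by Cramer's rule."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("lines are (nearly) parallel")
    return (b1 * c2 - b2 * c1) / det, (a2 * c1 - a1 * c2) / det


def envelope_breakpoints(lines: Sequence[Line]) -> List[Tuple[float, float]]:
    """Vertices of the piecewise envelope: intersections of neighboring tangent lines."""
    return [intersect(lines[i], lines[i + 1]) for i in range(len(lines) - 1)]


def segment_lambdas(lines: Sequence[Line]) -> List[float]:
    """lambda = -dD/dR = a / b on each tangent line (cf. Eq. 3)."""
    return [a / b for a, b, _ in lines]


# Hypothetical tangents D = 10 - R, D = 7 - 0.5R, D = 5 - 0.25R.
tangents = [(1.0, 1.0, -10.0), (0.5, 1.0, -7.0), (0.25, 1.0, -5.0)]
print(envelope_breakpoints(tangents), segment_lambdas(tangents))
```

Neither helper needs the lines to be unit-normalized, since both the intersection point and the ratio a/b are invariant to scaling (a, b, c).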

3. EXPERIMENTS

We compare the RD performance of our proposed perceptual RDO with that of [4] and the baseline JM 16.0 [1]. In the experiment, we use the H.264 Baseline profile with the following encoding configuration: IPPP GOP structure with 12 frames, RDO enabled, at most 3 reference frames for inter coding, and QP from 20 to 36. Our proposed framework is implemented in JM 16.0, and we test five video sequences at CIF resolution. The performance is evaluated on inter-frame coding. As shown in Fig. 8, our method outperforms both [4] and the baseline JM. In the high bitrate range, our method saves 2% and 12% bitrate relative to [4] and JM, respectively. In the low bitrate range, our method has a larger saving margin, with 5% and 15% bitrate reduction. Table 1 lists the bitrate reduction of our proposed method relative to the two reference methods for all tested video sequences.

Fig. 8. Perceptual RDO performance comparison for inter-frame coding. Red: the proposed RDO; cyan: the framework in [4]; blue: JM with MSE-based RDO

Table 1. Bitrate reduction (%) of our proposed RDO for inter-frame coding under the Baseline profile

Video        QP 28-36             QP 20-27
             vs. [4]   vs. JM     vs. [4]   vs. JM
foreman      -5.42     -15.29     -3.84     -12.2
coastguard   -6.97     -12.56     -2.27     -9.10
bus          -9.78     -16.6      -3.86     -7.59
mobile       -7.63     -11.21     -3.58     -8.42
akiyo        -6.25     -10.63     -2.61     -9.86

4. CONCLUSIONS

In this paper, a perceptual RDO framework based on piecewise linear approximation is proposed. We start from MSE-RDO and rescale and offset its Lagrange multiplier to suit the dynamic range of the perceptual distortion in our perceptual RDO. Based on the collected RD samples, we find the envelope curve that corresponds to the best achievable RD model. We then approximate the RD envelope with piecewise line segments, each taken from the common tangent line of two circles fitted to the RD samples. Experiments show that our proposed RDO featuring RD envelope approximation can outperform the conventional methods by 2% to 5% in bitrate.

5. REFERENCES

[1] H.264/AVC JM reference software, “http://iphome.hhi.de/suehring/tml/”.

[2] Bertsekas D., “Constrained optimization and Lagrange multiplier methods,” Computer Science and Applied Mathematics, Boston, 1982.

[3] Wang Z. et al., “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Processing, 2004.

[4] Huang Y. et al., “Perceptual rate-distortion optimization using structural similarity index as quality metric,” IEEE Trans. CSVT, 2010.

[5] Wang C., “A perceptual quality based rate distortion model,” in Quality of Multimedia Experience, 2012.

[6] Wang S. et al., “SSIM-motivated rate-distortion optimization for video coding,” IEEE Trans. CSVT, 2012.

[7] Wang X., “Visual perception based Lagrangian rate distortion optimization for video coding,” IEEE ICIP, 2011.

[8] Kanumuri S. et al., “Modeling packet-loss visibility in MPEG-2 video,” IEEE Trans. Multimedia, 2006.
