NCTU, EE, Vision Lab

Implementation of H.264 Based System on Multi-DSPs Board

陳奕安 2008.02.13

Page 2: Outline

System description
    Architecture
    MEX Board
    TMS320DM642
Communication interface
Software development
Error resilience

Page 3: Architecture

[Figure: system architecture. Blocks: PC 1, PC 2, MEX Board 1, MEX Board 2. Sending path: Capture Frame -> H.264 Encode -> Send to Network. Receiving path: Receive from Network -> H.264 Decode -> Display.]

Page 4: MEX Board

The MEX board is composed of:
    4 TMS320DM642 DSPs, with their memory, for data stream compression (video/audio)
    2 FPGAs for a flexible architecture
    8 SAA7113H video chips (ADC)
    4 CS4221 stereo audio chips (ADC)

Page 5: MEX Board

[Figure: block diagram of the MEX board [1], with callouts for the 4 DM642 DSPs, the 2 FPGAs, and the video/audio chips.]

Page 6: MEX Board Block Diagram

[Figure: block diagram of the MEX board [1].]

Page 7: TMS320DM642

Performance: 4000-4800 MIPS
Two-level cache:
    ○ L2: 256 KB, L1P: 16 KB, L1D: 16 KB
3 video ports
8-bit McASP
Ethernet MAC
32-bit HPI
66 MHz PCI
64-bit EMIF

[Figure: DSP DM642 block diagram [2].]

Page 8: TMS320DM642

Peripherals that will be used:
    Enhanced DMA (EDMA)
    Video ports (VP0 to VP2)
    Inter-integrated circuit (I2C) bus
    External memory interface (EMIF)
    Ethernet media access controller (EMAC)
    Management data input/output (MDIO)

Page 9: Outline

System description
Communication interface
    Host/MEX communication
    Video capturing/displaying
    Network transmit
Software development
Error resilience

Page 10: Host/MEX Communication

[Figure: data transfer from the 4 DSPs (SDRAM) to PCI [1], drawn as two parallel flows, one for the MEX board and one for the PC.]

MEX (DSP) flow:
    DSP started: fill memory
    Initialize transfer
    DSP to PCI transfer request
    Start transfer
    Transfer finished
Actions noted beside the MEX flow:
    Set DSP FIFO direction; set FIFO full flag value; DSP FIFO is reset
    Start EDMA; unreset DSP1 FIFO; clear PCI interrupt

PC (PCI) flow:
    PCI started: wait for interrupt
    Initialize transfer
    PCI to DSP start transfer request
    Wait for transfer finished
    Transfer finished
Actions noted beside the PC flow:
    Set transfer size; set PCI FIFO direction; select DSP data sources; set transfer destination address; start PCI FIFO; clear DSP interrupt

Page 11: Video Capture

[Figure: video capture path on the MEX board. Camera -> video chip SAA7113H (ADC) -> DM642 video ports (VP0, VP1, VP2) -> DMA -> raw data. An I2C bus connects the DM642 and the video chip.]

Camera output: NTSC (analog, 525 lines per frame, 30 frames per second) or PAL (analog, 625 lines per frame, 25 frames per second)
Video chip output: ITU656 (digital, for PAL or NTSC)
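As a rough check of the load on one video port, the raw data rate can be worked out from the frame rates quoted above. The sketch below assumes the ITU-R BT.601 active picture sizes (720x480 for the 525-line system, 720x576 for the 625-line system) and 4:2:2 sampling at 2 bytes per pixel. None of these figures appear on the slide, and the function name is illustrative.

```python
# Assumed (not from the slide): BT.601 active picture sizes and
# 4:2:2 sampling, which averages 2 bytes per pixel.
BYTES_PER_PIXEL_422 = 2


def raw_rate_bytes_per_s(width, height, fps):
    """Raw active-video data rate in bytes per second."""
    return width * height * BYTES_PER_PIXEL_422 * fps


ntsc_rate = raw_rate_bytes_per_s(720, 480, 30)  # 525-line system
pal_rate = raw_rate_bytes_per_s(720, 576, 25)   # 625-line system

print(f"NTSC: {ntsc_rate / 1e6:.2f} MB/s")
print(f"PAL:  {pal_rate / 1e6:.2f} MB/s")
```

Under these assumptions both standards come out at the same rate, about 20.7 MB/s per channel, which is the volume the EDMA has to move for each camera.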

Page 12: TMS320DM642 Video Port

[Figure: TMS320DM642 video port [3].]

Page 13: Network Architecture

[Figure: network architecture. On MEX Board 1 and on MEX Board 2, the DM642 (EMAC and MDIO) connects to a LXT971ALC PHY; the boards are linked through RJ45.]

Page 14: TMS320DM642 EMAC

DM642 networking using EMAC and MDIO

[Figure: DM642 networking [4].]

Page 15: Outline

System description
Communication interface
Software development
    H.264 codec
    Optimization
    Parallelization
    Memory issue
Error resilience

Page 16: H.264 Encoder Block Diagram

[Figure: H.264 encoder block diagram. Labels: Fn (current), F'n-1 (reference; 1 or 2 previously encoded frames), ME, MC, choose intra prediction, intra prediction, inter, intra, P, Dn, T, Q, X, reorder, entropy encode, NAL, Q^-1, T^-1, D'n, uF'n, filter, F'n (reconstructed).]

Page 17: H.264 Decoder Block Diagram

[Figure: H.264 decoder block diagram. Labels: NAL, entropy decode, reorder, X, Q^-1, T^-1, D'n, F'n-1 (reference), MC, intra prediction, inter, intra, P, uF'n, filter, F'n (reconstructed).]

Page 18: Optimization on Single Chip

Realization and Optimization of DSP Based H.264 Encoder [5]

Optimization of H.264 on the DSP platform
    Code transplant and primary optimization
    Optimization of the key modules
    Using TI C64x IMGLIB
Data scheduling and storage allocation
    Data scheduling with EDMA
    Storage allocation (code section/data section)

Page 19: Parallelization on Chips

One GOP in one DSP
    Each DSP handles one group of pictures (GOP), e.g. IPPP... or IBBPBB...
    There are no dependencies between GOPs.

One frame / one macroblock in one DSP
    Each DSP handles one frame or one macroblock.
    There are dependencies between frames and between macroblocks.
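Because GOPs do not depend on each other, GOP-level parallelization reduces to dealing GOPs out to the DSPs. A minimal sketch, assuming a fixed GOP length and a round-robin policy (both are illustrative choices, not taken from the slides; `assign_gops` is a hypothetical helper):

```python
NUM_DSPS = 4  # the MEX board carries 4 DM642 DSPs


def assign_gops(num_frames, gop_size, num_dsps=NUM_DSPS):
    """Split a sequence into GOPs and deal them out round-robin.

    Returns a dict mapping DSP index -> list of (first_frame, last_frame).
    """
    schedule = {dsp: [] for dsp in range(num_dsps)}
    for gop_index, start in enumerate(range(0, num_frames, gop_size)):
        end = min(start + gop_size, num_frames) - 1
        schedule[gop_index % num_dsps].append((start, end))
    return schedule


schedule = assign_gops(num_frames=60, gop_size=12)
for dsp, gops in schedule.items():
    print(f"DSP {dsp}: {gops}")
```

The price of this scheme is latency and memory: each DSP must buffer a whole GOP before the output can be reassembled in order, which is one reason to also consider frame-level and macroblock-level partitioning.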

Page 20: Macroblock Dependencies

Data dependencies induced by inter prediction:
    The motion vector MVcur is predicted from MVA to MVD.

[Figure: data dependencies induced from MV prediction [6]. In the current frame, MVD, MVB, and MVC lie in the row above MVcur and MVA lies to its left; the reference frame is also shown.]
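The dependency on the neighbors comes from the median rule H.264 uses to predict a motion vector. The sketch below is a simplified version that assumes all neighbors use the same reference frame and block size; the standard's special cases (unavailable neighbors other than C, 16x8 and 8x16 partitions) are left out, and the function names are illustrative.

```python
def median3(a, b, c):
    """Median of three numbers."""
    return sorted((a, b, c))[1]


def predict_mv(mv_a, mv_b, mv_c, mv_d=None):
    """Component-wise median predictor for the current motion vector.

    mv_a: left neighbor, mv_b: upper neighbor, mv_c: upper-right neighbor.
    If mv_c is unavailable (None), the upper-left neighbor mv_d replaces it.
    """
    if mv_c is None:
        mv_c = mv_d
    if mv_c is None:
        raise ValueError("need MVC or MVD")
    return (median3(mv_a[0], mv_b[0], mv_c[0]),
            median3(mv_a[1], mv_b[1], mv_c[1]))


print(predict_mv((2, 0), (4, -2), (3, 5)))
print(predict_mv((2, 0), (4, -2), None, mv_d=(10, 10)))
```

Since the predictor needs the left and upper rows to be finished, a macroblock cannot start motion vector coding before those neighbors are done.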

Page 21: Macroblock Dependencies

Data dependencies induced by intra prediction:
    Left, upper-left, upper, and upper-right MBs

[Figure: data dependencies induced from intra prediction [6].]

Page 22: Macroblock Dependencies

Data dependencies induced by the deblocking filter:
    Top 4 rows and leftmost 4 columns of pixels

[Figure: data dependencies induced from deblocking filter [6].]

Page 23: Macroblock Dependencies

[Figure: possible spatial data dependencies for a macroblock [6]. Neighbors of the current MB, in reading order:]
    Upper-left MB: intra prediction, MV prediction
    Upper MB: intra prediction, MV prediction, deblocking filter
    Upper-right MB: intra prediction, MV prediction
    Left MB: intra prediction, MV prediction, deblocking filter

Page 24: Macroblock Dependencies

Macroblock dependencies:
    Data dependencies between frames
    Data dependencies between MB rows in the same frame
    Data dependencies in the same MB row

Page 25: Wave-front parallelization

Partition for MB region

[Figure: wave-front of macroblock region partition [7].]
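The wave-front idea can be illustrated at the level of single macroblocks. If a macroblock at column x and row y depends on its left, upper-left, upper, and upper-right neighbors, then every macroblock with the same value of x + 2y is free of mutual dependencies and can be processed in the same step. The sketch below builds that schedule; it is an illustration of the principle only, whereas [7] partitions the frame into larger MB regions.

```python
from collections import defaultdict


def wavefront_schedule(mb_cols, mb_rows):
    """Group macroblocks (x, y) into waves that can run in parallel.

    Wave index x + 2*y guarantees that the left, upper-left, upper, and
    upper-right neighbors all belong to an earlier wave.
    """
    waves = defaultdict(list)
    for y in range(mb_rows):
        for x in range(mb_cols):
            waves[x + 2 * y].append((x, y))
    return [waves[w] for w in sorted(waves)]


schedule = wavefront_schedule(mb_cols=4, mb_rows=3)
for step, mbs in enumerate(schedule):
    print(step, mbs)
```

For a 4x3 grid this gives 8 steps instead of 12 sequential macroblocks; the gain grows with frame size but is limited at the start and end of the frame, where few macroblocks are ready.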

Page 26: Wave-front parallelization

Partition for frames

[Figure: wave-front of frame partition [7].]

Page 27: Memory Issue

[Figure: two-level cache architecture of the DM642. Blocks: DM642 DSP core; L1P cache (direct mapped, 16 Kbytes total); L1D cache (2-way set associative, 16 Kbytes total); L2 cache/memory (256 Kbytes total); EDMA controller; peripherals.]

Limited memory of the DM642
Use a memory buffer to reduce memory accesses
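To see why the on-chip memory is a constraint, compare the 256 Kbyte L2 with the size of one frame. The sketch below assumes 8-bit YUV 4:2:0 video and a few common resolutions; the slides do not state the resolution used, so these are illustrative values only.

```python
L2_BYTES = 256 * 1024  # L2 cache/memory size of the DM642, from the slide


def yuv420_frame_bytes(width, height):
    """Bytes in one 8-bit YUV 4:2:0 frame (luma plus two quarter-size chroma planes)."""
    return width * height * 3 // 2


# Assumed example resolutions, not taken from the slides.
RESOLUTIONS = {"QCIF": (176, 144), "CIF": (352, 288), "D1": (720, 480)}

for name, (w, h) in RESOLUTIONS.items():
    size = yuv420_frame_bytes(w, h)
    print(f"{name}: {size} bytes, fits in L2: {size <= L2_BYTES}")
```

Even at CIF, a current frame plus one reference frame (2 x 152,064 bytes) exceeds the L2 capacity, so frames have to stay in external SDRAM and only working buffers are kept on chip.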

Page 28: Memory Issue

Memory hierarchy for inter prediction

[Figure: memory hierarchy [8].]

Page 29: Memory Issue

Slice memory buffer for intra prediction and deblocking filter

[Figure: slice memory [9].]

Page 30: Outline

System description
Communication interface
Software development
Error resilience
    Error-resilience tools in H.264/AVC
    Error resilience of JM source code

Page 31: Error Resilience Tools in H.264/AVC

Redundant slices (RSs) [10]
    For a macroblock (MB), an encoder can place a redundant representation of the same MB into the same bit stream.
    e.g.
        ○ One slice is coded using different quantization parameters (QPs).
        ○ If the slice with the low QP is available, the decoder discards the RS; otherwise, the decoder reconstructs the RS.

[Figure: Slice A coded with QP1 and Slice A coded with QP2 are both sent to the decoder.]
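The decoder-side rule for redundant slices fits in a few lines. This is a sketch of the decision only, with hypothetical names; a lost slice is represented by `None`, and the fallback to concealment when both copies are lost is an assumption about what a decoder would do next.

```python
def select_slice(primary, redundant):
    """Pick which copy of a slice to decode.

    primary:   the low-QP (high quality) slice, or None if it was lost
    redundant: the redundant slice (RS), or None if it was lost
    Returns a (label, payload) pair.
    """
    if primary is not None:
        return ("primary", primary)      # the RS is discarded
    if redundant is not None:
        return ("redundant", redundant)  # reconstruct from the RS
    return ("conceal", None)             # both lost: fall back to concealment


print(select_slice(b"slice A, QP1", b"slice A, QP2"))
print(select_slice(None, b"slice A, QP2"))
print(select_slice(None, None))
```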

Page 32: Error Resilience Tools in H.264/AVC

Parameter sets [10]
    Include picture size, entropy coding method, MV resolution, and so on.
    Sequence parameter set (SPS)
        ○ Contains all information related to the picture sequence between two IDR (Instantaneous Decoding Refresh) pictures.
    Picture parameter set (PPS)
        ○ Contains all information related to all slices in a picture.
    e.g. Multiple copies of the SPSs can be sent to raise the chance that one arrives.
    e.g. SPSs can be sent out-of-band.

Page 33: Error Resilience Tools in H.264/AVC

Flexible macroblock ordering (FMO) [10]
    7 modes
    The bit overhead depends heavily on the picture format, the content, and the QP.
        ○ < 5% penalty at QP = 16; on average 20% at QP = 28.

[Figure: 6 modes of FMO [10].]
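FMO helps concealment because the macroblocks of one slice group can be scattered, so a lost slice still leaves intact neighbors. As an illustration only, the sketch below builds a two-group checkerboard map, a pattern often used to explain dispersed FMO; the slides do not list the individual modes, and the helper names are hypothetical.

```python
def checkerboard_map(mb_cols, mb_rows):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(mb_cols)] for y in range(mb_rows)]


def surviving_neighbors(mb_map, x, y, lost_group):
    """Count 4-connected neighbors of (x, y) that are not in the lost group."""
    count = 0
    for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
        if 0 <= ny < len(mb_map) and 0 <= nx < len(mb_map[0]):
            if mb_map[ny][nx] != lost_group:
                count += 1
    return count


mb_map = checkerboard_map(mb_cols=6, mb_rows=4)
for row in mb_map:
    print(row)

# If slice group 0 is lost, an interior MB of group 0 keeps all 4 neighbors.
print(surviving_neighbors(mb_map, 2, 2, lost_group=0))
```

The scattering is also the source of the overhead quoted above: neighboring macroblocks in different slice groups can no longer predict from each other.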

Page 34: Error Concealment of H.264/AVC

Error concealment scheme provided in JM
    Intra
    Inter
        ○ $$\arg\min_{dir \in \{top,\, bot,\, left,\, right\}} d_{sm}, \qquad d_{sm} = \frac{1}{N} \sum_{j=1}^{N} \left| Y(mv^{dir})_j^{IN} - Y_j^{OUT} \right|$$

[Figure: error concealment for macroblocks [11].]
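For inter concealment, the idea is to try the motion vector of each neighboring macroblock and keep the one whose motion-compensated block best matches the pixels just outside the lost macroblock. The sketch below works on flat lists of boundary pixels and uses hypothetical names; fetching the pixels from the reference frame is left out, so it shows only the selection step, not the JM implementation.

```python
def side_match_distortion(inner, outer):
    """Mean absolute difference between boundary pixels.

    inner: boundary pixels of the candidate replacement block (IN)
    outer: adjacent pixels of the correctly received neighbors (OUT)
    """
    if len(inner) != len(outer) or not inner:
        raise ValueError("boundary lists must be non-empty and equal in length")
    return sum(abs(a - b) for a, b in zip(inner, outer)) / len(inner)


def choose_direction(candidates, outer):
    """Return the direction whose candidate block has the smallest distortion.

    candidates: dict mapping 'top', 'bot', 'left', 'right' to the IN pixels
    obtained with that neighbor's motion vector.
    """
    return min(candidates, key=lambda d: side_match_distortion(candidates[d], outer))


outer = [100, 102, 104, 106]
candidates = {
    "top": [101, 103, 105, 107],
    "bot": [90, 95, 110, 120],
    "left": [100, 100, 100, 100],
    "right": [80, 80, 80, 80],
}
print(choose_direction(candidates, outer))
```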

Page 35: Future Work

Optimize the H.264 codec for real-time operation

Implement different concealment methods

Propose corresponding error resilience methods

Page 36: Reference

[1] VITEC MULTIMEDIA, "MEX User Manual, Revision 1.7."
[2] Texas Instruments, Incorporated, "TMS320C64x DSP Generation Product Bulletin" (sprt236).
[3] Texas Instruments, Incorporated, "TMS320DM64x Video Port to Video Port Communication" (spraaf3).
[4] Texas Instruments, Incorporated, "TMS320C6000 DSP Ethernet Media Access Controller (EMAC)/Management Data Input/Output (MDIO) Module Reference Guide" (spru628a).
[5] Zhe Wei and Canhui Cai, "Realization and Optimization of DSP Based H.264 Encoder," ISCAS 2006, Circuits and Systems, May 2006.
[6] Y. Chen, E. Li, X. Zhou, and S. Ge, "Implementation of H.264 Encoder and Decoder on Personal Computers," Journal of Visual Communication and Image Representation, vol. 17, 2006.
[7] Zhuo Zhao and Ping Liang, "Data partition for wave-front parallelization of H.264 video encoder," 31st IEEE International Conference on Acoustics, Speech, and Signal Processing, 2006.
[8] K. Denolf, C. De Vleeschouwer, et al., "Memory centric design of an MPEG-4 video encoder," IEEE Trans. CSVT, vol. 15, no. 5, pp. 609-619, May 2005.
[9] Tsu-Ming Liu et al., "A 125μW, Fully Scalable MPEG-2 and H.264/AVC Video Decoder for Mobile Applications," ISSCC Digest of Technical Papers, pp. 402-403, Feb. 2006.
[10] S. Wenger, "H.264/AVC over IP," IEEE Trans. Circuits Syst. Video Technol., vol. 13, pp. 645-656, July 2003.
[11] "Non-normative error concealment algorithms," ITU-T VCEG-N62, 2001-09.

Page 37: H.264 Partitions

[Figure: frame partitions and macroblock partitions, showing 16x16 blocks, 8x8 blocks, and 4x4 blocks.]

Page 38: H.264 Intra-Mode Decision

Page 39: H.264 Intra-Mode Decision

[Figure: intra prediction examples, labeled 16x16 plane and 4x4 horizontal.]

Page 40: Fast integer & fractional pixel motion estimation

Integer pixel search scheme

The scheme covers both small and large motions. The search point that gives the smallest matching error in one step is the starting point of the next step.

About 130 points are searched in this algorithm, so the saving over a full 33x33 search is (33x33 - 130)/(33x33), about 88% (roughly 90%). If 3 starting points are tried, the saving is about 64%.

Assume the guessed starting point is (0,0).

Step 1. Unsymmetrical-cross search
Step 2-1. Local full search around the starting point
Step 2-2. Uneven multi-hexagon search
Step 3-1. Extended hexagon-based search. The search continues until the point with the minimal matching error is the center of the new hexagon.
Step 3-2. Center-biased search

[Figure: search pattern on a grid spanning -15 to 15 in both directions, with steps 1, 2-1, 2-2, 3-1, and 3-2 marked.]
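The savings quoted on this page follow from simple arithmetic. The short check below assumes that trying 3 starting points costs three times the 130 points of a single run, which is the reading that reproduces the 64% figure.

```python
FULL_SEARCH_POINTS = 33 * 33   # full search over a 33x33 window
FAST_SEARCH_POINTS = 130       # approximate points visited by the fast search


def saving(points_searched, full=FULL_SEARCH_POINTS):
    """Fraction of search points saved relative to full search."""
    return (full - points_searched) / full


one_start = saving(FAST_SEARCH_POINTS)
three_starts = saving(3 * FAST_SEARCH_POINTS)  # assumption: cost scales linearly

print(f"1 starting point:  {one_start:.1%}")   # 88.1%
print(f"3 starting points: {three_starts:.1%}")  # 64.2%
```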

Page 41: Fast integer & fractional pixel motion estimation

Fractional pixel search scheme

Start from the best matching integer point found by the integer motion search.

1. Search its 1/2-pixel neighbors
2. Search its 1/4-pixel neighbors
3. Search its 1/8-pixel neighbors

The optimal point of each step is the search center of the next step.
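The three refinement steps can be sketched as one loop that halves the step size each time. Positions are kept in integer units of 1/8 pixel to avoid floating-point error. The cost function here is a toy stand-in for the real matching error on interpolated pixels, and all names are illustrative.

```python
def fractional_refine(cost, start=(0, 0), steps=(4, 2, 1)):
    """Refine a motion vector around the best integer-pixel point.

    Positions are in 1/8-pixel units, so steps 4, 2, 1 correspond to
    1/2-, 1/4-, and 1/8-pixel neighbors. At each step the center and its
    8 neighbors are evaluated, and the best one becomes the next center.
    """
    best = start
    for step in steps:
        candidates = [(best[0] + dx * step, best[1] + dy * step)
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
        best = min(candidates, key=cost)
    return best


def toy_cost(pos):
    """Hypothetical matching error with its minimum at (7/8, -6/8) pixel."""
    return (pos[0] - 7) ** 2 + (pos[1] + 6) ** 2


best = fractional_refine(toy_cost)
print(best[0] / 8, best[1] / 8)
```

Each step evaluates at most 9 positions, so the whole refinement costs at most 27 evaluations instead of the 225 needed to test every 1/8-pixel position within one pixel of the start.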