DSP Introduction - 國立臺灣大學access.ee.ntu.edu.tw/course/DSP_Lab/slides/w2 2004-09-22...
ACCESS IC LAB
Graduate Institute of Electronics Engineering, NTU
DSP Introduction
Instructor: Prof. An-Yeu Wu
2004/September
Outline
  Digital Signal Processing Overview
  Applications
  Market Observation
  DSP Processor Introduction
  Fundamentals of Digital Signal Processor
  Recent DSP-Related Topics
Digital Signal Processing Overview
What is Signal Processing?
Signals
Signal is: (Webster's Dictionary)
  1: SIGN, INDICATION
  2a: an act, event, or watchword that has been agreed on as the occasion of concerted action
  2b: something that incites to action
Signals can be characterized in several ways:
  Continuous time or discrete time
  Continuous valued or discrete valued
  1-D signals or 2-D signals (different dimensions)
  Real valued or complex valued
  Scalar or vector
  Deterministic or random
Characterize Signals
Continuous time & continuous valued:
Analog signal
Discrete time & continuous valued:
Sampled signal
Continuous time & discrete valued:
Quantized signal
Discrete time & discrete valued:
Digital signal
Characterize Signals
Different dimensional signals:
  Speech vs. image vs. video
Real-valued & complex-valued signals:
  Residential electrical power vs. industrial reactive power
Scalar & vector signals:
  Sea surface temperature vs. North Atlantic Current
Deterministic & random signals:
  Speech vs. background noise
Processing
Process is: (Webster's Dictionary)
  2b (1): to subject to or handle through an established usually routine set of procedures
  2b (2): to subject to examination or analysis
Processing is application-oriented:
  Communication: modulation, demodulation
  Signal enhancement: filtering, equalization…
  Spectral analysis: transform…
  Image processing: reconstruction, watermarking...
  Data compression: transform, quantization…
  Security: encryption, decryption
Real World Signal Processing
Real world signals:
  Most signals are analog and continuous,
  e.g. sound, vision, pressure, radiation...
Traditional processing of real world signals:
  Modeling
    Higher complexity: nonlinear, time-variant systems
  Environment-sensitive
    Temperature, pressure, gravity…
What is Digital Signal Processing?
Digital Signal Processing:
  Digital signal processing processes real world signals (represented in discrete, quantized form, or naturally digital) using mathematical techniques or algorithmic manipulation to perform transformations or extract information.
Digital Signal Processing
Signals in a DSP system are sequences of quantized samples (discrete both in time and value).
Signals are obtained from physical signals via transducers (e.g., microphones) and then become electric signals (e.g., voltage).
Electric signals are converted to digital signals by the sampling and quantizing of analog-to-digital converters (ADC).
Digital signals may be recorded or converted into analog signals (e.g., voltage) through digital-to-analog converters (DAC).
Transducers (e.g., speakers) convert electrical signals back into physical signals.
Sample and Quantize
Sampling interval: T
Quantize: $Q(y) = \varepsilon \cdot \left\lfloor y / \varepsilon \right\rfloor$
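The two steps on this slide can be sketched in a few lines of Python. This is only an illustration: the function names `sample` and `quantize` and the test signal are assumptions made for this sketch, and the quantizer is the floor-based rule Q(y) = ε·⌊y/ε⌋ with step size ε.

```python
import math

def sample(signal, T, n_samples):
    """Sample a continuous-time signal (a Python function of t) every T seconds."""
    return [signal(n * T) for n in range(n_samples)]

def quantize(y, eps):
    """Floor quantizer with step eps: Q(y) = eps * floor(y / eps)."""
    return eps * math.floor(y / eps)

# Hypothetical input: a ramp sampled with T = 0.5, then quantized with eps = 0.25.
samples = sample(lambda t: 0.74 * t, 0.5, 4)
digital = [quantize(s, 0.25) for s in samples]
```

Because the rule uses a floor, negative inputs are moved to the next lower level rather than toward zero.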
Example
Communication system example: Cellular phone
Why Digital Signal Processing?
“Exactness”
  Perfect reproduction without error and perfect duplication of processing results
  Accuracy of digital signal representations can be controlled by changing the word length of the signal.
“Robustness”
  Digital signals can be stored and recovered, transmitted and received, processed and manipulated, all virtually without error.
“Convenience”
  Complicated or sophisticated DSP techniques can be easily applied to the target signal.
  Faster system design and verification in every development cycle.
Applications
Common DSP Algorithms & Applications
Applications
  – Instrumentation and measurement
  – Communications
  – Audio and video processing
  – Graphics, image enhancement, rendering
  – Navigation, radar, GPS
  – Control - robotics, machine vision, guidance
Algorithms
  – Frequency domain filtering - FIR, IIR
  – Frequency-time transformations - FFT, DCT
  – Correlation
Image Compression
[Figure: JPEG encoding and JPEG decoding; spatial domain and frequency domain; quantize and dequantize]
Voice Recognition
Audio Application
Market Observation
Semiconductor Market
Single processors (MPUs) and DRAMs were driving the semiconductor industry because of the personal computing market.
Now DSP has become one major technology driver.
  Increasing need for digital signal processing in embedded systems,
  e.g. communication applications, multimedia applications
DSP Market
$2 billion market*, 30% growth rate
*1996
Wireless Trend
Example
Cellular phone penetration in Taiwan reached 110% in 2004.
Penetration is growing remarkably fast in China and Russia (millions of mobile phones per month).
The cellular phone is a product whose generations are retired and replaced quickly.
Today's DSP Market Split
DSP Processor Introduction
Review: Processor Classes
General purpose - high performance
  – Pentiums, Alphas, SPARC
  – Used for general purpose software
  – Heavyweight OS - UNIX, NT
  – Workstations, PCs
Embedded processors and processor cores
  – ARM, 486SX, Hitachi SH7000, NEC V800
  – Single program
  – Lightweight OS - eCos, uLinux, …
  – Need DSP processor support in such applications
  – Cellular phones, consumer electronics (e.g. CD players)
Microcontrollers
  – Extremely cost sensitive
  – Single program, an OS is usually unnecessary
  – Small word size - 8 bit common
  – Automobiles, toasters, thermostats, ...
[Figure labels: Increasing cost; increasing volume]
Comparison
Realization
Digital signal processing algorithms can be realized with the following technologies:
  Digital Signal Processor (DSP)
    ADI Blackfin processor, TI TMS320CX processor…
  General Purpose Microprocessor
    Pentium CPU, ARM
  Application Specific Integrated Circuit (ASIC)
    FFT processor, equalizer
  Field-Programmable Gate Array (FPGA)
Digital Signal Processor
A digital signal processor (DSP) is a type of microprocessor.
  It processes data in real time.
  The real-time capability makes a DSP well suited to applications that cannot tolerate delays.
  An essentially infinite stream of data needs to be processed.
  Large amount of I/O with analog interfaces
    ADC, DAC
DSP features
  Single-cycle multiply-accumulate operations
  Real-time performance, simulation and emulation
  Flexibility
  Reliability
  Reduced system cost
  Reduced development cycle
  Easy to modify the DSP algorithm or update the system by software reprogramming
Comparison
The FPGA Alternative:
  Field-Programmable Gate Arrays can be reconfigured within a system.
  Fast prototyping and development.
  Offer greater raw performance per specific operation because of the resulting dedicated logic circuit.
  FPGAs are significantly more expensive and typically have much higher power dissipation than DSPs with similar functionality.
  When FPGAs are the chosen performance technology in a design, DSPs are typically used in conjunction with them to provide greater flexibility, better price/performance ratios, and lower system power.
Comparison
The ASIC Alternative
  Application-specific ICs provide extreme efficiency in both power consumption and processing power.
  The functionality of an ASIC cannot be iteratively changed or updated during product development, unlike an FPGA or a programmable DSP.
  Usually chosen for extremely performance-sensitive cases.
Comparison
The General Purpose Processor (GPP) Alternative
  In contrast to ASICs, which are optimized for specific functions, general-purpose microprocessors (GPPs) are best suited to performing a broad array of tasks.
  High-performance GPPs, such as the CPU in a desktop PC, are usually too expensive for many DSP applications.
  Low-cost GPPs' comparatively poor real-time performance and high power consumption rule them out in DSP applications.
  In many systems today, GPPs usually play the role of system controller instead of algorithm-computation unit,
    e.g. in cell phones
Implementation Choices
Power for one tap computation
Fixed-Point & Floating-Point DSPs
Programmable DSPs come in 2 flavors, fixed point and floating point.
Floating point DSP:
  Expensive, longer instruction cycle
  Large signal dynamic range
  Adopted in very precision-sensitive cases
    Communication infrastructure
    Medical imaging systems
    Military weapons
Fixed point DSP:
  Cheaper, shorter instruction cycle
  Less signal dynamic range: constrained by word length
  Overflow possibility
  Application: multimedia
Market Separation: ADI Example
Fundamentals of DSP Processor
Von Neumann Machine
  Single memory space for program and data
  Shared global bus
Motivation: FIR Filtering
M most recent samples in the delay line: x(i)
A new sample moves data down the delay line
A “tap” is a multiply-add
Each tap (M+1 taps total) nominally requires:
  Two data fetches
  Multiply
  Accumulate
  Memory write-back to update the delay line
Goal: 1 FIR tap / DSP instruction cycle
FIR Implementation
$y(n) = \sum_{i=0}^{N-1} c(i)\, x(n-i)$
On a von Neumann machine, the expressions are executed row by row.
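The FIR sum y(n) = Σ c(i)·x(n−i), i = 0..N−1, can be written as a plain double loop. The sketch below is illustrative only: the function name `fir` is an assumption, and samples before the start of the input are taken to be zero, which the slide does not specify.

```python
def fir(c, x):
    """Direct-form FIR: y(n) = sum over i of c(i) * x(n - i), for i = 0..N-1.

    Samples with negative index are assumed to be zero (an assumption of
    this sketch, not something stated on the slide).
    """
    N = len(c)
    y = []
    for n in range(len(x)):
        acc = 0
        for i in range(N):          # one multiply-accumulate per tap
            if n - i >= 0:
                acc += c[i] * x[n - i]
        y.append(acc)
    return y

# Feeding a unit impulse returns the coefficients themselves.
impulse_response = fir([1, 2, 3], [1, 0, 0, 0])
```

Each pass through the inner loop is one tap: two operand reads, one multiply, one accumulate.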
FIR on Von Neumann Machine
Bus/memory bandwidth is the bottleneck
Control code overhead
11 instructions per tap
  loop: lw  x0, 0(r0)
        lw  y0, 0(r1)
        mul a, x0, y0
        add y0, a, b
        sw  y0, (r2)
        inc r0
        inc r1
        inc r2
        dec ctr
        tst ctr
        jnz loop
FIR on Modified Von Neumann Machine
Assume the von Neumann machine has:
  A multiply-and-accumulate (MAC) instruction
  Pipelining, so that the MAC instruction and read/write instructions execute in parallel
Then each FIR tap still needs 4 cycles:
  1. Read MAC instruction
  2. Read data value x from memory
  3. Read coefficient c from memory
  4. Write data value to the next location in the delay line
MAC time is one of the most basic statistics for comparing the performance of programmable DSPs.
Basic Harvard Architecture
  Separate program and data memory spaces
  Usually refers to separate program and data buses
Example
First-generation DSP: Texas Instruments TMS320C10 in 1982:
  Harvard architecture
  16-bit fixed point
  Accumulator-based
  390 ns MAC time (160 ns today)
  Load-Accumulate instruction
[Figure: datapath with Mem, T-Register, Multiplier, P-Register, ALU, Accumulator]
FIR on TMS320C10
X4 and H4 are direct (absolute) memory addresses:
  LT  X4   ; Load T with x(n-4)
  MPY H4   ; P = H4*X4
  LTD X3   ; Load T with x(n-3); x(n-4) = x(n-3); Acc = Acc + P
  MPY H3   ; P = H3*X3
  LTD X2
  MPY H2
  ...
About 2 instructions per tap, but requires unrolling
Modified Harvard Architecture
  The program bus can be used for coefficient loading for the MAC
Example
The modified Harvard architecture was applied in the TMS320C25 in 1987
  Simultaneous fetch of an instruction & 2 operands
  Single-cycle MAC
  Simultaneous ALU operation and multiplier operation
  100 ns instruction cycle time
FIR on TMS320C25
MACD = Multiply by Program MEM and Accumulate with delay
More about Harvard Architectures
The Harvard architecture has many modified versions:
  Basic: separate program and data spaces
  Mod. 1: program space contains read-only data
  Mod. 2: use multi-port memory for the data space
  Mod. 3: add a program cache to enhance throughput of a shared program/data memory block
  Mod. 4: use 2 separate data memories for simultaneous instruction/operand fetch
  Mod. 5: use 4 separate data memories and add an I/O-specific memory
The programmer can ignore the Harvard architecture until it becomes necessary to optimize the code.
When optimizing for multiple memories, the programmer must carefully arrange data in memory to take advantage of them.
Block Diagram
Features of Most DSP Processors
  Data path configured for DSP
  Specialized instruction set
  Multiple memory banks and buses
  Specialized addressing modes
  Specialized execution control
  Specialized I/O for peripherals
Data Path
DSPs deal with numbers representing the real world
  Real numbers, fractions, …
DSPs deal with numbers for addresses
  Integers
Support fixed-point numbers as well as integers
  Fraction (sign bit S, radix point right after S): $-1 \le x < 1$
  Integer (sign bit S, radix point at the right end): $-2^{N-1} \le x < 2^{N-1}$
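To make the fractional format concrete, the sketch below maps a real number in [-1, 1) to an N-bit two's-complement integer and back. The names `to_fixed` and `to_real`, the default of 16 bits, and the choice of flooring (truncation) are assumptions of this illustration rather than anything prescribed by the slide.

```python
import math

def to_fixed(x, n_bits=16):
    """Encode a fraction -1 <= x < 1 as an N-bit signed integer (1 sign bit, N-1 fraction bits)."""
    if not -1.0 <= x < 1.0:
        raise ValueError("fractional format only represents -1 <= x < 1")
    return math.floor(x * (1 << (n_bits - 1)))

def to_real(q, n_bits=16):
    """Decode an N-bit signed integer back to the fraction it represents."""
    return q / (1 << (n_bits - 1))

half = to_fixed(0.5)          # 0.5 scaled by 2**15
most_negative = to_fixed(-1.0)
```

The same bit pattern read as an integer covers -2^(N-1) to 2^(N-1) - 1; only the assumed position of the radix point differs.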
Fixed Point Arithmetic
Precision must be carefully conserved.
  Precision is lost through quantization errors arising from A/D and D/A conversion and from multiplication.
Signal-to-quantization-noise ratio increases linearly with signal level.
  Each additional bit yields a 6 dB improvement in SNR
  Dynamic range can be extended by using more bits to represent numbers.
Overflow must be prevented.
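The 6 dB per bit figure can be checked with a short calculation. The sketch below assumes the usual textbook model, which the slide does not spell out: a full-scale sine wave of amplitude 1 (power 1/2) and quantization noise uniformly distributed over one step Δ = 2 / 2^N (power Δ²/12). The function name `sqnr_db` is made up for this example.

```python
import math

def sqnr_db(n_bits):
    """SQNR in dB for a full-scale sine and an N-bit uniform quantizer over [-1, 1).

    Model assumptions (not from the slide): signal power 1/2, noise power step**2 / 12.
    """
    step = 2.0 / (2 ** n_bits)
    signal_power = 0.5
    noise_power = step ** 2 / 12.0
    return 10.0 * math.log10(signal_power / noise_power)

gain_per_bit = sqnr_db(9) - sqnr_db(8)   # about 6.02 dB
```

Under this model the result is 6.02·N + 1.76 dB, so a 16-bit word gives roughly 98 dB.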
Overflow in Fixed Point DSP
DSPs are descended from analog systems: what should happen to the output when an input is “pegged”?
  Modulo arithmetic
Overflow detection and prevention:
  Saturation arithmetic:
    Set to the most positive value ($2^{N-1}-1$) or the most negative value ($-2^{N-1}$) when overflow is detected.
  Shifting the product:
    Arithmetic shift right (shift down), with sign extension
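The difference between modulo (wraparound) and saturation arithmetic shows up clearly on a 16-bit sum that overflows. This is a sketch with made-up helper names `wrap` and `saturate`; Python integers are unbounded, so the N-bit behaviour is emulated explicitly.

```python
def wrap(value, n_bits):
    """Modulo arithmetic: keep the low n_bits and reinterpret them as two's complement."""
    mask = (1 << n_bits) - 1
    value &= mask
    return value - (1 << n_bits) if value >= (1 << (n_bits - 1)) else value

def saturate(value, n_bits):
    """Saturation arithmetic: clamp to the range [-2**(N-1), 2**(N-1) - 1]."""
    hi = (1 << (n_bits - 1)) - 1
    lo = -(1 << (n_bits - 1))
    return max(lo, min(hi, value))

total = 30000 + 10000            # does not fit in 16 bits
wrapped = wrap(total, 16)        # large positive sum turns negative
clipped = saturate(total, 16)    # pegs at the most positive value
```

Saturation mimics an analog signal hitting the rail, while wraparound flips the sign of the result.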
DSP Data Path: Multiplier
Specialized hardware performs all key arithmetic operations in 1 cycle
50% of instructions can involve the multiplier => single-cycle latency multiplier
Designed to perform multiply-accumulate (MAC) in the processing core
n-bit multiplier => 2n-bit product
Often combined with a shifter to prevent overflow
DSP Data Path: Accumulator
Don't want overflow or to have to scale the accumulator
Option 1: accumulator wider than the product: guard bits
  Motorola DSP: 24b x 24b => 48b product, 56b accumulator
Option 2: shift right and round the product before the adder
[Figure: two datapaths. Option 1: Multiplier, ALU, Accumulator with guard bits (G). Option 2: Multiplier, Shift, ALU, Accumulator.]
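A quick calculation shows what the guard bits in Option 1 buy. Using the 48-bit product and 56-bit accumulator widths quoted for the Motorola DSP, the sketch below (helper name `fits` is an assumption) checks how many worst-case products can be summed before the accumulator can overflow.

```python
def fits(value, n_bits):
    """True if value is representable as an n_bits two's-complement number."""
    return -(1 << (n_bits - 1)) <= value <= (1 << (n_bits - 1)) - 1

PRODUCT_BITS = 48
ACC_BITS = 56

guard_bits = ACC_BITS - PRODUCT_BITS            # extra headroom bits
max_product = (1 << (PRODUCT_BITS - 1)) - 1     # largest 48-bit signed value
safe_terms = 1 << guard_bits                    # 2**guard_bits worst-case terms

worst_case_sum = safe_terms * max_product
```

With 8 guard bits, 256 worst-case products can be accumulated with no overflow check inside the loop; one more term could overflow.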
DSP Data Path: Rounding
Even with guard bits, rounding is needed when the accumulator is stored into memory
3 standard DSP options:
  Truncation: chop results => biases results down (toward more negative values in two's complement)
  Round to nearest: < 1/2 round down, ≥ 1/2 round up (more positive) => smaller bias
  Convergent: < 1/2 round down, > 1/2 round up (more positive), = 1/2 round to make the lsb a zero (+1 if 1, +0 if 0) => no bias
    IEEE 754 calls this round to nearest even
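The three rounding options can be compared on integers by dropping the k least significant bits. The sketch below is illustrative: the function names are assumptions, values are two's-complement integers (Python's `>>` is an arithmetic shift), and k must be at least 1.

```python
def truncate(v, k):
    """Chop the k low bits (arithmetic shift rounds toward minus infinity)."""
    return v >> k

def round_nearest(v, k):
    """Add half an lsb, then chop: exact ties go to the more positive value."""
    return (v + (1 << (k - 1))) >> k

def round_convergent(v, k):
    """Round to nearest; exact ties go to the result whose lsb is zero."""
    q = v >> k
    rem = v & ((1 << k) - 1)      # discarded bits, always non-negative
    half = 1 << (k - 1)
    if rem > half or (rem == half and (q & 1)):
        q += 1
    return q

# With k = 1 the input is in units of one half: 5 means 2.5, 7 means 3.5.
tie_results = [(truncate(v, 1), round_nearest(v, 1), round_convergent(v, 1)) for v in (5, 7)]
```

For the tie 2.5 the three methods give 2, 3 and 2; for 3.5 they give 3, 4 and 4, so only the convergent rule splits ties evenly between up and down.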
DSP Memory
In contrast with RISC microprocessors, DSP processors usually contain internal memories rather than caches.
Multi-ported memories and multiple independent memory banks are difficult and expensive to implement off chip.
  The pin count of a DSP processor increases dramatically if multiple memory banks are implemented off chip.
  More expensive package
  Physical multi-port memory is also much more expensive.
DSP processors mix the strategies.
  Adopt 1~2 additional external buses for off-chip memory space
  Multiple internal memory banks with a software-controlled paging system and an external page pool.
Example
Motorola DSP56001
  32-bit word and three memory banks, each with a 32-bit address (64 pins for one memory space)
  192 pins would be required to implement 3 parallel memory banks entirely off chip.
  A multiplexed bus (64 pins) is used on the 56001
Motorola DSP96002
  Same processor core as the DSP56001
  Brings 1 additional bus off chip, adding 64 more pins
    It can access 2 memory spaces simultaneously.
  The DSP96002 is a 200-pin version of the DSP56001
DSP Memory (cont.)
An FIR tap implies multiple memory accesses
Most DSP processors have multiple data ports
Some DSPs use ad hoc techniques to reduce memory bandwidth demand
  Instruction repeat buffer: do 1 instruction 256 times
  Often use maskable interrupts, thereby increasing interrupt response time
Some recent DSPs have instruction caches
  Even then, they may allow the programmer to “lock in” instructions into the cache
  Option to turn the cache into fast program memory or data memory
Memory Hierarchy
[Figure: memory hierarchy comparison. RISC: registers, out of order, I/D cache, TLB, physical memory. DSP: registers, DMA controller, I cache, internal memories, external memories.]
TLB: Translation Lookaside Buffer
DMA: Direct Memory Access
Memory Addressing
Have standard addressing modes: immediate, displacement, and register indirect addressing.
Goal: keep the MAC data path as busy as possible
Assumption: any extra instructions for each tap imply more clock cycles of overhead in the inner loop
  Complex addressing is a better choice
  Avoid using the data path to calculate addresses
Auto-increment / auto-decrement register for indirect addressing:
  lw r1,0(r2)+ => r1 <- M[r2]; r2 <- r2+1
  Option to do it before addressing, positive or negative
Addressing (cont.)
DSPs deal with continuous I/O streams
  I/O buffer for data on delay lines
  Use a circular buffer to save memory.
  Use the modulo/circular addressing mode for the circular buffer
Also used in sliding window algorithms:
  Convolution
  Correlation
  FIR filters
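What modulo addressing does in hardware can be imitated in software with an index that wraps around a fixed-size array. The class below is a sketch with invented names (`CircularDelayLine`, `push`, `tap`); a real DSP performs the wrap inside its address generation unit instead of with an explicit `%`.

```python
class CircularDelayLine:
    """Fixed-size delay line: a new sample overwrites the oldest one, nothing is shifted."""

    def __init__(self, length):
        self.buf = [0] * length
        self.head = 0                     # index of the most recent sample

    def push(self, sample):
        # Step the pointer back by one, wrapping at the buffer edge (modulo addressing).
        self.head = (self.head - 1) % len(self.buf)
        self.buf[self.head] = sample

    def tap(self, i):
        """Return x(n - i): the sample pushed i steps ago."""
        return self.buf[(self.head + i) % len(self.buf)]

line = CircularDelayLine(3)
for s in (1, 2, 3, 4):
    line.push(s)
recent = [line.tap(i) for i in range(3)]   # newest first
```

Only one memory write happens per new sample, compared with moving every element of a linear delay line.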
Addressing for FFT
FFTs start or end with data in bit-reversed order:
  0 (000) => 0 (000)
  1 (001) => 4 (100)
  2 (010) => 2 (010)
  3 (011) => 6 (110)
  4 (100) => 1 (001)
  5 (101) => 5 (101)
  6 (110) => 3 (011)
  7 (111) => 7 (111)
Avoid the overhead of address-checking instructions or bit-reversing operations for the FFT.
Have an optional bit-reverse addressing mode for use with auto-increment addressing.
Many DSPs have bit-reverse addressing for the radix-2 FFT
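The index mapping in the table is what a bit-reverse addressing mode produces for free. In software it takes a loop per index, as in this sketch (the function name `bit_reverse` is an assumption):

```python
def bit_reverse(index, n_bits):
    """Reverse the lowest n_bits of index, e.g. 001 -> 100 for n_bits = 3."""
    result = 0
    for _ in range(n_bits):
        result = (result << 1) | (index & 1)   # move the current lsb to the other end
        index >>= 1
    return result

# Reordering for an 8-point radix-2 FFT (3 address bits).
order = [bit_reverse(i, 3) for i in range(8)]
```

The loop costs several instructions per address, which is the overhead the hardware addressing mode removes.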
DSP Instructions
May specify multiple operations in a single instruction
Must support multiply-accumulate (MAC)
Support parallel moves of data in registers
Usually have special loop support to reduce branch overhead (such as pipeline stalls)
  Loop an instruction or sequence by an iterator
  No branch instruction is taken for looping
  In many DSP processors, iterator = 0 usually means looping the maximum number of times (infinite looping).
Auto/manual shift-left arithmetic instruction for saturation
Conditional execution for branch reduction
Pipeline in DSP Processor
Pipelining effectively speeds up computation, but it can have a serious impact on programmability.
An instruction is fetched at the same time that operands for a previously fetched instruction are being fetched.
Three fundamental techniques are adopted:
  Interlocking
  Time-stationary coding
  Data-stationary coding
Interlocking
If an immediately following instruction tries to access a shared resource, the control hardware delays execution of the arithmetic operation.
Interlocking stalls the pipeline and therefore decreases performance
Interlocking is invisible to the programmer.
Some DSP manufacturers, such as TI, supply a simulator that gives the detailed timing of any sequence of instructions, including interlocking information, for code optimization.
Time-Stationary Coding
An instruction specifies the operations that occur simultaneously in one instruction cycle.
Parallelism rather than pipelining
e.g. AT&T DSP16
  a0=a0+p p=x*y y=*r0++ x=*pt++
  Simultaneously updates a0 (accumulate), p (multiply), y (operand through pointer dereference), x (operand through pointer dereference)
Advantages:
  Timing of a program is clear.
  Very fast interrupts: the programmer explicitly controls the pipeline, so there is no need to flush the pipe prior to invoking the interrupt.
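The effect of issuing the DSP16-style line a0=a0+p p=x*y y=*r0++ x=*pt++ repeatedly can be imitated in Python. This is a loose model, not DSP16 semantics taken from a datasheet: it assumes every field reads the register values from before the cycle, and that reads past the end of the arrays return zero. The function name `time_stationary_dot` is hypothetical.

```python
def time_stationary_dot(data, coeffs):
    """Dot product by repeating one 4-field instruction; two extra cycles drain the pipeline."""
    a0 = p = x = y = 0
    r0 = pt = 0
    n = len(data)
    for _ in range(n + 2):
        # All four fields use the OLD register values (simultaneous update).
        new_a0 = a0 + p                           # a0 = a0 + p
        new_p = x * y                             # p = x * y
        new_y = data[r0] if r0 < n else 0         # y = *r0++
        new_x = coeffs[pt] if pt < n else 0       # x = *pt++
        a0, p, y, x = new_a0, new_p, new_y, new_x
        r0 += 1
        pt += 1
    return a0

result = time_stationary_dot([1, 2, 3], [4, 5, 6])
```

The programmer sees the pipeline directly: an operand fetched in one cycle is multiplied in the next and accumulated in the one after that.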
Data-Stationary Coding
Instructions specify all of the operations performed on a set of operands from memory.
These instructions specify what happens to data, rather than what happens at a particular time in the hardware.
Operations proceed in parallel, specified by neighboring instructions.
Data-stationary coding is no less efficient than time-stationary coding.
Fast interrupts are more difficult in data-stationary coding than in time-stationary coding.
e.g. AT&T DSP32
  r5++ = a1 = a0+ *r7 * *r10 ++r17
    Parallel write to memory and location specified by r5
    Accumulate with value in memory
    Dereference and multiply
    Update pointer for operands
Branch in DSP
Some problems conspire to make efficient branching difficult.
  If the program address space is large, the destination address may not fit in an instruction word. More fetches from instruction memory may be required. Alternatives are paging and PC-relative addressing.
  In conditional branching, the fetch of the next instruction cannot occur before the condition codes in the ALU can be tested
Solutions
  Use delayed branches: fetch more instructions independent of the branch and execute them before the branch occurs; or separate data arithmetic instructions several cycles prior to the test.
  Design low-overhead looping instructions for tight inner loops, rather than using branch instructions for loops.
  Design conditional instructions to replace conditional branches.
Recent DSP-Related Topics
VLIW Architectures for DSP
  What is VLIW
  Superscalar vs. VLIW
  Characteristics of VLIW processors
  Example of VLIW on DSP
  Advantages and disadvantages of VLIW
What is VLIW?
Abbreviation of Very Long Instruction Word
Until 1997, most DSP processors were similar
  Specialized execution units and instruction sets
  Difficult to program in assembly
  Unfriendly compiler targets
  One instruction per instruction cycle, such as multiply-accumulate and store
VLIW architectures use a simple, regular instruction set and execute multiple instructions per cycle.
  Strategy: more parallelism creates higher performance
  Better compiler targets
Superscalar vs. VLIW
Characteristics of VLIW Processor
Multiple independent instructions per cycle.
  Packed into a single large instruction word / packet
  Instructions may be positional or include routing information within each sub-instruction word
Independent execution units with complementary features
  Each instruction packet may be dispatched to several execution units.
More regular, orthogonal, RISC-like instructions
Large, uniform register set
Wide program and data buses
Example: ADI TigerSHARC
It is a static superscalar DSP that can simultaneously execute from one to four 32-bit instructions encoded in a single instruction line.
Combines VLIW with SIMD (single instruction, multiple data)
  The programmer has the option of directing both computation blocks to operate on the same data (broadcast distribution) or on different data (merged distribution).
  Each computation block can execute four 16-bit or eight 8-bit SIMD computations in parallel.
Advantages of VLIW Architecture
Better performance
  More regular execution units
  More instructions executed in parallel than on a traditional DSP with time-stationary coding instructions
Better compiler target:
  Program sequence, to tell independent and dependent instructions.
  Compile-time specified dispatch rather than specified in silicon
Potentially easier to program for DSP
Potentially scalable
  Able to add more execution units to the processor core, allowing more sub-instructions to be packed into one VLIW instruction
Disadvantages of VLIW Architecture
New type of programmer/compiler complexity:
  The programmer or code-generation tool must keep track of instruction scheduling
  Deep pipelines and long latencies can be confusing, and may make it hard to reach peak performance.
Increased memory complexity
  Higher memory bandwidth is required
Higher power consumption
Confusing performance evaluation: MIPS/MFLOPS ratings are misleading.
DSP vs. General Purpose MPU
The “MIPS/MFLOPS” of DSPs is the speed of multiply-accumulate (MAC).
  DSPs are judged by whether they can keep the multipliers busy 100% of the time.
The "SPEC" of DSPs is 4 algorithms:
  Infinite Impulse Response (IIR) filters
  Finite Impulse Response (FIR) filters
  FFT
  Convolution
Algorithm is everything for a DSP processor
Software compatibility is not a concern
  Programmers often write in assembly language to minimize ROM requirements and optimize performance.
Summary
  DSP system background knowledge and application overview
  DSP market observation
  DSP processor architecture and its evolution
  Modern DSP processor architecture overview
Next time:
  Processor-specific topics: Blackfin architecture
Reference
[1] http://www.webster.com/
[2] http://www.BDTI.com/
[3] Gregory K. Wallace, “The JPEG Still Picture Compression Standard”, Communications of the ACM, Volume 34, Issue 4 (April 1991), Pages 30–44, ISSN: 0001-0782, http://portal.acm.org/citation.cfm?id=103089&coll=portal&dl=ACM&CFID=26765382&CFTOKEN=77630149
[4] “TMS320C1X Digital Signal Processors Datasheet”, http://focus.ti.com/docs/prod/folders/print/tms320c10.html
[5] http://www.ee.ucla.edu/~schaum/ee201a_S02/
[6] “Quick Guide to Developing with ADI DSPs - DSP Selection”, http://www.analog.com/processors/resources/beginnersGuide/quickguide1.html
[7] Edward A. Lee, “Programmable DSP Architectures: Part I”, IEEE ASSP Magazine, p.4~p.19, October 1988
[8] Edward A. Lee, “Programmable DSP Architectures: Part II”, IEEE ASSP Magazine, p.4~p.19, January 1989