EE371 Lecture 2, M Horowitz
Lecture 2
High-Speed I/O
Mark Horowitz
Computer Systems Laboratory
Stanford University
Copyright © 2007 by Mark Horowitz, with material from Stefanos Sidiropoulos, and Vladimir Stojanovic

Readings
• Readings
– Techniques for High-speed Implementation of Nonlinear Cancellation, Sanjay Kasturia and Jack H. Winters
• Overview:
– Your project will be the design of a circuit that processes the input data from a high-speed I/O. This processing is generally done in a mixed-signal manner today, but your job will be to build a digital implementation of the algorithm. This lecture will try to give you some background about why I/O rates are important, and what issues need to be resolved to achieve high performance. The next lecture will discuss the operation of the circuit you need to build.

Computers Today
[Figure: block diagram of a computer system. Blocks: CPU, Memory, System Controller, Graphics Controller, Display, I/O Controller, Storage, Network, I/O. Interface labels: DDR, RDRAM, FBDIMM; FSB, HT; AGP, PCI-E; PCI-X, PCI-E; PCI*, *ATA, USB ..; DVI, HDMI. Bandwidth labels: >1GB/s, >4GB/s, >4GB/s, >0.1GB/s.]

Speed of Light: The Difference Between I/O and On-Chip Wires
• First question:
– Why is I/O different from on-chip wires?
• Both send signals to each other
• Gates send data to each other all the time
– Don’t generally worry about signals, or delay
– Model the connection between gates as a capacitor
• Sometimes a capacitor/resistor network
• Answer:
– On-chip, ignore the speed of light, assume “c” infinite
• For external wires can’t make that assumption
– Wire connecting the pins is not an equipotential
– References are different

Finite Speed of Light Ramifications
• Signals must have delay in reaching destination
– Td = L/ν, bits arrive at a different time than when sent
– Thus must determine ‘right’ time to sample them
• Wires store energy
– Current is set by the geometry of wire (what else?)
• Signal can’t see termination resistor (causality)
• V/I for the line is called the impedance, Z < 300 Ω
– When signal is traveling on the wire
• Power goes into the wire before it hits load
• Since energy is conserved, wire must be storing energy
• Signal is ALWAYS a pair of currents
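To put numbers on Td = L/ν, here is a minimal sketch. The velocity model ν = c/√εr, the 0.5 m trace length, εr = 4, and the 5 Gb/s bit rate are all illustrative assumptions, not values from the lecture.

```python
# Sketch: one-way delay Td = L / v and the number of bits "in flight" on the wire.
C0 = 299_792_458.0  # speed of light in vacuum, m/s


def flight_time(length_m, eps_r):
    """Td = L / v, assuming v = c / sqrt(eps_r) for a uniform dielectric."""
    v = C0 / eps_r ** 0.5
    return length_m / v


def bits_in_flight(length_m, eps_r, bit_rate_hz):
    """How many bit times fit inside the one-way delay."""
    return flight_time(length_m, eps_r) * bit_rate_hz


td = flight_time(0.5, 4.0)         # assumed: 0.5 m trace, eps_r = 4
n = bits_in_flight(0.5, 4.0, 5e9)  # assumed: 5 Gb/s
print(f"Td = {td * 1e9:.2f} ns, bits in flight = {n:.1f}")
```

With these assumed numbers the delay is several bit times long, so the receiver cannot reuse the transmitter's notion of "now" and has to find its own sampling point.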

Link Issues
• Signaling: getting the bit to the receiver
• Timing: Determining which bit is which
[Figure: Tx, Channel, and Rx with RTERM at each end; bit-stream waveform with a tbit/2 marker]

Transmission Lines
• Wire where you notice ‘c’ is finite
– Current flows in one terminal
– And flows out the other
• Energy is stored in E and B fields
– But can model with L, C
Figure from John Poulton
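The L, C model gives the two numbers that matter for a lossless line: Z0 = √(L/C) and ν = 1/√(LC). A minimal sketch follows; the per-meter inductance and capacitance are assumed values, picked only because they give a 50 Ω line.

```python
import math


def line_params(l_per_m, c_per_m):
    """Lossless line: Z0 = sqrt(L/C) in ohms, v = 1/sqrt(L*C) in m/s."""
    z0 = math.sqrt(l_per_m / c_per_m)
    v = 1.0 / math.sqrt(l_per_m * c_per_m)
    return z0, v


# Assumed per-unit-length values: 350 nH/m and 140 pF/m.
z0, v = line_params(350e-9, 140e-12)
print(f"Z0 = {z0:.1f} ohm, v = {v:.3e} m/s")
```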

Problems: Material Loss
• PCB Loss: skin & dielectric loss
– Skin loss ∝ √f
– Dielectric loss ∝ f: a bigger issue at higher f
[Figure: H(s) (transfer function) versus frequency. Loss in GETEK: 1 m, 8 mil μstrip trace]
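A short sketch of why dielectric loss takes over at high frequency: a √f term and an f term always cross. The coefficients `k_skin` and `k_diel` below are hypothetical fitting constants, not GETEK data.

```python
import math


def loss_db(f_ghz, k_skin, k_diel):
    """Total loss in dB: skin term grows as sqrt(f), dielectric term as f."""
    return k_skin * math.sqrt(f_ghz) + k_diel * f_ghz


def crossover_ghz(k_skin, k_diel):
    """Frequency where k_skin * sqrt(f) equals k_diel * f."""
    return (k_skin / k_diel) ** 2


# Hypothetical coefficients: 4 dB/sqrt(GHz) skin, 2 dB/GHz dielectric.
f_x = crossover_ghz(4.0, 2.0)
print(f"crossover at {f_x:.1f} GHz, loss there = {loss_db(f_x, 4.0, 2.0):.1f} dB")
```

Above the crossover the dielectric term dominates, which is the sense in which it is "a bigger issue at higher f".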

Dealing With Current Return/References
• Wire Utilization:
– Single ended: shared signal return path
– Differential: explicit signal return path
– “Pseudo” differential
[Figure: receiver comparators (+/−) for the signaling options, one with a “ref” input]

Transmission Lines
Two constraints govern behavior at any junction:
• Voltages are equal
– They are electrically connected
• Power is conserved
– Energy flow into junction is equal to transmitted and reflected
[Figure: junction between a line of impedance Z1 and a line of impedance Z2]
Reflected voltage fraction: $\frac{Z_2 - Z_1}{Z_1 + Z_2}$
Transmitted voltage fraction: $\frac{2 Z_2}{Z_1 + Z_2}$
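The two junction constraints can be checked numerically. This sketch computes the reflected and transmitted voltage fractions for a wave travelling from Z1 into Z2 and confirms that the power balances; the 50 Ω and 75 Ω values are just an example.

```python
def junction(z1, z2):
    """Reflected and transmitted voltage fractions for a wave going from z1 into z2."""
    gamma = (z2 - z1) / (z1 + z2)
    tau = 2 * z2 / (z1 + z2)
    return gamma, tau


def power_balance(z1, z2):
    """Incident minus reflected minus transmitted power for a unit incident voltage."""
    gamma, tau = junction(z1, z2)
    incident = 1.0 / z1
    reflected = gamma ** 2 / z1
    transmitted = tau ** 2 / z2
    return incident - reflected - transmitted


gamma, tau = junction(50.0, 75.0)  # example impedances
print(f"gamma = {gamma:.2f}, tau = {tau:.2f}, residual = {power_balance(50.0, 75.0):.1e}")
```

Voltage continuity shows up as 1 + gamma = tau, and the power residual comes out at zero to within rounding.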

High-Speed Wires Are Point to Point
• Can’t split a wire to go to two locations
– You will get a reflection from the junction
– Z1 will see impedance discontinuity
[Figure: line Z1 splitting into two Z2 branches]
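A minimal sketch of the split: the incoming line sees the branches in parallel, so even identical impedances give a mismatch. The 50 Ω values are an assumed example.

```python
def split_reflection(z1, z2, branches=2):
    """Reflection seen by line z1 when it fans out into `branches` lines of z2."""
    z_load = z2 / branches  # identical branches in parallel
    return (z_load - z1) / (z_load + z1)


gamma = split_reflection(50.0, 50.0)            # assumed: all lines 50 ohm
reflected_power = gamma ** 2                    # fraction of incident power
per_branch_power = (1 - reflected_power) / 2    # remainder splits evenly
print(f"gamma = {gamma:.3f}, reflected = {reflected_power:.3f}, per branch = {per_branch_power:.3f}")
```

With all three lines at 50 Ω, one third of the voltage reflects and each branch gets only 4/9 of the incident power.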

At High Speeds, Vias are Stubs
• Signal energy splits at via
– If via is short can be modeled as a cap load
– Causes a reflection in signal
• Higher the frequency, the more sensitive you are to stubs
[Figure: Top layer signaling results in large via stub]
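To illustrate the "cap load" model, this sketch computes the reflection from a shunt capacitor sitting on an otherwise matched line. The 0.5 pF via capacitance and the 50 Ω line are assumptions chosen for illustration.

```python
import math


def cap_reflection_mag(f_hz, c_farad, z0=50.0):
    """|Gamma| for a shunt capacitor on a z0 line that continues into a matched z0 line."""
    z_cap = 1.0 / (1j * 2 * math.pi * f_hz * c_farad)
    z_load = z_cap * z0 / (z_cap + z0)  # capacitor in parallel with the onward line
    gamma = (z_load - z0) / (z_load + z0)
    return abs(gamma)


for f_ghz in (1, 2, 5, 10):
    mag = cap_reflection_mag(f_ghz * 1e9, 0.5e-12)  # assumed 0.5 pF via
    print(f"{f_ghz:>2} GHz: |gamma| = {mag:.3f}")
```

The reflection grows with frequency, which is the point of the last bullet: the same via that is harmless at 1 GHz reflects a sizable fraction of the signal at 10 GHz.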

Backplane Environment
• Line attenuation
• Reflections from stubs (vias)
[Figure: backplane link cross-section, labeled at each end: on-chip parasitic (termination resistance and device loading capacitance), package, package via, line card trace, line card via, back plane connector, backplane via, back plane trace]

Backplane Channel
[Figure: attenuation [dB] (0 to -60) versus frequency [GHz] (0 to 10) for four channels: 9" FR4; 9" FR4, via stub; 26" FR4; 26" FR4, via stub]
• Loss is variable
– Same backplane
– Different lengths
– Different stubs
• Top vs. Bot
• Attenuation is large
– >30dB @ 3GHz
– But is that bad?
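To get a feel for what 30 dB means at the receiver, here is a quick conversion from attenuation in dB to a voltage ratio. The 1 V transmit swing is an assumed figure.

```python
def db_to_voltage_ratio(att_db):
    """Voltage ratio corresponding to an attenuation given in dB."""
    return 10 ** (-att_db / 20)


ratio = db_to_voltage_ratio(30.0)
tx_swing_v = 1.0  # assumed transmit swing
print(f"30 dB leaves {ratio * 100:.1f}% of the swing: {tx_swing_v * ratio * 1e3:.1f} mV")
```

A few tens of millivolts is still detectable in principle, so the attenuation alone is not the whole story.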

Inter-Symbol Interference (ISI)
• Channel is low pass
– Our nice short pulse gets spread out
[Figure: pulse response versus time (0 to 3 ns), Tsymbol=160ps]
• Dispersion – short latency (skin-effect, dielectric loss)
• Reflections – long latency (impedance mismatches – connectors, via stubs, device parasitics, package)
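A minimal sketch of pulse spreading, assuming the channel is a single-pole low-pass filter (a real backplane is not, and this model captures only dispersion, not reflections). The 200 ps time constant is assumed; the 160 ps symbol time is the one on the slide.

```python
import math


def sampled_pulse_response(t_symbol, tau, n_samples):
    """Samples at t = T, 2T, ... of a unit pulse of width T through a one-pole low-pass."""
    a = math.exp(-t_symbol / tau)  # decay over one symbol time
    return [(1 - a) * a ** k for k in range(n_samples)]


h = sampled_pulse_response(160e-12, 200e-12, 5)  # tau = 200 ps is assumed
print([round(x, 3) for x in h])
```

The first sample is the wanted one; every later sample is energy from this bit landing on top of the following bits.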

ISI
[Figure: amplitude (0 to 1) versus symbol time (0 to 18), with the erroneous sample marked "Error!"]
• Middle sample is corrupted by
– 0.2 trailing ISI (from the previous symbol),
– 0.1 leading ISI (from the next symbol)
resulting in 0.3 total ISI
• As a result middle symbol is detected in error
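The arithmetic above can be sketched as a superposition of neighboring pulses. The 0.2 and 0.1 ISI terms are from the slide; the main-cursor height of 0.5, the unipolar 0/1 levels, and the threshold at half the main cursor are assumptions made so the example is complete.

```python
def sample(bits, k, pre, main, post):
    """Received sample for bit k: its own pulse plus ISI from both neighbors."""
    prev_bit = bits[k - 1] if k > 0 else 0
    next_bit = bits[k + 1] if k + 1 < len(bits) else 0
    return post * prev_bit + main * bits[k] + pre * next_bit


bits = [1, 0, 1]                                  # a 0 between two 1s
y = sample(bits, 1, pre=0.1, main=0.5, post=0.2)  # main = 0.5 is assumed
threshold = 0.5 / 2                               # assumed: half the main cursor
detected = 1 if y > threshold else 0
print(f"sample = {y:.2f}, threshold = {threshold}, sent 0, detected {detected}")
```

The middle bit contributes nothing itself, yet its sample sits at about 0.3, above the assumed threshold, so the 0 is read as a 1.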

Equalization For Loss: Goal is to Flatten Response
• Channel is band-limited
• Equalization: boost high frequencies, or attenuate low frequencies
[Figure: channel response + equalizer response = flattened response]

Equalization Mechanisms
[Figure: amplitude versus symbol time (0 to 18), two panels: "No equalization" (amplitude 0 to 1) and "Tx equalization" (amplitude -0.4 to 0.6)]
• Tx equalization
– Pre-filter the pulse with the inverse of the channel
– Filters the low freq. to match attenuation of high freq.
• Rx feedback equalization
– Subtract the error from the signal
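A sketch of Tx equalization as a pre-filter. The sampled pulse response below is hypothetical (each tail sample is half the previous one), chosen so that a two-tap filter is an exact inverse; real channels need more taps and only approximate cancellation.

```python
def convolve(x, h):
    """Plain discrete convolution of two short sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


channel = [0.5, 0.25, 0.125, 0.0625]  # hypothetical sampled pulse response
tx_taps = [1.0, -0.5]                 # send each bit, then subtract half of it one symbol later
equalized = convolve(channel, tx_taps)
print(equalized)
```

The trailing samples cancel, leaving a single-symbol pulse. The small last term exists only because the hypothetical response was truncated after four samples.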

Removing ISI
• Transmit and Receive Equalization
– Changes signal to correct for ISI
– Initial work was at transmitter
[Figure: link with a linear transmit equalizer (TxData, causal taps, anticausal taps) driving the Channel into 50 Ω terminations, and a decision-feedback equalizer at the receiver (Sampled Data, Deadband, Feedback taps, Tap Sel Logic)]
J. Zerbe et al, "Design, Equalization and Clock Recovery for a 2.5-10Gb/s 2-PAM/4-PAM Backplane Transceiver Cell," IEEE Journal Solid-State Circuits, Dec. 2003.

Transmit Equalization – Headroom Constraint
• Transmit DAC has limited voltage headroom
• Unknown target signal levels
– Harder to make adaptive equalization work
• Need to tune the equalizer and receive comparator levels
– If you have multi-level signals
[Figure: attenuation [dB] (0 to -25) versus frequency [GHz] (0 to 2.5), "equalized" and "unequalized" curves; transmitter with TxData, causal taps, and anticausal taps driving the Channel under a peak power constraint]
Amplitude of equalized signal depends on the channel
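A sketch of the headroom constraint, assuming a two-tap transmit filter [1, −α] and a DAC whose full scale is normalized to 1. The α values are hypothetical; the point is only that the delivered amplitude moves with the tap weights the channel demands.

```python
def peak_normalize(taps):
    """Scale taps so the worst-case output, sum(|w|), equals the DAC full scale of 1."""
    peak = sum(abs(w) for w in taps)
    return [w / peak for w in taps]


for alpha in (0.0, 0.25, 0.5):  # hypothetical post-cursor weights
    main_tap = peak_normalize([1.0, -alpha])[0]
    print(f"alpha = {alpha:.2f}: main tap = {main_tap:.3f}")
```

A lossier channel needs a larger α, which shrinks the main tap and therefore the equalized signal at the receiver. The receiver does not know that level in advance.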

Removing Interference at Receiver
• Could also build a linear filter
– Could have gain in the filter
– But either it would need to be analog and have gain
– Or need high-speed A/D
• And real multiplication
• Sum (ai*xi)
– Increases channel noise too
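A sketch of the receive-side linear filter Sum(ai*xi) and why it raises noise. The coefficients are hypothetical, and the noise-gain formula assumes the noise on successive samples is uncorrelated.

```python
import math


def fir(x, a):
    """y[n] = sum over i of a[i] * x[n - i], treating samples before n = 0 as zero."""
    return [sum(a[i] * x[n - i] for i in range(len(a)) if n - i >= 0)
            for n in range(len(x))]


def noise_gain(a):
    """RMS gain applied to uncorrelated noise: sqrt(sum of a[i] squared)."""
    return math.sqrt(sum(c * c for c in a))


coeffs = [1.5, -0.5]  # hypothetical high-pass (peaking) taps, low-frequency gain of 1
print(fir([1, 0, 0], coeffs), f"noise gain = {noise_gain(coeffs):.2f}")
```

These taps leave the low-frequency signal level unchanged (the coefficients sum to 1) but multiply the noise by about 1.6.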

High Frequency Channel Noise: Crosstalk
• Many sources
– On-chip
– Package
– PCB traces
– Inside connector
• Differential signaling can help
– Minimize xtalk generation & make effects common-mode
• Both NEXT & FEXT
– NEXT very destructive if RX and TX pairs are adjacent
• Full swing-TX coupling into attenuated RX signal
• Effect on SNR is multiplied by signal loss
– Simple solution: group RX/TX pairs in connector
– NEXT typically 3-6%, FEXT typically 1-3%
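A rough sketch of why NEXT is so destructive: the aggressor is at full swing while the victim has already been attenuated by the channel. The 5% NEXT figure sits inside the 3-6% range quoted above; the channel losses and the assumption that aggressor and victim transmitters use the same swing are illustrative.

```python
def signal_to_next_ratio(next_fraction, channel_loss_db):
    """Received signal divided by NEXT, both as fractions of the Tx swing."""
    received = 10 ** (-channel_loss_db / 20)
    return received / next_fraction


for loss_db in (10, 20, 30):  # assumed channel losses
    print(f"{loss_db} dB loss: signal/NEXT = {signal_to_next_ratio(0.05, loss_db):.2f}")
```

At 30 dB of loss the ratio falls below 1, meaning the crosstalk is larger than the signal it lands on.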

Subtract Out Residual Interference
[Figure: amplitude (0 to 1) versus symbol time (0 to 18), "Feedback equalization"]
• Called Decision feedback equalization (DFE)
– Subtracts error from input
– No attenuation
• Problem with DFE
– Need to know interfering bits
– ISI must be causal
• Problem - latency in the decision circuit
• Receive latency + DAC settling < bit time
– Can increase allowable time by loop unrolling
• Receive next bit before the previous is resolved
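A minimal behavioral sketch of a DFE, assuming ±1 symbols and a hypothetical channel whose trailing ISI (0.6) is larger than its main cursor (0.5), so a plain slicer fails. It models the arithmetic only, not the latency problem listed above.

```python
def dfe(samples, post_taps):
    """Slice each sample after subtracting the ISI predicted from past decisions."""
    decisions = []
    for x in samples:
        # post_taps[0] weights the most recent decision, post_taps[1] the one before, ...
        isi = sum(t * d for t, d in zip(post_taps, reversed(decisions)))
        decisions.append(1 if x - isi >= 0 else -1)
    return decisions


sent = [1, -1, -1, 1]
main, post = 0.5, 0.6  # hypothetical cursor values
received = [main * s + post * (sent[n - 1] if n > 0 else 0) for n, s in enumerate(sent)]
plain = [1 if x >= 0 else -1 for x in received]
print("plain slicer:", plain, " DFE:", dfe(received, [post]))
```

The plain slicer gets two of the four bits wrong, while the DFE recovers all of them because the interfering bit is already known by the time it matters.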

One Bit Loop Unrolling (for 2 level signal)
• Instead of subtracting the error
– Move the slicer level to include the interference
– Slice for each possible level, since previous value unknown
[Figure: input x_n drives two slicers clocked by dClk, with thresholds +α (d_n | d_{n-1} = 1) and −α (d_n | d_{n-1} = 0); the previous decision d_{n-1}, held in a D flip-flop, selects between them. 2PAM signal constellation: for a 1 + αD channel the levels +1 and −1 spread to 1+α, 1−α, −1+α, −1−α.]
K.K. Parhi, "High-Speed architectures for algorithms with quantizer loops," IEEE International Symposium on Circuits and Systems, May 1990
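A behavioral sketch of one-bit loop unrolling for a 1 + αD channel with ±1 symbols. Both slicers run on every sample and the previous decision only selects between their outputs. The value α = 0.6, the data pattern, and the initial decision of −1 are assumptions.

```python
def unrolled_dfe(samples, alpha, d_init=-1):
    """One-tap DFE with the feedback subtraction replaced by two offset slicers."""
    decisions = []
    d_prev = d_init
    for x in samples:
        d_if_prev_hi = 1 if x >= +alpha else -1  # valid if the previous bit was +1
        d_if_prev_lo = 1 if x >= -alpha else -1  # valid if the previous bit was -1
        d_prev = d_if_prev_hi if d_prev == 1 else d_if_prev_lo
        decisions.append(d_prev)
    return decisions


alpha = 0.6                       # assumed post-cursor weight
sent = [1, -1, -1, 1, 1, -1]
prev = [-1] + sent[:-1]           # assumed: the symbol before the first one was -1
received = [s + alpha * p for s, p in zip(sent, prev)]
print(unrolled_dfe(received, alpha))
```

The result matches subtracting α·d_{n-1} and slicing at zero, but the comparison no longer waits on the feedback path; only the final select does.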

More Bits/Hz
• Multi-level signaling (aka PAM)
– Convert extra voltage margin to more bits
– Works well when the noise is small
• Need even more signal processing
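The trade can be quantified with a short sketch, assuming the peak-to-peak swing is held fixed as levels are added.

```python
import math


def pam_tradeoff(levels):
    """Bits per symbol, eye height as a fraction of the swing, and margin penalty in dB."""
    bits = math.log2(levels)
    eye_fraction = 1.0 / (levels - 1)
    penalty_db = 20 * math.log10(levels - 1)
    return bits, eye_fraction, penalty_db


for m in (2, 4, 8):
    bits, eye, penalty = pam_tradeoff(m)
    print(f"{m}-PAM: {bits:.0f} bits/symbol, eye = {eye:.3f} of swing, penalty = {penalty:.1f} dB")
```

Going from 2-PAM to 4-PAM doubles the bits per symbol but costs about 9.5 dB of voltage margin, which is why it pays off only when the noise is small.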

Internal Speed Limitation
• Links need good quality clocks with low jitter
– That means you want them to settle to both Vdd, and Gnd
– If you make the clock too fast, it will not “rail”
– And that means it will be prone to jitter
• So one limitation for links is internal clock rate
– For power efficiency want FO on clock to be around 4
– Need pulse width 3-4 times the slowest gate
– Gives around 8 FO4 clock
• For higher speed bit rates
– Need to generate multiple bits/clock
– Use non-static CMOS clock circuits (CML & inductors)
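A back-of-the-envelope sketch of the clock limit. The 8 FO4 clock period is from the slide; the 25 ps FO4 delay is an assumed, process-dependent figure.

```python
def max_bit_rate(fo4_delay_s, clock_period_fo4=8.0, bits_per_clock=1):
    """Bit rate when the clock period is a fixed number of FO4 delays."""
    return bits_per_clock / (clock_period_fo4 * fo4_delay_s)


fo4 = 25e-12  # assumed FO4 delay
print(f"1 bit/clock:  {max_bit_rate(fo4) / 1e9:.1f} Gb/s")
print(f"2 bits/clock: {max_bit_rate(fo4, bits_per_clock=2) / 1e9:.1f} Gb/s")
```

Sending two bits per clock gives a 4 FO4 bit time without making the clock itself any faster.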

Simple Demultiplexing Receiver
• 2-1 demux at the input
• Preconditioning stage: filter/integrate, can be clocked to avoid ISI
– Reject CM
– Sometimes not used
• Latch makes decision (4-FO4)
[Figure: Input and inref feed two pre/latch paths clocked by clk, producing Data_E and Data_O]

Simple Multiplexing Transmitter
• DDR: send a bit per clock edge
• Critical issues:
– 50% duty cycle
– Tbit > 4-FO4
[Figure: multiplexer combining Data_E and Data_O; plot of output pulse width closure (%) (0 to 30) versus bit time (normalized to FO4) (1 to 5)]

I/O Clocking Issues
• Remember the clocking issues:
– Long path constraint (setup time)
– Short path constraint (hold time)
– Need to worry about them for I/O as well
• For I/O need to worry about a number of delays
– Clock skew between chips
– Data delay between chips
• Can be larger than a clock cycle (speed of light)
– Clock skew between external clock and internal clock
• This can be very large if not compensated
• It is essentially the insertion delay of the clock tree

System Clocking: Simple Synchronous Systems
• Long bit times compared to on chip delays:
– Rely on buffer delays to achieve adequate timing margin
[Figure: external clock CKX buffered on chip to CKC1 and CKC2, with input data DI feeding on-chip logic; timing diagram of CKX, DI, CKC1, and CKC2 showing delays d1 and d2]

PLLs: Creating Zero Delay Buffers
• On-chip clock might be a multiple of system clock:
– Synthesize on-chip clock frequency
• On-chip buffer delays do not match
– Cancel clock buffer delay
[Figure: PLL/DLL between CKX and CKC, with DI feeding on-chip logic; timing diagram of CKX, DI, and CKC]

Used to Argue About PLLs vs DLLs
• Second/third order loop:
– Stability is an issue
– Frequency synthesis easy
– Ref. Clk jitter gets filtered
– Phase error accumulates
• First order loop:
– Stability guaranteed
– Frequency synthesis problematic
– Ref. Clk jitter propagates
– Phase error does not accumulate
[Figure: PLL (refclk, PD, Filter, VCO, ÷N, output clk) and DLL (refclk, PD, Filter, VCDL, output clk)]

After Many Years of Research
• And many papers and products
• One can mess up either a DLL or PLL
– Each has its own strengths and weaknesses
• If designed correctly, either will work well
– Jitter will be dominated by other sources
• Many good designs have been published
– It is now a building block that is often reused
– We all have our favorites, mine is the dual-loop design
• And yes, people use ring oscillators
– Still an open question about how much LC helps (in system)

Clocking Structures
• Synchronous: Same frequency and phase
– Conventional buses
• Mesochronous: Same frequency, unknown phase
– Fast memories
– Internal system interfaces
– MAC/Packet interfaces
• Plesiochronous: Almost the same frequency
– Mostly everything else today
[Figure: three clock relationships: both ends at F0 with the same time t; both at F0 with tA ≠ tB; frequencies F1 and F2 with F1 ≈ F2]
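For the plesiochronous case, "almost the same frequency" still means the two ends drift apart steadily. A minimal sketch follows; the 5 Gb/s rate and the 100 ppm offset are assumed numbers.

```python
def time_to_slip_one_bit(bit_rate_hz, offset_ppm):
    """Time for two clocks that differ by offset_ppm to drift apart by one bit time."""
    slip_rate = bit_rate_hz * offset_ppm * 1e-6  # bit times of drift per second
    return 1.0 / slip_rate


def bits_between_slips(offset_ppm):
    """Number of bits transferred while the drift builds up to one bit time."""
    return 1e6 / offset_ppm


t = time_to_slip_one_bit(5e9, 100.0)  # assumed: 5 Gb/s, 100 ppm
print(f"one bit of drift every {t * 1e6:.1f} us ({bits_between_slips(100.0):.0f} bits)")
```

The receiver therefore has to keep tracking phase continuously, and something above it has to absorb or drop the occasional extra bit.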

Source Synchronous Systems
– Position on-chip sampling clock at the optimal point, i.e. maximize “timing” margin
[Figure: the source sends data and CKSRC; at the receiver a PLL/DLL (ref input) generates CKRCV for the rcvr ahead of the logic; timing diagram of CKSRC, data (D0 D1 D2 D3), and CKRCV]