Post on 13-Jan-2016
L16 : Logic Level Design (2)
Prof. Jun-Dong Cho, Sungkyunkwan University, http://vlsicad.skku.ac.kr
Low Power Logic Gate Resynthesis on Mapped Circuit
Hyun-Sang Kim, Jun-Dong Cho, School of Electrical, Electronic and Computer Engineering
Sungkyunkwan University
Transition Probability
• Transition probability: probability of a transition at the output of a gate, given a change at the inputs
• Computed from signal probabilities
• Example: F = X’Y + XY’
– Signal probability of F: Pf = Px(1-Py) + (1-Px)Py
– Transition probability of F = 2Pf(1-Pf)
– Assumes the inputs are independent
• Use BDDs to compute these
• Reference: Najm’91
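As a quick numerical check of the XOR example above, the following minimal sketch evaluates both formulas. The function names are illustrative and not part of the slides, and the inputs are assumed independent, as the slide states.

```python
def xor_signal_probability(px: float, py: float) -> float:
    """P(F = 1) for F = X'Y + XY', assuming X and Y are independent."""
    return px * (1.0 - py) + (1.0 - px) * py


def transition_probability(p: float) -> float:
    """Probability that two independent samples of a signal differ: 2p(1-p)."""
    return 2.0 * p * (1.0 - p)


# Equiprobable inputs give the worst case for an XOR output.
pf = xor_signal_probability(0.5, 0.5)
print(pf, transition_probability(pf))
```

The transition probability 2p(1-p) is largest at p = 0.5, so skewing a signal probability toward 0 or 1 lowers its expected switching.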
Technology Mapping
• Implementing a Boolean network in terms of gates from a given library
• Popular technique: tree-based mapping
• Library gates and circuits are decomposed into canonical patterns
• Pattern matching and dynamic programming find the best cover
• NP-complete for general DAG circuits
• Ref: Keutzer’87, Rudell’89
• Idea: high transition probability points are hidden within gates
Low Power Cell Mapping
• Example of High Switching Activity Node
• Internal Mapping in Complex Gate
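[Figure: two implementations with inputs A, B, C, D and output Y. The first exposes the internal node Q; the second maps it inside a complex gate.]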
Signal Probability vs. Power
[Figure: power P(x) versus signal probability p(x) over the range 0.0 to 1.0. The curve p(x)(1-p(x)) peaks at p(x) = 0.5; the regions p(x) < 0.5 and p(x) > 0.5 are marked.]
Spatial Correlation
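[Figure: gate network whose inputs each have probability 0.5 and whose internal nodes x and y have P(x) = P(y) = 0.25 and feed output z. P(z) = 0.4375 when x and y are treated as independent; P(z) = 0.375 when their spatial correlation is taken into account.]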
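The two P(z) values in the figure can be reproduced under one plausible reading of it. The sketch below assumes x = a AND b, y = b AND c, and z = x OR y, with independent inputs of probability 0.5. The gate types and the shared input b are assumptions, since the extracted figure does not show them; `prob_of` is a hypothetical helper.

```python
from itertools import product


def prob_of(fn, probs):
    """Exact P(fn = 1) by enumerating all assignments of independent inputs."""
    total = 0.0
    for bits in product((0, 1), repeat=len(probs)):
        weight = 1.0
        for bit, p in zip(bits, probs):
            weight *= p if bit else 1.0 - p
        if fn(*bits):
            total += weight
    return total


probs = (0.5, 0.5, 0.5)  # P(a), P(b), P(c)
p_x = prob_of(lambda a, b, c: a and b, probs)
p_y = prob_of(lambda a, b, c: b and c, probs)

# Treating x and y as independent ignores that both depend on b.
p_z_independent = p_x + p_y - p_x * p_y

# Exact value, computed from the primary inputs.
p_z_exact = prob_of(lambda a, b, c: (a and b) or (b and c), probs)

print(p_x, p_y, p_z_independent, p_z_exact)
```

Enumerating from the primary inputs is exponential in their number, which is one reason BDD-based methods are used for larger circuits.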
Logic Synthesis for Low Power
• Precomputation logic
– selectively precompute the output logic values to reduce switching activity
– uses a predictor function
• Retiming
– re-positions the F/Fs in a pipelined circuit
– candidates for adding F/Fs
• circuit nodes with high hazard activity
• circuit nodes with high load capacitance
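[Figures: precomputation architecture (registers R1, R2, R3, block A, predictor functions g1 and g2, NOR gate) and retiming example (gate g, register R, load capacitance CL).]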
Logic Synthesis for Low Power
• State assignment
– minimizes the switching activity on state transition arcs with high transition probability
– can also consider the complexity of the combinational logic
– experimental result
• 10% to 17% power reduction
• Path balancing
– reduces hazards/glitches
– key issue in delay insertion
• use the minimum number of delay elements to achieve the maximum reduction
• Multi-level network optimization
– uses network don’t-care terms
– cost function
• minimize the sum of the number of product terms and the weighted switching activity
• consider how changes in the global function of an internal node affect the switching activity in its transitive fanout
– experimental result
• ~10% power reduction
Logic Synthesis for Low Power
• Technology decomposition
– minimizes the sum of the switching activities at the internal nodes
– one method
• inject high switching activity inputs into the tree as late as possible
• Technology mapping
– general principle
• hide nodes with high switching activity inside the gates
• Example: P(a) = 0.3, P(b) = 0.4, P(c) = 0.7, P(d) = 0.5
– Chain decomposition ((ab)c)d: E(sw) = p(ab) + p(abc) + p(abcd) = 0.246
– Balanced decomposition (ab)(cd): E(sw) = p(ab) + p(cd) + p(abcd) = 0.512
[Figure: mapping example. H: high transition node; L: low transition node.]
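The two E(sw) values in the example above can be checked with a few lines. This is a minimal sketch that assumes independent inputs and AND gates, so the probability at each internal node is the product of the input probabilities it covers. `p_and` is an illustrative helper, not a name from the slides.

```python
from math import prod

# Input signal probabilities from the example.
P = {"a": 0.3, "b": 0.4, "c": 0.7, "d": 0.5}


def p_and(names):
    """Signal probability of the AND of the named inputs (assumed independent)."""
    return prod(P[n] for n in names)


# Chain ((ab)c)d: internal nodes ab, abc, abcd.
chain = p_and("ab") + p_and("abc") + p_and("abcd")

# Balanced (ab)(cd): internal nodes ab, cd, abcd.
balanced = p_and("ab") + p_and("cd") + p_and("abcd")

print(round(chain, 3), round(balanced, 3))
```

Both trees compute the same function, yet the chain scores roughly half of the balanced tree here because the node cd (0.35) is replaced by abc (0.084).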
Low Power Logic Synthesis
[Figure: low-power logic synthesis flow. Stages: Technology Independent Optimization, Technology Mapping, Resynthesis on Mapped Circuit. Other labels: Logic Equation, Connection of Gates, RTL Description, Gate Level Description, Logic Synthesis, Timing & Power Analysis Tools.]
Technology Mapping
[Figure: three mapping alternatives (a), (b), (c). h: high switching activity node; l: low switching activity node.]
Tree Decomposition
[Figure: two decompositions (a) and (b) of function f, one labeled Low Power. Legend: gate (AND), primary input, critical path, output f.]
Huffman Algorithm
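[Figure: Huffman tree over leaves x1 to x5 with weights 2, 3, 4, 4, 10. y1 = x1 + x2 = 5, y2 = x3 + x4 = 8, y3 = y1 + y2 = 13, and the root combines y3 with x5 for a weight of 23.]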
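The merge order in the figure follows from repeatedly combining the two smallest weights. The sketch below reproduces the internal node weights using Python's `heapq`; the function name is illustrative, and the leaf weights 2, 3, 4, 4, 10 are read off the figure.

```python
import heapq


def huffman_internal_weights(weights):
    """Merge the two smallest weights until one remains; return each merged sum."""
    heap = list(weights)
    heapq.heapify(heap)
    merged = []
    while len(heap) > 1:
        first = heapq.heappop(heap)
        second = heapq.heappop(heap)
        heapq.heappush(heap, first + second)
        merged.append(first + second)
    return merged


internal = huffman_internal_weights([2, 3, 4, 4, 10])
print(internal, sum(internal))
```

The sum of the internal weights is the quantity Huffman's construction minimizes for an unbounded-height binary tree, which is why it serves as the starting point before a height constraint is added.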
Depth-Constrained Decomposition
• Problem: minimize $\sum_{i=1}^{m} p_t(x_i)$
• Input: input signal probabilities (p1, p2, ..., pn), height (h), number of leaf nodes (n), fan-in limit per gate (k)
• Output: k-ary tree topology
• Algorithm
Begin
  sort (signal probabilities p1, p2, ..., pn);
  while (n != 0)
    if (h > log_k n)
      assign k nodes to level L (= h+1);
      h = h-1, n = n-(k-1);   /* upward */
    else if (h < log_k n)
      assign k nodes to the previous level L (= h+2);
      h = h, n = n-(k-1);   /* downward */
    else   /* h = log_k n */
      assign the remaining nodes to level L (= h+1);   /* complete: build a complete k-ary tree */
  for (bottom level L; L > 1; L--)
    min_edge_weight_matching (nodes in level L);
End
Example
[Figure: decomposition of six leaves with probabilities a = 0.1, b = 0.2, c = 0.3, d = 0.4, e = 0.5, f = 0.6 into tree levels L = 0 to L = 3, shown for h = 1, h = 2, and h = 3. Before matching: gate x takes (a, b) and gate y takes (c, d). After matching: gate x takes (a, d) and gate y takes (b, c).]
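The change from the "before matching" to the "after matching" pairing can be illustrated with a brute-force search. This sketch assumes 2-input AND gates and takes the edge weight of a pair to be the product of its two signal probabilities; the slides do not define the edge weight, so that choice, like the function names, is an assumption. A real implementation would use a proper minimum-weight matching algorithm rather than enumeration.

```python
def pairings(items):
    """Yield every way to split an even-length list into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def pairing_cost(pairs, probs):
    """Sum of output signal probabilities of the 2-input AND gates."""
    return sum(probs[u] * probs[v] for u, v in pairs)


def best_pairing(probs):
    """Pairing of all leaves with the smallest total cost (exhaustive search)."""
    names = sorted(probs)
    return min(pairings(names), key=lambda pairs: pairing_cost(pairs, probs))


leaves = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}
before = [("a", "b"), ("c", "d")]
after = best_pairing(leaves)
print(after, pairing_cost(before, leaves), pairing_cost(after, leaves))
```

Under these assumptions the search returns the pairs (a, d) and (b, c), matching the figure, and lowers the summed gate output probability from about 0.14 to about 0.10.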
After Decomposition
[Figure: results after decomposition (K1 = 2). Bar chart of value and improvement ratio versus fan-in and height, comparing SIS and SIS+OURS.]
After Tech. Mapping
[Figure: results after technology mapping (K1 = 3, K2 = 3). Bar chart of power (mW) and improvement ratio versus fan-in and height, comparing SIS+LEVEL MAP and SIS+OURS+LEVEL MAP.]
Precomputation
• Power saving
– Reduces power dissipation of combinational logic
– Reduces internal power of precomputed registers
• Opportunity
– Can be significant, depending on
• the percentage of time the latch precomputation is successful
• Cost
– Increased area
– Impact on circuit timing
– Increased design complexity
• number of bits to precompute
– Testability
• may generate redundant logic
Precomputation
[Figure: register-bank architectures with n input bits, Data_out, and an enable EN. Without precomputation: entire function is computed. With precomputation: a smaller function is defined on m bits, the enable is precomputed, and the remaining n - m bits pass through an enabled register bank.]
• Before Precomputation Diagram
[Figure: 8-bit inputs a and b are registered on CLK and feed an a > b comparator; the 1-bit result is registered as Data_out.]
• After Precomputation Diagram
[Figure: a(7) and b(7) are registered on CLK; a(6:0) and b(6:0) are registered through a clock gated by a latch; the a > b comparator output is registered as Data_out.]
• Before Precomputation - Report
• After Precomputation - Report
Precomputation Example - Before Code
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;  -- unsigned comparison of the operands

entity before_precomputation is
  port (a, b  : in  std_logic_vector(7 downto 0);
        CLK   : in  std_logic;
        D_out : out std_logic);
end before_precomputation;

architecture Behav of before_precomputation is
  signal a_in, b_in : std_logic_vector(7 downto 0);
  signal comp       : std_logic;
begin
  process (a, b, CLK, a_in, b_in)
  begin
    -- Input registers: all 8 bits of a and b are loaded on every rising edge.
    if (CLK = '1' and CLK'event) then
      a_in <= a;
      b_in <= b;
    end if;
    -- Comparator on the registered operands.
    if (unsigned(a_in) > unsigned(b_in)) then
      comp <= '1';
    else
      comp <= '0';
    end if;
    -- Output register.
    if (CLK'event and CLK = '1') then
      D_out <= comp;
    end if;
  end process;
end Behav;
Precomputation Example - After Code
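library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;  -- unsigned comparison of the operands

entity after_precomputation is
  port (a, b  : in  std_logic_vector(7 downto 0);
        CLK   : in  std_logic;
        D_out : out std_logic);
end after_precomputation;

architecture Behav of after_precomputation is
  signal a_in, b_in   : std_logic_vector(7 downto 0);
  signal pcom, pcom_D : std_logic;
  signal CLK_en, comp : std_logic;
begin
  process (a, b, CLK, pcom, pcom_D, CLK_en, a_in, b_in)
  begin
    -- The MSBs are registered on every rising edge.
    if (CLK = '1' and CLK'event) then
      a_in(7) <= a(7);
      b_in(7) <= b(7);
    end if;
    -- Precomputed enable: the low bits matter only when the MSBs are equal.
    pcom <= not (a(7) xor b(7));
    -- Latch, transparent while CLK is low, holds the enable for the gated clock.
    if (CLK = '0') then
      pcom_D <= pcom;
    end if;
    CLK_en <= pcom_D and CLK;
    -- The low-order bits are loaded only on a gated clock edge.
    if (CLK_en = '1' and CLK_en'event) then
      a_in(6 downto 0) <= a(6 downto 0);
      b_in(6 downto 0) <= b(6 downto 0);
    end if;
    -- Comparator on the registered operands.
    if (unsigned(a_in) > unsigned(b_in)) then
      comp <= '1';
    else
      comp <= '0';
    end if;
    -- Output register.
    if (CLK = '1' and CLK'event) then
      D_out <= comp;
    end if;
  end process;
end Behav;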
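To get a feel for how often precomputation succeeds for this comparator, the sketch below enumerates all 8-bit operand pairs. It assumes uniformly distributed, independent inputs, which the slides do not state, and `precomputation_stats` is an illustrative name. It also checks that whenever the most significant bits differ, a > b is decided by a(7) alone.

```python
def precomputation_stats(width: int = 8) -> float:
    """Fraction of input pairs for which the low-order registers can stay disabled."""
    msb = width - 1
    total = 0
    disabled = 0
    for a in range(1 << width):
        for b in range(1 << width):
            a_msb = (a >> msb) & 1
            b_msb = (b >> msb) & 1
            total += 1
            if a_msb != b_msb:
                # The MSBs alone decide the comparison, so the low bits are not needed.
                assert (a > b) == (a_msb == 1)
                disabled += 1
    return disabled / total


print(precomputation_stats())
```

Under the uniform-input assumption the low-order registers can be held for half of all cycles; real input statistics would shift that fraction, which is the "percentage of time precomputation is successful" the slides refer to.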