Low power design and debug support of embedded multicore
MPSoC ’09
Rev. 1.00 00000-A
CPU Development Dept 1
Osamu Nishii
©2009. Renesas Technology Corp., All rights reserved.
Outline of talk
• Embedded processor SH
• Low power design
• Debug function
• Multi-core demo
• Summary
Embedded Processor SH
When SH started
In 1991, the microcontroller field was stable: its processors were small CISC CPUs, and one instruction took multiple cycles.
RISC had taken its position in engineering workstations (EWS) and servers; single-cycle execution and early superscalar architectures were known. RISC systems were expensive, mainly because of their fast off-chip RAM memory systems.
SH started as a small RISC, featuring a reduced instruction length of 16 bits.
SH short history
[Figure: timeline from 1990 to 2010, with a "now" marker.]
• SH-1
• SH-2 (Cache)
• SH-3 (MMU)
• SH-4 (FPU)
• SH-DSP, SH3-DSP
• SH-X (SH-4A, SH4AL-DSP): 7 stage
• SH-X2: 8 stage, separate I-D mem.
• SH-X3: SMP/AMP
Major targets of SH
• Controller: SH-2A
  Automotive, Industry, Consumer
• Processor: SH-4A (*1)
  Mobile, CIS (navigation), Network consumer
*1) including SH4AL-DSP
Latest processor type: SH-X3
• 8-stage, dual-issue, in-order
• Up to four CPUs
• Local memory (ILRAM, DLRAM, URAM) for AMP
• Data cache snoop for SMP
• Low power state support (sleep / light sleep / r-standby / standby)
• Shared L2 cache
[Figure: SH-X3 block diagram. CPU blocks 0 to 3 (each with CPU, FPU/DSP, I$, D$, URAM, ILRAM, DLRAM, MMU, DTU), snoop controller, SuperHyway (main on-chip interconnect), interrupt controller, debug, clock controller, L2 cache.]
L2 of SH-X3
• Shared by the CPUs
• Write through
[Figure: SH-X3 block diagram. CPU blocks 0 to 3 (each with CPU, FPU/DSP, I$, D$, URAM, ILRAM, DLRAM, MMU, DTU), snoop controller, SuperHyway (main on-chip interconnect), interrupt controller, debug, clock controller, L2 cache.]
SH-X3 Prototype (RP1)
[Figure: die photo. Labels: Core #0 to Core #3, SNC, DAA, GCPG, Peripherals.]
• 90nm CMOS
• triple Vth
• 8 metal layers
• chip size: 9.88mm x 9.88mm
• power: 3W (typ., all cores at 600 MHz)
Ref. Y. Yoshida, ISSCC 2007
SH-X3 Prototype (RP2)
[Figure: die photo. Labels: Core#0 to Core#7 (Core#0 detail: ILRAM, URAM, DLRAM, I$, D$), SNC0, SNC1, DBSC, DDRPAD, GCPG, CSM, LBSC, SHWY, VSWC.]
Process Technology: 90nm, 8-layer, triple-Vth, CMOS
Chip Size: 104.8mm2 (10.61mm x 9.88mm)
CPU Core Size: 6.6mm2 (3.36mm x 1.96mm)
Supply Voltage: 1.0V–1.4V (internal), 1.8/3.3V (I/O)
Power Domains: 17 (8 CPUs, 8 URAMs, common)
Ref. M. Ito, ISSCC2008
Low power
Motivation for low-power
Low power for …
• 500mW, … , 50uW: for various modes and battery lifetime (ex. mobile)
• 50kW/m2: room thermal design cost (ex. data center)
What power is important to reduce? All of it.
               Dynamic   Static
Mobile chip    #1        #2
Server chip    #3        #4
Power density
Embedded cores have overcome the power density problem while raising the clock frequency 25 times (20MHz to 500MHz).
[Figure: chip size (x-y), CPU size (x-y) and power density (z) for SH-1 (1993, 0.8um, 20MHz), SH-Mobile (2008, 65nm, 500MHz) and a future device.]
Low-power evolutions
[Figure: GIPS/W (log scale, 0.01 to 10) versus year (1993, 1994, 1995, 1997, 1999, 2002, 2004, 2005). Products: SH-1, SH-2, SH-3, SH-4, SH4-VL, SH-Mobile, SH-Mobile3, SH-MobileG1. Process nodes: 0.8um, 0.5um, 0.25um, 0.2um, 0.18um, 0.13um, 90nm. Plotted values: 0.03, 0.04, 0.1, 0.3, 0.72, 1.1, 4.5, 6.0 GIPS/W.]
Low-power techniques marked on the chart:
• Clock stop, back bias
• Power down, U-standby
• Power down during operation, hierarchical power supply
• R-standby (power down while keeping data)
Dynamic power
• Clock gating, for the general case
• Clock frequency reduction, for the multicore case
• Multicore-specific low-power design, for the multicore case
Tutorial: why is clock gating popular?
[Figure: waveforms of an 8-bit register D[7:0] holding Abs(Int(256*sin(π/40x))), showing bits D[6] and D[0] against the clock.]
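The point of this tutorial slide can be illustrated by counting signal transitions. The sketch below is not taken from the talk: it assumes the formula on the slide reads abs(int(256*sin(πx/40))) and that only the low 8 bits are kept in D[7:0]; the function names are made up for illustration. Whatever the data does, a free-running clock makes two transitions per cycle, while each data bit can change at most once per cycle.

```python
import math


def count_transitions(bits):
    """Number of 0->1 or 1->0 changes in a sequence of bit values."""
    return sum(a != b for a, b in zip(bits, bits[1:]))


def toggle_report(n_cycles=80):
    """Return (transitions per data bit, clock transitions) over n_cycles."""
    # ASSUMPTION: the slide formula is read as abs(int(256*sin(pi*x/40))),
    # and only the low 8 bits are stored in the register D[7:0].
    samples = [abs(int(256 * math.sin(math.pi * x / 40))) & 0xFF
               for x in range(n_cycles)]
    data = {bit: count_transitions([(s >> bit) & 1 for s in samples])
            for bit in range(8)}
    # A free-running clock rises once and falls once in every cycle.
    clock = 2 * n_cycles
    return data, clock


if __name__ == "__main__":
    data, clock = toggle_report()
    for bit in sorted(data, reverse=True):
        print(f"D[{bit}]: {data[bit]} transitions")
    print(f"clock: {clock} transitions")
```

Because the clock net is the most active net in the design, stopping it for registers whose data is not changing removes a large share of the switching activity.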
Clock gating in a CPU
[Figure: clock tree from the PLL through A-drv, B-drv, C-drv and D-drv to GCK cells and FFs (128-256 FF); modules shown: CPU, FPU, Cache Ctrl. B-drv is gated statically by software through the clock control registers; C-drv and D-drv are gated dynamically by hardware. ph1: edge-trigger FF; ph2: transparent latch.]
GCK cell: Gated Clock cell
PLL: Phase Locked Loop
B-drv is used to gate the clock for a whole module. C-drv and D-drv are controlled dynamically (cycle by cycle).
Ref. T. Yamada, ICCD2005
Clock gating design issues
Gating is more effective when:
• the chosen gating signal has a high “cut (0)” probability,
• the gating is applied at an earlier driver.
Gating signals are extracted manually by the SH team, to make use of “don’t-care data” knowledge.
A design loop (RTL → gate → power calculation → analysis → …) helps to find the parts with the largest room for improvement.
Gated design example
The pointer pipeline is one (difficult) example of gated clock design.
[Figure, top: pipeline stages E1, E2, E3, WB and the register file, with a cycle-by-cycle table showing which of Data A to Data E is held at E1 in, E2 in/out, E3 in/out and WB out, and which entries are updated.]
[Figure, bottom: pointer pipeline. E1 in, E2 in and E3 in write buffers B1, B2 and B3, selected by a 2-bit pointer (INC, Dec., clk); E2 out, E3 out and WB out (to Reg. File) are read from the buffers.]
Ref. K. Kamei, ISSCC2004
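The benefit of a pointer pipeline can be sketched by counting register writes. The model below is an interpretation of the figure, not the circuit from the cited paper: it assumes a conventional pipeline copies each result through three stage registers, whereas the pointer version writes each result once into one of three buffers and only advances a small pointer. The function names and the use of `None` as an "empty" marker are illustrative assumptions.

```python
def shift_pipeline(values, depth=3):
    """Conventional pipeline: every value is copied through `depth` registers."""
    regs = [None] * depth
    writes, out = 0, []
    for v in list(values) + [None] * depth:      # extra cycles flush the pipe
        if regs[-1] is not None:
            out.append(regs[-1])                 # value leaving the last stage
        regs = [v] + regs[:-1]                   # all stage registers are clocked
        writes += sum(r is not None for r in regs)
    return out, writes


def pointer_pipeline(values, depth=3):
    """Pointer pipeline: each value is written once; only the pointer moves."""
    bufs = [None] * depth
    ptr = 0                                      # 2-bit pointer when depth == 3
    writes, out = 0, []
    for v in list(values) + [None] * depth:
        if bufs[ptr] is not None:
            out.append(bufs[ptr])                # oldest entry leaves the pipe
        bufs[ptr] = v                            # None models an invalid slot
        if v is not None:
            writes += 1                          # one data-register write per value
        ptr = (ptr + 1) % depth
    return out, writes


if __name__ == "__main__":
    data = ["A", "B", "C", "D", "E"]
    print(shift_pipeline(data))
    print(pointer_pipeline(data))
```

Both models deliver the values in the same order with the same latency; in this sketch the pointer version clocks a word-wide data register one third as often, at the cost of a pointer, a decoder and output multiplexing.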
Clock power reduction for RP1
Individual core clock distribution scheme.
[Figure: CPU#0 to CPU#3, each with IC and DC, connected to the snoop controller and the on-chip interconnect (SuperHyway). A 600MHz clock feeds the per-core dividers DIV#0 to DIV#3 (ratios 1/1, 1/2, 1/4, 1/6); a separate fixed 1/2 divider is also shown.]
Ref. Y. Yoshida, ISSCC 2007
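As a rough illustration of per-core clock division, the sketch below computes each core's frequency from a divider setting and the resulting clock power relative to running all cores at full speed. It assumes that clock power scales linearly with frequency at a fixed supply voltage; the function names and the example divider setting are illustrative, not values taken from the talk.

```python
BASE_CLOCK_MHZ = 600  # source clock shown on the slide


def core_frequencies(dividers, base_mhz=BASE_CLOCK_MHZ):
    """Per-core frequency in MHz for a tuple of divider values (1 means 1/1)."""
    return [base_mhz / d for d in dividers]


def relative_clock_power(dividers):
    """Clock power relative to all cores at full speed.

    ASSUMPTION: power is proportional to frequency at a fixed supply voltage,
    and all cores have identical clock trees.
    """
    return sum(1 / d for d in dividers) / len(dividers)


if __name__ == "__main__":
    setting = (1, 1, 2, 4)  # hypothetical setting for DIV#0 to DIV#3
    print(core_frequencies(setting))
    print(relative_clock_power(setting))
```

With the hypothetical setting above, two cores run at 600MHz, one at 300MHz and one at 150MHz.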
Clock power reduction for RP1 (cont.)
In an SMP configuration with different core frequencies, a snoop takes extra cycles (mainly in the data-sharing case).
[Figure: example core frequency settings (600MHz, 300MHz and 150MHz per core), with the data cache DAA at 300MHz.]
The DAA (duplicated address array), running at a fixed frequency, together with single-depth background invalidation, hides the side effect of the frequency difference in many cases.
Multicore specific power design (SH-X3)
Snoop controller design:
• Serial (non-speculative) operation: L1$ miss first, then DAA access. On an L1$ hit, many FF clocks are gated off.
• The operating frequency (half of the CPU clock) keeps cell sizes small.
Compared with the L1-cache control logic: 20% power per (FF x f).
Static power
Multi power domain design for CPU and SRAM (URAM)
Motivation:
(i) At the lower operating frequencies used under dynamic frequency control, the relative weight of static leakage current grows.
(ii) In a leaky device, the static current is not negligible.
RP2 multiple (17) power domains
[Figure: two clusters of four cores (Cluster #0: Core #0 to #3; Cluster #1: Core #4 to #7). Each core has a CPU, FPU, I$ 16K, D$ 16K, local memory (I: 8K, D: 32K), User RAM (URAM) 64K and CCN/BAR. Each cluster has a snoop controller (0 and 1) and an LCPG (LCPG0 with PCR0 to PCR3, LCPG1 with PCR4 to PCR7). The clusters connect through the on-chip system bus (SuperHyway) to DDR2 control, SRAM control and DMA control.]
LCPG: Local clock pulse generator
PCR: Power Control Register
CCN/BAR: Cache controller / Barrier Register
URAM: User RAM
Power control: five power modes
• 2 additional power modes for leakage power saving (RP1 supports only clock control)
  • Resume power-off: URAM kept powered for fast restart
  • Full power-off: complete leakage power saving
• 8 CPUs independently select the appropriate power mode
[Table: state of the CPU, cache and URAM in each of the five power modes (Normal, Light Sleep, Sleep, Resume Power-off, Full Power-off). Cell states used: active; clock off, power on; clock off, power off.]
Power domain implementation
Power Control Register in LCPG
VSWC: Power Switch Controller
LCPG: Local Clock Pulse Generator for each core
[Figure: chip floorplan with cores C0 to C7, URAMs U0 to U7 and VSWC x 8. Detail of one core (3363µm x 1964µm) with its URAM, showing the power switches for the core and for the URAM, the VSWC for the core and the VSWC for the URAM, and dimension labels of 50µm, 70µm and 120µm. Rail labels: VSSM (virtual ground), VSS.]
Power domain isolation
PSW: Power Switch
PSWC: PSW controller
[Figure: Area#1 and Area#2, each containing logic and a μI/O cell, supplied from VDD. Labels: PSW1, PSW2, PSWC1, PSWC2, PSC, backup latch, register; one switch is marked open.]
A micro I/O (μI/O) cell is inserted:
• to prevent transmission of unknown values
• to prevent DC current in the first-stage gate
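Power Mode Transition
Power Control Register (PCR) in the LCPG for each CPU core. Fields: … LSLEEP, POFF, RESUME, RESET
[Figure: state transition diagram between Normal, Light Sleep, Sleep, Resume Power-off and Full Power-off.]
Transition labels:
• Sleep instruction with LSLEEP=0
• Sleep instruction with LSLEEP=1
• Interrupt (return from Sleep and from Light Sleep)
• RESUME=1 (to Resume Power-off)
• POFF=1 (to Full Power-off)
• RESET=1 (return from Resume Power-off and from Full Power-off)
Transition time between power modes:
• 5us for power-off and 30us for recovery
• Immediate transition for Sleep / Light Sleep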
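The mode transitions can be sketched as a small table-driven state machine. This is an illustrative model, not the RP2 register interface: the event names are invented, LSLEEP=1 is assumed to select Light Sleep, and the power-off requests (RESUME=1, POFF=1) are assumed to be taken from the Sleep state, which the flattened diagram does not make clear.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "Normal"
    LIGHT_SLEEP = "Light Sleep"
    SLEEP = "Sleep"
    RESUME_POWER_OFF = "Resume Power-off"
    FULL_POWER_OFF = "Full Power-off"


# (current mode, event) -> next mode. Event names are hypothetical.
TRANSITIONS = {
    (Mode.NORMAL, "sleep_lsleep0"): Mode.SLEEP,
    # ASSUMPTION: LSLEEP=1 selects Light Sleep.
    (Mode.NORMAL, "sleep_lsleep1"): Mode.LIGHT_SLEEP,
    (Mode.SLEEP, "interrupt"): Mode.NORMAL,
    (Mode.LIGHT_SLEEP, "interrupt"): Mode.NORMAL,
    # ASSUMPTION: power-off is requested from Sleep; the source state is
    # not recoverable from the slide text.
    (Mode.SLEEP, "resume1"): Mode.RESUME_POWER_OFF,
    (Mode.SLEEP, "poff1"): Mode.FULL_POWER_OFF,
    (Mode.RESUME_POWER_OFF, "reset1"): Mode.NORMAL,
    (Mode.FULL_POWER_OFF, "reset1"): Mode.NORMAL,
}


def step(mode, event):
    """Return the next mode, or raise ValueError for an undefined transition."""
    try:
        return TRANSITIONS[(mode, event)]
    except KeyError:
        raise ValueError(f"no transition from {mode.value} on {event!r}")


def run(events, start=Mode.NORMAL):
    """Apply a sequence of events and return the final mode."""
    mode = start
    for event in events:
        mode = step(mode, event)
    return mode


if __name__ == "__main__":
    print(run(["sleep_lsleep0", "resume1", "reset1"]))
```

Keeping the transitions in one table makes it easy to correct the assumed source states once the actual diagram is available.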
Power consumption for each power mode
• Power consumption for 8 CPU cores
• All data are measured on silicon at room temperature, at 1.1V
• Dynamic power for “Normal” is measured with an IDLE loop
• 304mW is still consumed even when all CPUs are in “Sleep”, and leakage power accounts for 70% of it
• “Resume power-off” consumes 35mW (URAM leakage), saving 88% of the power compared with “Sleep”
[Chart: power consumption (mW), split into dynamic power and leakage power]
Mode               Dynamic   Leakage   Total
Normal             1214      216       1430
Light sleep        239       216       455
Sleep              88        216       304
Resume power-off             35        35
Full power-off               0         0
Chart annotations: “RP1 power”, “RP2 new mode”, and “88% reduction” (Sleep to Resume power-off).
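The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers printed on the chart; the split of each bar into dynamic and leakage parts follows the reading in which the totals add up (for example 1214 + 216 = 1430), and the variable names are illustrative.

```python
# (dynamic mW, leakage mW) for 8 CPU cores, as read from the chart.
POWER_MW = {
    "Normal": (1214, 216),
    "Light sleep": (239, 216),
    "Sleep": (88, 216),
    "Resume power-off": (0, 35),   # only URAM leakage remains
    "Full power-off": (0, 0),
}


def total(mode):
    """Total power of a mode in mW (dynamic plus leakage)."""
    dynamic, leakage = POWER_MW[mode]
    return dynamic + leakage


sleep_total = total("Sleep")
leakage_share = POWER_MW["Sleep"][1] / sleep_total        # share of leakage in Sleep
saving = 1 - total("Resume power-off") / sleep_total      # Sleep -> Resume power-off

if __name__ == "__main__":
    for mode in POWER_MW:
        print(f"{mode:18s} {total(mode):5d} mW")
    print(f"leakage share in Sleep: {leakage_share:.1%}")
    print(f"saving vs. Sleep:       {saving:.1%}")
```

The leakage share in Sleep comes out at about 71%, which the slide quotes as 70%, and the saving of Resume power-off relative to Sleep rounds to the quoted 88%.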
Power gating design tool
Design environment (EDA tools) to enable power gating design:
• Power-off gate-level simulation (sets all FFs in a powered-off domain to the unknown value “x”)
• Transistor-level leakage path checker (checks for leakage paths through well connections etc.)
• mIO (isolation cell) insertion tool
• Inter-power-domain isolation checker
Debug Function
On-chip debugger
The debug control signal makes a user system board “debug-able”.
The trace signal dumps internal state; a typical use is branch trace.
[Figure: a PC connected to an emulator, which connects to the user system through the debug control signal and the (optional) trace signal.]
Single-core debug function (starting point)
• Debug mode (of CPU)
• Breakpoint
• Debug handler (enter / exit, state save / restore)
• Trace {to trace port / to memory}
Policies on trace:
(i) Full trace: add processor stalls / waits to keep all trace information.
(ii) Real-time trace: let the processor proceed at its own pace; trace information beyond the limit is lost.
600MHz AUD (trace) port
Trace port bandwidth is critical when tracing high-traffic events (examples: branch trace, variable trace).
The 600MHz AUD port (SSTL18, 300MHz clock x dual edge) enables a high-bandwidth trace function.
[Photo: E200F on-chip debugger (w/ 600MHz trace), with 38-pin connector.]
Multi core debug function
• Simultaneous break
• Simultaneous continue
• Monitor / counter for snoop state signals and events
Trace efficiency for multi-core
Motivation: a simple extension to multi-core causes a higher LOST ratio in real-time trace mode.
• The minimum branch interval on a core is one cycle, while a trace packet (on the pins) takes multiple core cycles; the trace buffer absorbs this peak speed gap.
• When multiple cores share a common trace pin, the average availability for each core decreases.
• A simple n-times extension of the trace buffer causes a higher rate of “LOST” (in real-time mode).
[Figure: (single core) CPU core, DBG with trace buffer, trace output pin; (simple extension) CPU#0 and CPU#1, each with its own trace buffer in DBG#0 / DBG#1 (8 entries, ex.), arbitration, and a shared trace output pin.]
DBG: Debug Controller
Flexible allocation of trace buffer
Solution: flexible allocation according to the amount of trace information.
• The trace buffer is shared among the CPUs
• Output information is sequential from each CPU
[Figure: CPU#0 to CPU#3 writing into one shared trace buffer, with an end-of-valid-data marker and the trace output from each CPU. Example of trace buffer occupancy: CPU #0: over 25%; CPU #1: …; CPU #2: …; CPU #3: over 25%.]
Ref. Y. Yoshida, Cool Chips 2008.
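The effect of sharing the buffer can be illustrated with a toy queue model. This is only a sketch under simplifying assumptions, not the queuing model used in the cited evaluation: packets have unit size, the output pin drains one packet every `drain_period` cycles in round-robin order, and a packet that finds no free slot is counted as LOST. All names and parameters are illustrative.

```python
from collections import deque


def simulate(arrivals, total_slots, shared, drain_period):
    """Return (lost, produced) packet counts.

    arrivals[cycle][cpu] is the number of trace packets a CPU emits in a cycle.
    shared=False splits total_slots evenly into private per-CPU buffers;
    shared=True lets any CPU use any free slot.
    """
    n_cpus = len(arrivals[0])
    per_cpu = total_slots // n_cpus
    queues = [deque() for _ in range(n_cpus)]
    lost = produced = 0
    next_cpu = 0  # round-robin position for the output arbiter
    for cycle, row in enumerate(arrivals):
        for cpu, count in enumerate(row):
            for _ in range(count):
                produced += 1
                if shared:
                    full = sum(len(q) for q in queues) >= total_slots
                else:
                    full = len(queues[cpu]) >= per_cpu
                if full:
                    lost += 1            # real-time trace: drop, do not stall
                else:
                    queues[cpu].append(cycle)
        if cycle % drain_period == 0:    # the pin can send one packet now
            for offset in range(n_cpus):
                cpu = (next_cpu + offset) % n_cpus
                if queues[cpu]:
                    queues[cpu].popleft()
                    next_cpu = (cpu + 1) % n_cpus
                    break
    return lost, produced


if __name__ == "__main__":
    # CPU #0 emits a burst of one packet per cycle; the other CPUs are quiet.
    burst = [[1, 0, 0, 0] for _ in range(8)]
    print("private buffers:", simulate(burst, 8, shared=False, drain_period=4))
    print("shared buffer:  ", simulate(burst, 8, shared=True, drain_period=4))
```

In this example a burst from one CPU overflows its private quarter of the buffer, while the same total capacity, when shared, absorbs the burst without loss.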
Implementation of trace output (RP2)
• Hierarchically structured for concurrent trace of up to 8 CPUs
• Timing path closure inside the cluster (600MHz)
• A long wire cannot meet the timing constraint (600MHz) between the clusters and GDBG
[Figure: Cluster #0 (CPU #0 to #3, DBG0) and Cluster #1 (CPU #4 to #7, DBG1), each at 600MHz, connected to GDBG. Link labels: 32bit, 150MHz; trace output: 8bit, 300MHz, dual-edge = 4.8Gbps.]
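The bandwidth figures on this slide are consistent with each other, as a short calculation shows. The sketch below assumes that the 32-bit, 150MHz link transfers data on a single clock edge, which the slide does not state; the function name is illustrative.

```python
def bandwidth_gbps(width_bits, clock_mhz, edges_per_cycle=1):
    """Raw link bandwidth in Gbit/s: width x clock x transfers per cycle."""
    return width_bits * clock_mhz * edges_per_cycle / 1000


# Trace output pins: 8 bit, 300 MHz clock, both clock edges used.
pin_bandwidth = bandwidth_gbps(8, 300, edges_per_cycle=2)

# Internal link: 32 bit at 150 MHz.
# ASSUMPTION: one transfer per clock cycle (single edge).
link_bandwidth = bandwidth_gbps(32, 150)

if __name__ == "__main__":
    print(f"trace pins:    {pin_bandwidth} Gbps")
    print(f"internal link: {link_bandwidth} Gbps")
```

Under that assumption, a wider and slower internal link carries the same 4.8Gbps as the trace pins while relaxing the timing on the long wires.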
Evaluation results of trace
• The evaluation program consists of JPEG compress / decompress and MPEG2 decode in SMP (4 CPUs), running concurrently on 8 CPUs
  • CPU #0-#3: MPEG2dec in SMP
  • CPU #4, #5: JPEG compress
  • CPU #6, #7: JPEG decompress
• Trace information consists of the source / destination addresses of branch instructions
• Measured every 500 cycles over a particular 5000 cycles, using an instruction simulator with a queuing model of the trace buffer
[Chart: average LOST rate (%), axis 0 to 20]
Without flexible allocation: 19.2%
This work (with flexible allocation): 5.4%
Reduction: 13.8 percentage points
Multi-core Demo
Demo (RP2)
AMP MPEG4 demo (μITRON x 8)
• All CPUs running
• Shut down some of the CPUs, and the power is reduced
[Figure: software / hardware stack on RP2. Software: user application and switch (board) handling on CPU0; MPG4 on μITRON on each of CPU0 to CPU7. Hardware: CPU0 to CPU7, memory, and the power control registers PCR0 to PCR7.]
Demo: explanation of execution flow
■ The power control main routine is under the control of CPU #0
■ Each CPU executes its own shutdown routine
The example below shuts down / wakes up CPU #5.
[Figure: software / hardware stack on RP2 (user application and switch (board) handling on CPU0; MPG4 on μITRON on each of CPU0 to CPU7; memory; power control registers PCR0 to PCR7), annotated with the shutdown flow and the wake-up flow as numbered steps (1, 2, 3a, 3b, 4a, 4b, 5). Step labels include “Power-on -> Run” and “CPU-CPU interrupt”.]
Demo
Summary
In realizing a multi-core LSI, I have described three technical points:
• Low dynamic power design
• Low static power design (especially core-wise power control)
• Multi-core debug function