Low power design and debug support of embedded multicore MPSoC ’09 Rev. 1.00 00000-A CPU Development Dept 1 Osamu Nishii ©2009. Renesas Technology Corp., All rights reserved.

Low power design and debug support of embedded multicore

MPSoC ’09

Rev. 1.00 00000-A

CPU Development Dept 1

Osamu Nishii

©2009. Renesas Technology Corp., All rights reserved.

Outline of talk

• Embedded processor SH
• Low power design
• Debug function
• Multi-core demo
• Summary

Embedded Processor SH

When SH started …

In 1991, the “micro-controller” area was stable: its processors were small CISC CPUs, and one instruction took multiple cycles.

“RISC” had taken its position in engineering workstations (EWS) and servers, where single-cycle execution and early superscalar architectures were known. RISC systems were expensive, mainly because of their fast off-chip RAM memory systems.

SH started as a small RISC, featuring a reduced 16-bit instruction length.

SH short history

[Timeline, 1990 to 2010:]
• SH-1
• SH-2 (cache)
• SH-3 (MMU)
• SH-4 (FPU)
• SH-DSP, SH3-DSP
• SH-X (SH-4A, SH4AL-DSP): 7-stage pipeline
• SH-X2: 8-stage pipeline, separate I/D memory
• SH-X3: SMP/AMP (now)

Major targets of SH

Controller: SH-2A (automotive, industrial, consumer)

Processor: SH-4A (*1) (mobile, CIS (navigation), network consumer)

*1) including SH4AL-DSP

Latest processor type: SH-X3

• 8-stage, dual-issue, in-order pipeline
• Up to four CPUs
• Local memory (ILRAM, DLRAM, URAM) for AMP
• Data cache snoop for SMP
• Low power state support (sleep / light sleep / R-standby / standby)
• Shared L2 cache

[Block diagram: SH-X3 with CPU blocks 0 to 3, each containing a CPU, FPU/DSP, I$, D$, MMU, ILRAM, DLRAM, URAM and DTU; a snoop controller, interrupt controller, debug controller, clock controller and L2 cache; all connected by the SuperHyway (main on-chip interconnect).]

L2 of SH-X3

• Shared by the CPUs
• Write-through

[Block diagram: the SH-X3 cluster (CPU blocks 0 to 3, snoop controller, interrupt, debug and clock controllers) with the L2 cache shared over the SuperHyway (main on-chip interconnect).]

SH-X3 Prototype (RP1)

[Die photo: Core #0 to #3, SNC, DAA, GCPG and peripherals.]

• 90 nm CMOS
• Triple Vth
• 8 metal layers
• Chip size: 9.88 mm x 9.88 mm
• Power: 3 W (typ., all cores at 600 MHz)

Ref. Y. Yoshida, ISSCC 2007

SH-X3 Prototype (RP2)

[Die photo: Core #0 to #7 (core #0 annotated with I$, D$, ILRAM, DLRAM and URAM), SNC0, SNC1, DBSC, DDR pads, GCPG, CSM, LBSC, SHWY and VSWC.]

Process technology: 90 nm, 8-layer, triple-Vth CMOS
Chip size: 104.8 mm² (10.61 mm x 9.88 mm)
CPU core size: 6.6 mm² (3.36 mm x 1.96 mm)
Supply voltage: 1.0 V to 1.4 V (internal), 1.8/3.3 V (I/O)
Power domains: 17 (8 CPUs, 8 URAMs, common)

Ref. M. Ito, ISSCC2008

Low power

Motivation for low-power

Low power matters at every scale:
• 500 mW … 50 µW: power in the various operating modes, which sets battery lifetime (e.g., mobile)
• 50 kW/m²: room thermal design cost (e.g., data center)

What power is important to reduce? All of it:

                 Mobile chip   Server chip
Dynamic power    #1            #3
Static power     #2            #4

Power density

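Embedded cores have overcome the power-density problem even as the clock frequency rose 25 times, from 20 MHz to 500 MHz.

[Figure: 1993 SH-1 (0.8 µm, 20 MHz) vs. 2008 SH-Mobile (65 nm, 500 MHz), showing chip size and CPU size (x-y) against power density (z), plus a future projection.]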

Low-power evolutions

[Chart: power efficiency (GIPS/W, log scale 0.01 to 10) by year, 1993 to 2005, for SH-1, SH-2, SH-3, SH-4, SH4-VL, SH-Mobile, SH-Mobile3 and SH-MobileG1, across process nodes 0.8 µm, 0.5 µm, 0.25 µm, 0.2 µm, 0.18 µm, 0.13 µm and 90 nm; plotted values span 0.03 to 6.0 GIPS/W.]

Low-power techniques introduced along the way: clock stop, back bias, power down, U-standby, power down during operation, hierarchical power supply, and R-standby (power down while keeping data).

Dynamic power

• Clock gating, for the general case
• Clock frequency reduction, for the multicore case
• Multicore-specific low-power design, for the multicore case

Tutorial: why is clock gating popular?

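[Waveform: 8-bit data D[7:0] = Abs(Int(256*sin(πx/40))), updated every clock, shown with the clock and with individual bits D[6] and D[0]. The clock toggles twice every cycle, more often than any data bit, which can change at most once per cycle.]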
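The slide's point can be made concrete with a small Python sketch. It counts how often each data bit toggles and compares that with the clock. This is an illustration, not the slide's own experiment. Two choices are assumptions: the sketch scales by 255 instead of 256, because 256 would overflow D[7:0] at the peak, and it uses an 80-sample window.

```python
import math

def sample_data(n_samples: int) -> list[int]:
    """8-bit samples of |255*sin(pi*x/40)| for x = 0 .. n_samples-1.

    Assumption: the slide's formula uses 256, which overflows 8 bits at the
    peak, so this sketch scales by 255 instead.
    """
    return [int(abs(255 * math.sin(math.pi * x / 40))) for x in range(n_samples)]

def bit_toggles(samples: list[int], bit: int) -> int:
    """Count how many times one bit of the register changes value."""
    values = [(s >> bit) & 1 for s in samples]
    return sum(1 for a, b in zip(values, values[1:]) if a != b)

samples = sample_data(80)          # one full period of sin(pi*x/40)
clock_edges = 2 * len(samples)     # the clock net rises and falls every cycle
for bit in (7, 6, 0):
    print(f"D[{bit}] toggles {bit_toggles(samples, bit)} times; "
          f"clock toggles {clock_edges} times")
```

Over one period, the MSB D[7] changes only four times while the clock net toggles 160 times. This is why gating the clock of rarely updated registers can remove much of the clock-pin switching.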

Clock gating in a CPU

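[Figure: clock tree from the PLL through drivers A-drv, B-drv, C-drv and D-drv and GCK cells to groups of 128 to 256 FFs in the CPU, FPU and cache controller. B-drv is controlled statically by software through clock control registers; C-drv and D-drv are controlled dynamically by hardware. FFs are ph1 edge-triggered; latches are ph2 transparent.]

GCK cell: gated clock cell. PLL: phase-locked loop.

B-drv gates the clock for a whole module. C-drv and D-drv are controlled dynamically (cycle by cycle).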

Ref. T. Yamada, ICCD2005

Clock gating design issues

Gating is more effective when:
• the gating signal is chosen to have a high probability of “cut” (0, clock stopped), and
• the clock is gated at an earlier driver in the clock tree.

The SH team extracts gating signals manually, to exploit knowledge of “don't-care” data.

Iterating the design loop (RTL, gate netlist, power calculation, analysis) helps find the parts with the largest improvement.
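The second rule (gate at an earlier driver) can be checked with a first-order model. In it, each clock-tree stage contributes its switched capacitance times its toggle probability. The sketch is hypothetical: the stage capacitances are invented round numbers, the gate cell's own power is ignored, and only one gate is placed.

```python
from typing import Optional

# Hypothetical switched capacitance of each clock-tree stage (arbitrary units),
# root to leaf, loosely following the A/B/C/D driver chain on the slide.
STAGES = [("A-drv", 4.0), ("B-drv", 8.0), ("C-drv", 16.0),
          ("D-drv", 32.0), ("FF pins", 64.0)]

def clock_power(gate_before: Optional[str], enable_prob: float) -> float:
    """Relative clock power with one gate inserted in front of stage `gate_before`.

    Stages from the gate downwards toggle only in enabled cycles
    (enable_prob = 1 - probability of "cut"); stages above it toggle every
    cycle. The gate cell's own power is ignored. None means no gating.
    """
    names = [name for name, _ in STAGES]
    first_gated = names.index(gate_before) if gate_before is not None else len(names)
    return sum(cap * (enable_prob if i >= first_gated else 1.0)
               for i, (_, cap) in enumerate(STAGES))

ungated = clock_power(None, 1.0)
for where in ("FF pins", "D-drv", "B-drv"):
    for p_on in (0.5, 0.1):
        saving = 1 - clock_power(where, p_on) / ungated
        print(f"gate before {where:7s}, clock enabled {p_on:.0%} of cycles: "
              f"saves {saving:.0%}")
```

With these numbers, the saving at 10% enable rises from about 46% to about 87% when the same gate moves from the FF pins to B-drv. The reason is that every buffer below the gate also stops toggling.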

Gated design example

The pointer pipeline is one (difficult) example of gated-clock design.

[Figure: results from stages E1, E2, E3 and WB flowing to the register file. In the conventional pipeline, data (A to E) shift through the E2/E3/WB output latches every cycle. In the pointer pipeline, results are held in buffers B1, B2 and B3, and 2-bit pointers (updated by increment/decode logic) select the buffers that provide E2 out, E3 out and WB out (to the register file).]

Ref. K. Kamei, ISSCC2004
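Here is a behavioural sketch of the idea as read from the figure. The conventional pipeline copies every result through the E2, E3 and WB latches each cycle. The pointer version writes each result once into one of B1 to B3 and moves only a 2-bit pointer. This is an assumption-laden model, not the SH-X circuit: the 32-bit width is hypothetical, and one write pointer stands in for the per-stage pointers on the slide.

```python
WIDTH = 32  # hypothetical datapath width in bits (not from the slide)

def shift_pipeline(inputs):
    """Conventional design: every stage latch copies its predecessor each cycle."""
    e2 = e3 = wb = None
    outputs, bits_clocked = [], 0
    for value in inputs:
        wb, e3, e2 = e3, e2, value          # E2/E3/WB data latches all clocked
        bits_clocked += 3 * WIDTH
        outputs.append((e2, e3, wb))
    return outputs, bits_clocked

def pointer_pipeline(inputs):
    """Pointer sketch: each result is written once into one of B1-B3, and a
    2-bit pointer decides which buffer acts as E2 out, E3 out and WB out."""
    buffers = [None, None, None]            # B1, B2, B3
    ptr = 0                                 # 2-bit pointer: 0, 1, 2, 0, ...
    outputs, bits_clocked = [], 0
    for value in inputs:
        buffers[ptr] = value                # only one data buffer is clocked
        bits_clocked += WIDTH + 2           # plus the 2-bit pointer register
        outputs.append((buffers[ptr],             # E2 out: newest result
                        buffers[(ptr - 1) % 3],   # E3 out: one cycle older
                        buffers[(ptr - 2) % 3]))  # WB out: oldest, to reg. file
        ptr = (ptr + 1) % 3                 # INC; decoded into buffer selects
    return outputs, bits_clocked

data = list(range(10, 20))
shift_out, shift_bits = shift_pipeline(data)
ptr_out, ptr_bits = pointer_pipeline(data)
print(shift_out == ptr_out, shift_bits, ptr_bits)
```

Both versions produce identical E2/E3/WB outputs. For ten results, the shift version clocks 960 data-latch bits and the pointer version 340 (including the pointer). The sketch ignores the cost of the buffer-selection multiplexers.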

Clock power reduction for RP1

Individual core clock distribution scheme.

Ref. Y. Yoshida, ISSCC 2007

[Figure: a 600 MHz clock feeds per-core dividers DIV#0 to DIV#3 (ratios 1/1, 1/2, 1/4, 1/6) for CPU#0 to CPU#3, each with its own IC and DC; the snoop controller on the on-chip interconnect (SuperHyway) is clocked at 1/2.]
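Here is a worked example of the per-core dividers. It assumes the ratios on the slide (1/1, 1/2, 1/4, 1/6) are the selectable settings, and assigns one to each core purely for illustration. Under the first-order model P = αCV²f with the voltage unchanged, each core's clock power scales with its divided frequency.

```python
BASE_MHZ = 600
ALLOWED_RATIOS = {1, 2, 4, 6}   # divider settings shown on the slide

def core_frequencies(divisors):
    """Map each CPU to BASE_MHZ / divisor, rejecting ratios the slide does not show."""
    freqs = {}
    for cpu, div in divisors.items():
        if div not in ALLOWED_RATIOS:
            raise ValueError(f"{cpu}: unsupported divider 1/{div}")
        freqs[cpu] = BASE_MHZ / div
    return freqs

# Hypothetical assignment: one ratio per core, for illustration only.
freqs = core_frequencies({"CPU#0": 1, "CPU#1": 2, "CPU#2": 4, "CPU#3": 6})
snoop_mhz = BASE_MHZ / 2   # snoop controller runs at half the CPU clock

# First-order model P_dyn = alpha * C * V^2 * f: with V, C and alpha unchanged,
# each core's clock power scales with its frequency.
relative_power = sum(freqs.values()) / (BASE_MHZ * len(freqs))
print(freqs, snoop_mhz, f"{relative_power:.1%}")
```

For this assignment, the four cores run at 600, 300, 150 and 100 MHz. That is about 47.9% of the clock power with all cores at 600 MHz, while the snoop controller stays at 300 MHz.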

Clock power reduction for RP1 (cont.)

In the SMP case with cores running at different frequencies, snooping takes an extra cycle (mainly in the data-sharing case).

[Figure: example frequency settings for the four cores, from all at 600 MHz to mixes of 600, 300 and 150 MHz, with the data cache DAA at a fixed 300 MHz.]

The DAA (duplicated address array) runs at a fixed frequency. Together with single-depth background invalidation, this hides the effect of the frequency difference in many cases.

Multicore-specific power design (SH-X3)

Snoop controller design:
• Serial (non-speculative) operation: the DAA is accessed only after an L1$ miss, so when the L1$ hits, many FF clocks are gated off.
• The lower frequency (half the CPU clock) allows small cell sizes.

Compared with the L1-cache control logic, the snoop controller uses 20% of the power per (FF x f).
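The serial snoop policy can be illustrated with a toy coherence model in Python. The core's L1 is checked first. The DAA (a duplicate of every core's L1 tags) is consulted only on a miss, or on a write to a line the core does not own exclusively. The protocol here (write-invalidate with a single exclusive owner, no evictions) is a simplification chosen for the sketch, not the SH-X3 snoop protocol.

```python
class SerialSnoopModel:
    """Toy model: L1 lookup first, DAA lookup only when coherence needs it."""

    def __init__(self, n_cores: int):
        self.l1 = [set() for _ in range(n_cores)]    # lines present in each L1
        self.daa = [set() for _ in range(n_cores)]   # duplicated copy of L1 tags
        self.owner = {}                              # line -> exclusive owner core
        self.daa_lookups = 0
        self.invalidations = 0

    def access(self, core: int, line: int, is_write: bool = False) -> str:
        hit = line in self.l1[core]
        if hit and (not is_write or self.owner.get(line) == core):
            return "hit"   # serial scheme: no DAA access, snoop-side clocks gated
        # L1 miss, or write to a possibly shared line: now consult the DAA.
        self.daa_lookups += 1
        holders = [c for c, tags in enumerate(self.daa) if c != core and line in tags]
        if is_write:
            for c in holders:                 # invalidate only the real sharers
                self.l1[c].discard(line)
                self.daa[c].discard(line)
                self.invalidations += 1
            self.owner[line] = core
        elif self.owner.get(line) not in (None, core):
            del self.owner[line]              # a new reader makes the line shared
        self.l1[core].add(line)
        self.daa[core].add(line)
        return "upgrade" if hit else "miss"

snc = SerialSnoopModel(n_cores=4)
trace = [(0, 0x100, False), (0, 0x100, False), (0, 0x100, True),
         (0, 0x100, True), (1, 0x100, False), (0, 0x100, True)]
results = [snc.access(core, line, is_write) for core, line, is_write in trace]
print(results, snc.daa_lookups, snc.invalidations)
```

In this six-access trace, only four accesses reach the DAA and only one invalidation is sent. A speculative design that looked up the DAA in parallel with every L1 access would have exercised the snoop logic six times.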

Static power

Multi-power-domain design for the CPU and SRAM (URAM). Motivation:

(i) At the lower frequencies used under dynamic frequency control, static leakage current accounts for a growing share of the power.

(ii) In a leaky device, the static current is not negligible.
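Point (i) follows from the usual first-order power model, written here in LaTeX as a sketch. The symbols are not from the slide: α is the activity factor, C the switched capacitance, V the supply voltage, f the clock frequency and I_leak the leakage current. At a fixed voltage, lowering f shrinks only the dynamic term, so the static share grows.

```latex
P = P_{\mathrm{dyn}} + P_{\mathrm{stat}} = \alpha C V^{2} f + V I_{\mathrm{leak}},
\qquad
\frac{P_{\mathrm{stat}}}{P} = \frac{V I_{\mathrm{leak}}}{\alpha C V^{2} f + V I_{\mathrm{leak}}}
```

The measured RP2 numbers for eight cores show the same trend. With leakage fixed at 216 mW, its share rises from about 15% in Normal (1430 mW total) to about 71% in Sleep (304 mW total).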

RP2 multiple (17) power domains

[Block diagram: two clusters (cluster #0: cores #0 to #3, cluster #1: cores #4 to #7), each with a snoop controller, CCN/BAR and an LCPG holding the power control registers (LCPG0: PCR0 to PCR3, LCPG1: PCR4 to PCR7). Each core has a CPU, FPU, 16 KB I$, 16 KB D$, local memory (I: 8 KB, D: 32 KB) and a 64 KB URAM. Both clusters connect to the on-chip system bus (SuperHyway), with DDR2 control, SRAM control and DMA control.]

LCPG: local clock pulse generator. PCR: power control register. CCN/BAR: cache controller/barrier register. URAM: user RAM.

Power control: five power modes

• Two additional power modes for leakage power saving (RP1 supports only clock control):
• Resume power-off: URAM kept powered for fast restart
• Full power-off: complete leakage power saving
• The 8 CPUs independently select the appropriate power mode.

[Table: state of the CPU, cache and URAM in each mode (Normal, Light sleep, Sleep, Resume power-off, Full power-off), ranging from active through clock off to power off.]

Power domain implementation

Power control register in the LCPG. VSWC: power switch controller. LCPG: local clock pulse generator for each core.

[Layout: cores C0 to C7 and URAMs U0 to U7, each core 3363 µm x 1964 µm, with separate power switches for the core and for its URAM, each group driven by its own VSWC (x8); the switches connect the virtual ground VSSM to VSS. Switch widths of 50 µm, 70 µm and 120 µm are annotated.]

Power domain isolation

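PSW: power switch. PSWC: PSW controller.

[Figure: logic Area#1 and Area#2, supplied from VDD through power switches PSW1 and PSW2 under controllers PSWC1 and PSWC2 (with a PSC); signals leaving each area pass through micro I/O (µI/O) cells, and a backup latch register retains data.]

Micro I/O cells are inserted:
• to prevent unknown values from being transmitted, and
• to prevent DC current in the first-stage gate.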
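Both purposes of the micro I/O cell can be seen in a three-valued (0, 1, X) logic sketch in Python. The clamp-to-0 behaviour of the isolation cell is an assumption for illustration; the slide does not say which value the µI/O cell drives.

```python
X = "X"   # unknown logic value

def and2(a, b):
    """Two-input AND gate in three-valued (0, 1, X) logic."""
    if a == 0 or b == 0:
        return 0          # a controlling 0 masks any unknown input
    if a == 1 and b == 1:
        return 1
    return X

def area1_output(powered_on, value):
    """Signal leaving Area#1: its value is unknown once PSW1 is switched off."""
    return value if powered_on else X

def micro_io(signal, isolate, clamp=0):
    """Hypothetical isolation behaviour: drive a fixed clamp value while isolating."""
    return clamp if isolate else signal

for powered, isolate in [(True, False), (False, False), (False, True)]:
    received = and2(micro_io(area1_output(powered, 1), isolate), 1)
    print(f"Area#1 powered={powered!s:5} isolate={isolate!s:5} -> Area#2 sees {received}")
```

Without isolation, the powered-off Area#1 feeds X into Area#2; with the clamp, Area#2 sees a defined 0. In silicon an X corresponds to a floating or mid-level input, which is what drives DC current through the first-stage gate.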

Power Mode Transition

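Power control register (PCR) in the LCPG for each CPU core, with control bits including LSLEEP, POFF, RESUME and RESET.

[State diagram: Normal, Light sleep, Sleep, Resume power-off and Full power-off. A sleep instruction enters Sleep (LSLEEP=0) or Light sleep (LSLEEP=1), and an interrupt returns to Normal. Setting RESUME=1 or POFF=1 enters Resume power-off or Full power-off, respectively, and RESET=1 leaves either power-off mode.]

Transition time between power modes: 5 µs for power-off and 30 µs for recovery. Transitions into and out of Sleep/Light sleep are immediate.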
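The mode transitions can be sketched as a small state machine in Python. Several details are assumptions, because the figure residue does not show them unambiguously: LSLEEP=1 is taken to select Light sleep, the power-off modes are entered from Sleep, and RESET=1 restarts the core into Normal. The 5 µs and 30 µs latencies are from the slide.

```python
from enum import Enum

class Mode(Enum):
    NORMAL = "Normal"
    LIGHT_SLEEP = "Light sleep"
    SLEEP = "Sleep"
    RESUME_POWER_OFF = "Resume power-off"
    FULL_POWER_OFF = "Full power-off"

POWER_OFF = {Mode.RESUME_POWER_OFF, Mode.FULL_POWER_OFF}

def next_mode(mode, event, pcr):
    """Next power mode of one core (hypothetical model of the PCR behaviour).

    event: "sleep" (sleep instruction), "interrupt" or "pcr_write".
    pcr: dict of PCR bit values (LSLEEP, RESUME, POFF, RESET).
    """
    if mode is Mode.NORMAL and event == "sleep":
        return Mode.LIGHT_SLEEP if pcr.get("LSLEEP") == 1 else Mode.SLEEP
    if mode in (Mode.SLEEP, Mode.LIGHT_SLEEP) and event == "interrupt":
        return Mode.NORMAL
    if mode is Mode.SLEEP and event == "pcr_write":   # assumed entry point
        if pcr.get("RESUME") == 1:
            return Mode.RESUME_POWER_OFF
        if pcr.get("POFF") == 1:
            return Mode.FULL_POWER_OFF
    if mode in POWER_OFF and event == "pcr_write" and pcr.get("RESET") == 1:
        return Mode.NORMAL                             # restart via reset
    return mode   # any other event leaves the mode unchanged

def transition_us(src, dst):
    """Transition time from the slide: 5 us to power off, 30 us to recover."""
    if dst in POWER_OFF and src not in POWER_OFF:
        return 5
    if src in POWER_OFF and dst not in POWER_OFF:
        return 30
    return 0      # Sleep / Light sleep transitions are immediate

mode, total_us = Mode.NORMAL, 0
for event, pcr in [("sleep", {"LSLEEP": 0}), ("pcr_write", {"RESUME": 1}),
                   ("pcr_write", {"RESET": 1})]:
    new = next_mode(mode, event, pcr)
    total_us += transition_us(mode, new)
    print(f"{mode.value} --{event} {pcr}--> {new.value}")
    mode = new
print("total transition time:", total_us, "us")
```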

Power consumption for each power mode

Power consumption for 8 CPU cores. All data were measured on silicon at room temperature and 1.1 V; dynamic power for “Normal” was measured with an idle loop.

Mode               Dynamic (mW)  Leakage (mW)  Total (mW)
Normal             1214          216           1430
Light sleep        239           216           455
Sleep              88            216           304
Resume power-off   -             35            35
Full power-off     -             0             0

(Normal, Light sleep and Sleep are the RP1 power modes; Resume power-off and Full power-off are new in RP2.)

• 304 mW is still consumed even when all CPUs are in “Sleep”, and leakage accounts for 70% of it.
• In “Resume power-off”, only 35 mW of URAM leakage remains, a saving of 88% compared with “Sleep”.
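A quick arithmetic check of the two bullet points, using the values from the chart. The dynamic part of the two power-off modes is taken as 0, since the slide shows only leakage for them.

```python
# Measured power of all 8 cores at 1.1 V, room temperature (mW), from the slide.
# Only leakage is shown for the two power-off modes; dynamic is taken as 0 here.
POWER_MW = {
    "Normal":           {"dynamic": 1214, "leakage": 216},
    "Light sleep":      {"dynamic": 239,  "leakage": 216},
    "Sleep":            {"dynamic": 88,   "leakage": 216},
    "Resume power-off": {"dynamic": 0,    "leakage": 35},   # URAM leakage only
    "Full power-off":   {"dynamic": 0,    "leakage": 0},
}

def total(mode):
    return POWER_MW[mode]["dynamic"] + POWER_MW[mode]["leakage"]

def leakage_share(mode):
    t = total(mode)
    return POWER_MW[mode]["leakage"] / t if t else 0.0

def saving(mode, reference):
    """Fraction of power saved by `mode` relative to `reference`."""
    return 1 - total(mode) / total(reference)

for mode in POWER_MW:
    print(f"{mode:17s} {total(mode):5d} mW, leakage share {leakage_share(mode):.0%}")
print(f"Resume power-off vs Sleep: {saving('Resume power-off', 'Sleep'):.0%} saved")
```

The computed Sleep leakage share is 71%, which the slide rounds to 70%. Resume power-off saves (304 - 35) / 304, about 88%, relative to Sleep.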

Power gating design tool

A design environment (EDA tools) enables power-gating design:
• Power-off gate-level simulation (all FFs in a powered-off domain are set to the unknown value X)
• Transistor-level leakage path checker (checks leakage paths through well connections, etc.)
• mIO (isolation cell) insertion tool
• Inter-power-domain isolation checker
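This minimal Python sketch illustrates what an inter-power-domain isolation checker verifies, using a hypothetical netlist. The rule: every net driven from a switchable domain into a different domain must pass through an mIO cell. Domain and net names are invented for the example; a production checker works on the real netlist and power-intent data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Net:
    name: str
    driver_domain: str
    sink_domain: str
    isolated: bool   # True if the net passes through an mIO (isolation) cell

# Hypothetical domain names: each core and each URAM can be switched off,
# while "common" is always on (cf. 17 domains = 8 CPUs + 8 URAMs + common).
SWITCHABLE = {f"cpu{i}" for i in range(8)} | {f"uram{i}" for i in range(8)}

def isolation_violations(nets):
    """Names of nets that leave a switchable domain without an isolation cell."""
    return [n.name for n in nets
            if n.driver_domain != n.sink_domain
            and n.driver_domain in SWITCHABLE
            and not n.isolated]

netlist = [
    Net("cpu0_req",    "cpu0",   "common", isolated=False),  # missing mIO
    Net("cpu0_ack",    "common", "cpu0",   isolated=False),  # driven by always-on logic
    Net("uram0_rdata", "uram0",  "cpu0",   isolated=True),
    Net("cpu1_req",    "cpu1",   "common", isolated=True),
    Net("cpu0_alu",    "cpu0",   "cpu0",   isolated=False),  # stays inside one domain
]
print(isolation_violations(netlist))
```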

Debug Function

On-chip debugger

The debug control signal makes a user system board “debuggable”.

The trace signal dumps internal state; a typical use is branch trace.

[Figure: a PC connected to an emulator, which connects to the user system through the debug control signal and an optional trace signal.]

Single-core debug function (starting point)

• Debug mode (of the CPU)
• Breakpoint
• Debug handler (enter / exit, state save / restore)
• Trace (to the trace port / to memory)

Trace policies:
(i) Full trace: stall or wait the processor so that all trace information is kept.
(ii) Real-time trace: let the processor run at its own pace; trace information beyond the capacity limit is lost.
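The two trace policies can be compared with a single-core queue model in Python. The buffer depth, pin drain rate and branch burst are hypothetical values chosen to make the trade-off visible.

```python
def simulate_trace(events, depth, drain_every, policy):
    """Single-core trace buffer model.

    events: per-cycle 0/1 list, 1 = the CPU emits a branch-trace packet.
    depth: trace buffer entries; drain_every: core cycles per packet on the pins.
    policy: "full" (stall the CPU while the buffer is full) or
            "real-time" (never stall; packets that do not fit are LOST).
    Returns (cycles needed to execute `events`, lost packets).
    """
    if policy not in ("full", "real-time"):
        raise ValueError(policy)
    buffered = lost = cycle = i = 0
    while i < len(events):
        if cycle % drain_every == 0 and buffered:
            buffered -= 1                  # trace port sends one packet
        if not events[i]:
            i += 1                         # no trace output this cycle
        elif buffered < depth:
            buffered += 1                  # packet fits in the buffer
            i += 1
        elif policy == "real-time":
            lost += 1                      # buffer full: information is lost
            i += 1
        # otherwise (full trace): the CPU stalls and retries next cycle
        cycle += 1
    return cycle, lost

burst = [1] * 8 + [0] * 8          # hypothetical: 8 back-to-back branches
print(simulate_trace(burst, depth=2, drain_every=4, policy="real-time"))
print(simulate_trace(burst, depth=2, drain_every=4, policy="full"))
```

Here the buffer has two entries and drains one packet every four cycles. Real-time trace finishes the 16-cycle fragment on time but loses 5 of the 8 packets. Full trace keeps all 8 packets at the cost of stretching execution to 33 cycles.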

600MHz AUD (trace) port

Trace port bandwidth is critical when tracing high-traffic events (for example, branch trace or variable trace).

The 600 MHz AUD port (SSTL18, 300 MHz clock x dual edge) enables a high-bandwidth trace function.

[Photo: E200F on-chip debugger (with 600 MHz trace), 38-pin connector.]

Multi-core debug function

• Simultaneous break
• Simultaneous continue
• Monitor / counter for snoop state signals and events

Trace efficiency for multi-core

Motivation: a simple multi-core extension causes a higher LOST ratio in real-time trace mode.

• Multiple cores share a common trace pin, so the average availability for each core decreases.
• A simple n-times extension of the trace buffer causes a higher rate of “LOST” (in real-time mode).
• A core can branch as often as every cycle, while a trace packet takes multiple core cycles on the pins; the trace buffer absorbs this peak speed gap.

[Figure: single-core case (CPU, DBG with trace buffer, trace output pin) vs. simple extension (CPU#0 and CPU#1, each with DBG#0/DBG#1 and its own trace buffer of, e.g., 8 entries, arbitrating for one trace output pin). DBG: debug controller.]

Flexible allocation of trace buffer

Solution: flexible allocation according to the amount of trace information.
• The trace buffer is shared by all CPUs.
• Output information is sequential from each CPU.

[Figure: CPU#0 to CPU#3 writing into one shared trace buffer, up to an end-of-valid-data marker, with trace output from each CPU; in the example occupancy, CPU #0 and CPU #3 each use over 25% of the buffer.]

Ref. Y. Yoshida, Cool Chips 2008.
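The benefit of sharing can be sketched in Python with a multi-CPU version of a real-time trace queue. The sketch compares a buffer split evenly among four CPUs with one shared pool of the same total size. The capacity, drain rate and burst pattern are hypothetical. The burst from CPU #0 and CPU #3 mirrors the slide's occupancy example, where those two CPUs each need more than 25% of the buffer.

```python
from collections import deque

def lost_packets(bursts, n_cpus, capacity, drain_every, shared):
    """Real-time trace with either a shared or a statically split trace buffer.

    bursts: per-cycle sets of CPU ids that each emit one branch-trace packet.
    shared=True: one pool of `capacity` entries used by all CPUs as needed.
    shared=False: each CPU owns capacity // n_cpus entries (simple extension).
    Returns (lost, emitted).
    """
    if shared:
        queues, limits = [deque()], [capacity]
    else:
        queues = [deque() for _ in range(n_cpus)]
        limits = [capacity // n_cpus] * n_cpus
    lost = emitted = 0
    rr = 0                                   # round-robin start for draining
    for cycle, emitters in enumerate(bursts):
        if cycle % drain_every == 0:         # trace pins send one packet
            for k in range(len(queues)):
                q = queues[(rr + k) % len(queues)]
                if q:
                    q.popleft()
                    rr = (rr + k + 1) % len(queues)
                    break
        for cpu in sorted(emitters):
            emitted += 1
            idx = 0 if shared else cpu
            if len(queues[idx]) < limits[idx]:
                queues[idx].append(cpu)
            else:
                lost += 1                    # real-time mode: packet is LOST
    return lost, emitted

burst = [{0, 3}] * 4 + [set()] * 4           # CPU #0 and #3 branch back to back
for shared in (False, True):
    lost, emitted = lost_packets(burst, n_cpus=4, capacity=8,
                                 drain_every=4, shared=shared)
    print("shared" if shared else "split ", f"LOST {lost}/{emitted} = {lost / emitted:.0%}")
```

With the split buffer, the two busy CPUs overflow their 2-entry slices while the slices of the idle CPUs stay empty. The shared pool absorbs the same burst without loss.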

Implementation of trace output (RP2)

• Hierarchically structured for concurrent trace from up to 8 CPUs.
• Timing closure at 600 MHz is achieved inside each cluster. The long wires between the clusters and the GDBG cannot meet the 600 MHz timing constraint, so that path uses a wider, slower bus.

[Figure: CPU #0 to #3 feed DBG0 (cluster #0) and CPU #4 to #7 feed DBG1 (cluster #1) at 600 MHz; each cluster connects to the GDBG over 32 bit at 150 MHz; the GDBG drives the trace pins over 8 bit, 300 MHz, dual-edge = 4.8 Gbps.]
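The bandwidth figures on the slide can be checked with a little arithmetic and compared with a hypothetical uncompressed branch-trace demand. The workload numbers (a taken branch every 5 cycles, 64-bit packets holding 32-bit source and destination addresses) are assumptions for illustration only.

```python
import math

def link_gbps(width_bits, clock_mhz, edges_per_clock=1):
    """Raw bandwidth of a parallel link in Gbit/s."""
    return width_bits * clock_mhz * edges_per_clock / 1000

trace_pins = link_gbps(8, 300, edges_per_clock=2)   # GDBG -> AUD pins, dual edge
cluster_link = link_gbps(32, 150)                   # DBG0/DBG1 -> GDBG

def branch_trace_demand_gbps(cpus, cpu_mhz, cycles_per_branch, bits_per_packet):
    """Hypothetical uncompressed demand: one packet per taken branch."""
    return cpus * (cpu_mhz / cycles_per_branch) * bits_per_packet / 1000

# Assumed workload: a branch every 5 cycles, 64-bit packets (32-bit source +
# 32-bit destination address), 8 CPUs at 600 MHz. Illustrative only.
demand = branch_trace_demand_gbps(8, 600, 5, 64)
print(trace_pins, cluster_link, demand, demand / trace_pins)
```

Both the 32-bit, 150 MHz cluster links and the 8-bit dual-edge pins carry 4.8 Gbit/s, so the slower, wider inter-cluster path does not reduce the end-to-end trace bandwidth. Under the assumed workload, the raw demand (about 61 Gbit/s) is more than ten times the pin bandwidth. This is why buffering and the LOST behaviour matter.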

Evaluation results of trace

• The evaluation program consists of JPEG compress/decompress and MPEG2 decode, run concurrently on 8 CPUs: CPU #0 to #3 run MPEG2 decode in SMP (4 CPUs), CPU #4 and #5 run JPEG compress, and CPU #6 and #7 run JPEG decompress.
• Trace information consists of the source/destination addresses of branch instructions.
• Measured every 500 cycles over a particular 5000-cycle window, by an instruction simulator with a queuing model of the trace buffer.

Average LOST rate: 19.2% without flexible allocation vs. 5.4% with flexible allocation (this work), a reduction of 13.8 percentage points.

Multi-core Demo

Demo (RP2)

AMP MPEG4 demo (μITRON x 8)

• All CPUs running
• Shut down some of the CPUs, and the power is reduced

[Figure: RP2 with CPU0 to CPU7, each running μITRON and an MPEG4 task (software), with a user application and a board switch on CPU0, shared memory, and power control registers PCR0 to PCR7 (hardware).]

Demo: explanation of execution flow

Shutdown flow / wake-up flow

■ The power control main routine runs on CPU #0.
■ Each CPU executes its own shutdown routine.
The example shuts down and wakes up CPU #5.

[Figure: the RP2 demo system (CPU0 to CPU7 with μITRON and MPEG4 tasks, user application and board switch on CPU0, memory, power control registers PCR0 to PCR7), with numbered steps 3a, 3b, 4a (power-on, then run), 4b (CPU-to-CPU interrupt) and 5 marking the shutdown and wake-up sequence.]

Demo

Summary

In realizing a multi-core LSI, I have described three technical points: low dynamic power design, low static power design (especially core-wise power control), and the multi-core debug function.
