
경종민 (kyung@ee.kaist.ac.kr)

Multiple-FPGA System; SoC Verification using an array of FPGA's


Introduction
• Hardware Emulation System
  – A device for verifying a digital circuit design together with its target system before the chips are fabricated
  – Merits
    • Much faster verification of the logic design than software simulation
    • Real (physical) signals can be applied and monitored.

[Figure: emulator usage: in-circuit logic emulator; simulation accelerator driven by a C testbench or an HDL testbench]


Introduction
• Why Emulation?
  – An FPGA prototype allows a design to be verified with real signals to and from the target environment, at speeds much faster than software simulation.

[Figure: setup time and execution time, on a scale from minutes to years, for logic simulation, accelerated simulation, logic emulation, and final silicon]


FPGA vs. Emulation System

• FPGA
  – Developed in the 1980's
  – Reconfigurable logic and routing architecture
  – The gate capacity of an FPGA is smaller than that of a state-of-the-art ASIC design.
    • Currently, the largest FPGA has a capacity of about 8M gates (about 1.6M of them are logic gates; the rest is memory).
• Emulation Systems
  – An array of FPGA's or special processors is interconnected via interconnection networks.
  – The whole target design must be partitioned into a set of subcircuits such that each fits in one FPGA.
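As a rough illustration of why partitioning is unavoidable, the sketch below computes a lower bound on the number of FPGA's needed for a design, using the 1.6M logic gates per FPGA quoted above. The function name and the utilization factor are assumptions for illustration: in practice pin limits keep each FPGA well below full logic usage, so real counts are higher.

```python
import math

def min_fpgas(design_gates: int, logic_gates_per_fpga: int = 1_600_000,
              utilization: float = 1.0) -> int:
    """Lower bound on the FPGA count for a design.

    utilization is an assumed fraction of each FPGA's logic gates that the
    partitioner can actually fill; pin limits usually keep it well below 1.0.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    usable = logic_gates_per_fpga * utilization
    return math.ceil(design_gates / usable)

if __name__ == "__main__":
    design = 60_000_000  # a P4-sized design (the 60M-gate figure on the next slide)
    print(min_fpgas(design))                   # ideal packing: 38
    print(min_fpgas(design, utilization=0.5))  # assumed 50% usable logic: 75
```

Even under ideal packing a 60M-gate design needs dozens of devices, which is why the interconnection network between them becomes the central design problem.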


Requirements of Hardware Emulator
• Five Requirements
  – 1. Gate capacity
    • In the SoC era, about 50-100M gates are needed (Intel's P4 is about 60M gates including cache memories).
  – 2. Speed
    • An emulation system should be faster than other verification environments.

[Figure: speed in cycles/sec (Hz), with marks at 100Hz, 10KHz, and 1MHz, for software simulation, co-verification with emulator, and hardware emulator]
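To make the speed figures concrete, the sketch below converts a cycle count into wall-clock time at the three rates on this slide (100Hz, 10KHz, 1MHz). Pairing 100Hz with software simulation, 10KHz with co-verification, and 1MHz with the hardware emulator is an assumption based on the order of the chart labels, and the one-billion-cycle workload is hypothetical.

```python
# Assumed pairing of verification environments with the chart's speed marks.
SPEEDS_HZ = {
    "software simulation": 100,
    "co-verification w/ emulator": 10_000,
    "hardware emulator": 1_000_000,
}

def wall_clock_seconds(cycles: int, hz: int) -> float:
    """Time needed to run `cycles` target clock cycles at `hz` cycles/sec."""
    return cycles / hz

def human(seconds: float) -> str:
    """Format a duration using the largest unit that fits."""
    for unit, size in (("days", 86_400), ("hours", 3_600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} seconds"

if __name__ == "__main__":
    cycles = 1_000_000_000  # hypothetical workload, e.g. a long software test
    for name, hz in SPEEDS_HZ.items():
        print(f"{name:30s} {human(wall_clock_seconds(cycles, hz))}")
```

Each factor of 100 in speed turns months into days and days into minutes, which is the practical argument for emulation.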


Requirements of Hardware Emulator
  – 3. Debuggability
    • Today's emulators provide 100% debugging capability.
  – 4. Expandable architecture
    • The emulator architecture should be expandable to include more logic gates.
  – 5. Low cost
    • The cost of current emulators is still very high.
    • Quickturn's Mercury costs $4M to $5M, plus $0.2M for each additional FPGA board.


Basic architecture of Emulators
• Multiple FPGA's
  – Multiple FPGA's can be connected to increase the gate capacity of the emulation system.
  – Several interconnection architectures
    • Full crossbar network, folded-Clos network
    • Time-multiplexed interconnect
    • Virtual wire
• Embedded logic analyzer
  – Customized FPGA to extract internal values
  – Local memory to save the extracted values
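The local memory of an embedded logic analyzer is commonly organized as a circular trace buffer: it keeps the most recent samples and freezes a fixed number of cycles after a trigger, so the saved window shows both what led up to an event and what followed. The sketch below models that behavior in software; the function name, buffer depth, trigger condition, and post-trigger count are illustrative assumptions, not details of any particular emulator.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional

def capture(samples: Iterable[int], depth: int, post_trigger: int,
            trigger: Callable[[int], bool]) -> Optional[List[int]]:
    """Return the trace window around the first trigger, or None if it never fires.

    The buffer keeps only the newest `depth` samples. After the trigger fires,
    capture continues for `post_trigger` more samples and then freezes.
    """
    buf: deque = deque(maxlen=depth)   # oldest samples drop off automatically
    remaining: Optional[int] = None    # post-trigger samples still to record
    for value in samples:
        buf.append(value)
        if remaining is None:
            if trigger(value):
                remaining = post_trigger
        else:
            remaining -= 1
        if remaining == 0:
            return list(buf)
    # Stream ended: return a partial window if the trigger fired, else None.
    return list(buf) if remaining is not None else None

if __name__ == "__main__":
    # Hypothetical probe: a counter value; trigger when it reaches 10.
    print(capture(range(20), depth=8, post_trigger=3, trigger=lambda v: v == 10))
    # [6, 7, 8, 9, 10, 11, 12, 13]
```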


Pin limitation

[Figure: pin count vs. gate count of circuits partitioned by commercial partitioning tools for several designs; Xilinx and Altera FPGA's do not have enough pins for the partitioned circuits]

• Pin count vs. gate count of a partition
  – Many large designs require far more pins than FPGA's provide.
• Sparcle processor: a processor designed by MIT, LSI Logic, and Sun for multiprocessor systems in 1994
• Alewife CC: a cache controller designed at MIT
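The pin-versus-gate trend is often modeled with Rent's rule, P = t·G^r, which relates the external pin count P of a block to its gate count G. The slide does not state this model; it is brought in here as a common way to reason about the chart, and the constants t and r below, as well as the 1,000-pin budget, are assumed illustrative values. The sketch shows why pins, rather than gates, usually limit how much logic one FPGA can hold.

```python
def rent_pins(gates: float, t: float = 2.5, r: float = 0.6) -> float:
    """Estimated external pins of a block with `gates` gates (Rent's rule P = t*G**r)."""
    return t * gates ** r

def max_gates_for_pins(pin_budget: float, t: float = 2.5, r: float = 0.6) -> float:
    """Invert Rent's rule: the largest block whose estimated pin count fits the budget."""
    return (pin_budget / t) ** (1.0 / r)

if __name__ == "__main__":
    for g in (10_000, 100_000, 1_000_000):
        print(f"{g:>9,} gates -> ~{rent_pins(g):,.0f} pins")
    # With a hypothetical budget of 1,000 user I/O pins per FPGA:
    print(f"pin-limited partition size: ~{max_gates_for_pins(1_000):,.0f} gates")
```

Under these assumed constants, a 1,000-pin budget supports a partition of only about 22k gates, far below the logic capacity of a large FPGA.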


Interconnection Architectures
• Mesh-type interconnection
  – 2D mesh for FPGA interconnection
• Crossbar network (separated interconnection)
  – One full crossbar network
  – Partial crossbar network (folded-Clos)
• Time-multiplexed
  – Dynamic FPID interconnect architecture
  – Time-multiplexed interconnect from Quickturn
  – Virtual Wire


Mesh Interconnection
• Another 2-D mesh
  – (US patent 6389379, Axis, 2002)
  – FPGA's in the same row or column are connected.
  – Only two “Hops” and “Jumps” are sufficient for any type of net.
  – Each FPGA's resources are used for routing as well as for logic mapping, which aggravates the pin limitation problem.

[Figure: 4×4 array of FPGA's, FPGA 11 through FPGA 44]
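The two-hop bound follows from the row/column connectivity: a signal can travel along its source row to the destination column, then along that column. A minimal sketch, assuming every pair of FPGA's that share a row or a column has a direct link (as the slide describes) and indexing FPGA 11 to FPGA 44 by (row, column); the function name is hypothetical.

```python
from itertools import product

def hops(src: tuple, dst: tuple) -> int:
    """Inter-FPGA hops between (row, col) positions when each row and
    each column is fully connected."""
    if src == dst:
        return 0
    if src[0] == dst[0] or src[1] == dst[1]:
        return 1          # same row or same column: direct link
    return 2              # along the row to the target column, then down it

grid = list(product(range(1, 5), range(1, 5)))   # FPGA 11 .. FPGA 44
worst = max(hops(a, b) for a in grid for b in grid)
print(worst)  # 2
```

The intermediate FPGA on a two-hop path is exactly the cost the last bullet warns about: its pins carry traffic for a net whose logic it does not implement.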


Crossbar Network
• Full crossbar
  – Separates the logic FPGA's from the interconnection device.
  – After programming, one full crossbar can connect any net of any FPGA to any net of any other FPGA.
  – The size of a full crossbar grows rapidly (quadratically in the total number of FPGA pins it connects) as the number of FPGA's increases.

[Figure: FPGA 0 to FPGA 3, each with pin groups A, B, C, and D, all connected to one full crossbar]


Crossbar Network
• Partial crossbar
  – (“An Efficient Logic Emulation System,” TVLSI, 1993)
  – The I/O pins of each FPGA are divided into subsets. The pins of each crossbar chip are connected to the same subset of pins from every FPGA.
  – Still requires a large number of crossbar chips, called FPIDs (Field Programmable Interconnection Devices).

[Figure: FPGA 0 to FPGA 3, each with pin subsets A, B, C, and D, connected to crossbar chips C0 to C3]
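A quick switch count shows what the partial crossbar buys. Assume F FPGA's with p I/O pins each, and take (number of ports)² crosspoints per crossbar as a simplified cost model chosen for illustration. A full crossbar has F·p ports; a partial crossbar with k chips gives each chip F·p/k ports, one subset of p/k pins from every FPGA. The pin counts in the example are hypothetical.

```python
def full_crossbar_switches(fpgas: int, pins: int) -> int:
    """Crosspoints of one full crossbar over all F*p FPGA pins (ports**2 model)."""
    ports = fpgas * pins
    return ports * ports

def partial_crossbar_switches(fpgas: int, pins: int, chips: int) -> int:
    """Crosspoints of a partial crossbar: `chips` crossbars, each seeing one
    subset of pins/chips pins from every FPGA."""
    if pins % chips:
        raise ValueError("pins must divide evenly into subsets")
    ports_per_chip = fpgas * (pins // chips)
    return chips * ports_per_chip * ports_per_chip

if __name__ == "__main__":
    F, p = 4, 400          # hypothetical: 4 FPGAs with 400 I/O pins each
    print(full_crossbar_switches(F, p))         # 2560000
    print(partial_crossbar_switches(F, p, 4))   # 640000
```

In this model the partial crossbar divides the switch count by k, but it still needs k separate chips, and both totals grow with the square of the FPGA count.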


Crossbar Network
• What is an FPIC?
  – “Field-Programmable Interconnect Component”
  – A reconfigurable interconnection chip
  – Aptix Inc. uses FPICs for the interconnection.

[Figure: FPIC]


Time-Multiplexed Interconnect
• 1) Dynamic FPID
  – Provides different interconnections among the same set of logic modules.
  – Each FPID is a time-multiplexed FPID (“Routability Improvement Using Dynamic Interconnect Architecture,” TVLSI, 1998).

[Figure: crossbars 1 to L (L-th crossbar)]


Time-Multiplexed Interconnect
• 2) Time-multiplexed interconnect from Quickturn
  – (US Patent 5960191, Quickturn, 1999)
  – A partial crossbar is used, but the connected pins are time-multiplexed.
  – Several signals share each pin, so only 1/n of the pins are required when n-to-1 multiplexers are used.

[Figure: FPGA's with MUX/DEMUX logic exchanging signals A to G through MUX chips and crossbars]


Time-Multiplexed Interconnect
  – The “Mux Clock” samples signal A and signal B.
  – “SYNC” disables sampling and synchronizes the sampling operation.

[Timing diagram: MUX clock, divided clock, SYNC for user clock, signal A, signal B, and the composite signal alternating samples of A and B]
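The sketch below models an n-to-1 time-multiplexed link in software. In each user clock cycle the MUX side serializes n signal values onto one pin over n MUX clock ticks, and the DEMUX side restores them. It also computes the 1/n pin reduction from Quickturn's interconnect and the minimum MUX clock it implies. The function names, and the assumption of one transfer per user cycle with no SYNC overhead, are illustrative; they are not taken from the Quickturn patent.

```python
import math
from typing import List, Sequence

def pins_needed(signals: int, n: int) -> int:
    """Physical pins for `signals` inter-FPGA signals with n-to-1 muxing."""
    return math.ceil(signals / n)

def min_mux_clock_hz(user_clock_hz: float, n: int) -> float:
    """Each pin must carry n samples per user cycle (SYNC overhead ignored)."""
    return user_clock_hz * n

def mux(frames: Sequence[Sequence[int]]) -> List[int]:
    """Serialize: for every user cycle, emit each signal's value in turn."""
    return [bit for frame in frames for bit in frame]

def demux(stream: Sequence[int], n: int) -> List[List[int]]:
    """Deserialize the composite stream back into per-cycle frames of n values."""
    return [list(stream[i:i + n]) for i in range(0, len(stream), n)]

if __name__ == "__main__":
    # Two signals (A, B) over three user cycles, as in the timing diagram.
    cycles = [[1, 0], [1, 1], [0, 1]]          # [A, B] per cycle
    wire = mux(cycles)
    print(wire)                                # [1, 0, 1, 1, 0, 1]
    assert demux(wire, 2) == cycles
    print(pins_needed(100, 4), min_mux_clock_hz(1e6, 4))   # 25 4000000.0
```

The trade-off is visible in the two helper functions: dividing the pin count by n multiplies the required MUX clock by n, which bounds the achievable user clock.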


Time-Multiplexed Interconnect
• 3) Virtual Wire
  – (“Logic Emulation with Virtual Wires,” TCAD, 1997)
  – Several logical connections share the same physical wire.
  – The communication schedule is static and determined in advance (the logic circuit must be analyzed before a phase is assigned to each circuit partition).

[Figure: virtual wire between FPGA #1 and FPGA #2, with logical outputs, logical inputs, a mux, shift loops, and transfers in phase 1 and phase 2]


Time-Multiplexed Interconnect
  – Phase assignment
    • At the end of each phase, the outputs produced in that phase are transferred to the other partition.

[Timing diagram: one emulation clock divided into phases 1 to 4, with CLK and Enable signals; in each phase, combinational logic is evaluated and its outputs are communicated]
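The sketch below shows one simple way to assign phases for a hypothetical graph of combinational blocks spread over two partitions. A block is evaluated in the earliest phase in which all of its inputs are available, and a value that crosses a partition boundary arrives one phase later, because it is sent only at the end of its producer's phase. This ASAP (as-soon-as-possible) scheme is a stand-in chosen for illustration, not the exact algorithm of the Virtual Wires paper; the function names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple

def assign_phases(preds: Dict[str, List[str]],
                  partition: Dict[str, int]) -> Dict[str, int]:
    """Earliest phase in which each block can be evaluated.

    A block waits for its predecessors; crossing to another partition costs
    one phase, because the value is only sent at the end of its producer's phase.
    """
    phase: Dict[str, int] = {}
    for node in TopologicalSorter(preds).static_order():
        phase[node] = 1
        for p in preds.get(node, []):
            hop = 1 if partition[p] != partition[node] else 0
            phase[node] = max(phase[node], phase[p] + hop)
    return phase

def transfers(preds: Dict[str, List[str]], partition: Dict[str, int],
              phase: Dict[str, int]) -> List[Tuple[int, str, str]]:
    """Static schedule: (phase, producer, consumer) for every inter-partition signal."""
    return sorted((phase[p], p, n) for n, ps in preds.items()
                  for p in ps if partition[p] != partition[n])

if __name__ == "__main__":
    # Hypothetical netlist: block -> blocks it depends on; two partitions.
    preds = {"b": ["a"], "c": ["b"], "d": ["c", "a"]}
    partition = {"a": 1, "b": 1, "c": 2, "d": 1}
    phase = assign_phases(preds, partition)
    print(phase, "phases needed:", max(phase.values()))
    print(transfers(preds, partition, phase))
```

Here the path a, b, c, d crosses the partition boundary twice, so one emulation clock needs three phases, and the transfer list is the static communication schedule the slide describes.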


Software for Multi-FPGA
• Partitioning
  – Various partitioning algorithms have been proposed.
  – Placing highly interconnected circuits in a single chip is desirable because of the limited number of I/O pins.
  – Circuit paths that require a short delay should be kept inside one FPGA. This is a trade-off between routability and performance.
• Placement
  – Assign each partitioned circuit to one of the FPGA's.
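Whatever partitioning algorithm is used, its result has to be checked against the pin limit. A minimal sketch, assuming the netlist is given as a hypothetical list of nets (each a set of cell names) plus a cell-to-FPGA assignment: a net that spans more than one FPGA uses one I/O pin on every FPGA it touches. The function names and the example netlist are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Set

def pins_per_fpga(nets: Iterable[Set[str]], assign: Dict[str, int]) -> Counter:
    """Count the I/O pins each FPGA needs: one per net that leaves the FPGA."""
    used: Counter = Counter()
    for net in nets:
        fpgas = {assign[cell] for cell in net}
        if len(fpgas) > 1:            # the net is cut by the partition
            for f in fpgas:
                used[f] += 1
    return used

def violations(used: Counter, pin_limit: int) -> Dict[int, int]:
    """FPGAs whose pin demand exceeds the (assumed) per-device pin limit."""
    return {f: n for f, n in used.items() if n > pin_limit}

if __name__ == "__main__":
    nets = [{"u1", "u2"}, {"u2", "u3", "u4"}, {"u4", "u5"}, {"u1", "u5"}]
    assign = {"u1": 0, "u2": 0, "u3": 1, "u4": 1, "u5": 2}
    used = pins_per_fpga(nets, assign)
    print(dict(used))            # pins per FPGA
    print(violations(used, 1))   # FPGAs over a tiny 1-pin budget
```

A partitioner that minimizes the cut reduces exactly this count, which is why keeping highly interconnected circuits together matters.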


Software for Multi-FPGA
• Routing
  – Global routing
    • Select the routing switches (or crossbars) or additional FPGA's that a signal must pass through to reach its destination FPGA.
  – Detailed routing
    • Assign signals to actual traces on each FPGA.
• Time-multiplexed FPGA
  – Virtual Wire requires a routing algorithm that satisfies the relevant precedence relations.
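For the partial-crossbar architecture, global routing reduces to choosing a crossbar chip for each inter-FPGA net. The sketch below uses a simple greedy rule for two-terminal nets only (the summary at the end notes that multi-terminal nets are harder): a net between FPGA a and FPGA b can use crossbar chip k only if both FPGA's still have a free pin in subset k. The pin counts are hypothetical, and greedy assignment is a simplification chosen for clarity; it can fail where a smarter router would succeed.

```python
from typing import Dict, List, Optional, Tuple

def route_nets(nets: List[Tuple[int, int]], fpgas: int, chips: int,
               pins_per_subset: int) -> Optional[Dict[int, int]]:
    """Greedily map each net index to a crossbar chip; None if a net is unroutable."""
    # free[f][k]: unused pins of FPGA f that are wired to crossbar chip k
    free = [[pins_per_subset] * chips for _ in range(fpgas)]
    choice: Dict[int, int] = {}
    for idx, (a, b) in enumerate(nets):
        candidates = [k for k in range(chips) if free[a][k] and free[b][k]]
        if not candidates:
            return None
        # prefer the chip with the most remaining capacity on both ends
        k = max(candidates, key=lambda c: min(free[a][c], free[b][c]))
        free[a][k] -= 1
        free[b][k] -= 1
        choice[idx] = k
    return choice

if __name__ == "__main__":
    # 4 FPGAs, 2 crossbar chips, 1 pin per subset (hypothetical, tiny).
    print(route_nets([(0, 1), (2, 3), (0, 2)], fpgas=4, chips=2, pins_per_subset=1))
    print(route_nets([(0, 1), (0, 2), (1, 2)], fpgas=4, chips=2, pins_per_subset=1))
```

The second call fails because the three nets form a triangle: each pair needs a different chip, but only two chips exist.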


Run-Time Reconfiguration
• Meeting the exploding gate capacity
  – With RTR, a time-multiplexed FPGA executes the whole circuit in time-domain slices, which greatly increases the effective gate capacity.
  – Run-time reconfiguration was proposed in the mid-1990's.
• Run-time reconfiguration (RTR)
  – A technology for swapping different configurations in and out of the reconfigurable hardware
  – Configurations in different time slots must be assigned registers to communicate with the other configurations.


Reconfiguration Model
• Single context
  – Only one full-chip configuration can be loaded at a time.
  – Sequential access for reconfiguration incurs high overhead (configuring an FPGA of the Xilinx Virtex series takes 5s~20s).

[Figure: single-context model: FPGA (logic & routing), its configuration, and the incoming configuration]


Reconfiguration Model
• Multi-context
  – Multiple planes of configuration information
  – Switching between several configurations is fast.
  – Xilinx XC4000E, Chameleon Inc.'s CS2000 RCP

[Figure: multi-context model: FPGA (logic & routing), its configuration, and the incoming configuration]


Xilinx XC4000E
• Each logic element, including its LUT, has multiple (8) configuration planes stored in SRAM.
• Micro-registers store the result of each context, and their outputs are routed to the relevant logic blocks.
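A toy software model can make the multi-context idea concrete: switching context is just selecting another stored plane, and a micro-register per context keeps that context's result so later contexts can use it. The 2-input LUT, the class name, and the data layout are assumptions for illustration only; real devices hold the planes in on-chip SRAM and switch them in hardware.

```python
from typing import List, Sequence

class MultiContextLUT:
    """Toy logic element with several configuration planes (hypothetical model).

    Each plane holds a 2-input LUT truth table indexed by (a << 1) | b;
    a micro-register per context keeps that context's output.
    """
    def __init__(self, planes: Sequence[Sequence[int]]):
        if not 1 <= len(planes) <= 8:
            raise ValueError("this sketch assumes 1 to 8 planes")
        if any(len(p) != 4 for p in planes):
            raise ValueError("each plane must be a 4-entry truth table")
        self.planes = [list(p) for p in planes]
        self.micro_regs: List[int] = [0] * len(planes)

    def run_context(self, ctx: int, a: int, b: int) -> int:
        """Select plane `ctx` (a fast index change) and evaluate the LUT."""
        out = self.planes[ctx][(a << 1) | b]
        self.micro_regs[ctx] = out          # keep the result for later contexts
        return out

if __name__ == "__main__":
    # Context 0 computes a AND b; context 1 XORs that stored result with c.
    lut = MultiContextLUT([[0, 0, 0, 1], [0, 1, 1, 0]])
    a, b, c = 1, 1, 0
    lut.run_context(0, a, b)
    print(lut.run_context(1, lut.micro_regs[0], c))   # (1 AND 1) XOR 0 = 1
```

One physical LUT thus implements two logic functions in sequence, which is the capacity gain that run-time reconfiguration promises.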


Reconfiguration Model
• Partially reconfigurable
  – Part of the FPGA can be reconfigured without reconfiguring the entire array.
  – Reduces the amount of configuration data.
  – However, the programming information can grow because address information must be included.
  – Xilinx XC6200, Xilinx Virtex-II

[Figure: partially reconfigurable model: FPGA (logic & routing), its configuration, and the incoming configuration]
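The trade-off between the last two points can be estimated by comparing a full bitstream with a partial one that rewrites only the frames that changed, each frame carrying its own address. The frame size, address width, and frame count below are hypothetical values, not Virtex-II or XC6200 parameters, and the function names are illustrative.

```python
from typing import List, Sequence

def changed_frames(old: Sequence[bytes], new: Sequence[bytes]) -> List[int]:
    """Indices of configuration frames that differ between two configurations."""
    if len(old) != len(new):
        raise ValueError("configurations must have the same number of frames")
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]

def partial_bits(n_changed: int, frame_bits: int, addr_bits: int) -> int:
    """Partial bitstream size: every rewritten frame carries its own address."""
    return n_changed * (frame_bits + addr_bits)

def full_bits(n_frames: int, frame_bits: int) -> int:
    """Full bitstream size: all frames, written sequentially without addresses."""
    return n_frames * frame_bits

if __name__ == "__main__":
    old = [bytes(4)] * 10                    # 10 hypothetical frames of 32 bits
    new = list(old)
    new[3], new[7] = b"\x01\x00\x00\x00", b"\xff\xff\xff\xff"
    diff = changed_frames(old, new)
    print(diff, partial_bits(len(diff), 32, 16), full_bits(10, 32))   # [3, 7] 96 320
```

In this model the partial stream becomes larger than the full one once the changed fraction of frames exceeds frame_bits / (frame_bits + addr_bits), two thirds for the values above.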


Reconfiguration Model
• Pipeline reconfigurable
  – Partial reconfiguration occurs in increments of pipeline stages.
  – Primarily used in datapath-style computations.

[Figure: pipeline stages over time, interleaving "Configure n" steps (1 to 6) with "Execute n" steps (1 to 5)]


Configuration Scheduling

• The precedence relation should be preserved when the circuit is partitioned.

[Figure: a circuit graph with nodes 1 to 7 partitioned into configuration groups 0, 1, and 2, and a timeline of configurations showing the lifetime of each wire (wires 4,6; 4,7; 2,5; 5,6)]
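A wire whose producer and consumer land in different configuration groups must be held in a register from the producer's configuration until the consumer's. The sketch below checks that the group order respects precedence and counts how many values must be kept across each configuration boundary. The seven-node graph is hypothetical and only loosely modeled on the slide's figure, whose actual edges are not recoverable; the function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]   # (producer node, consumer node)

def precedence_violations(group: Dict[int, int], edges: List[Edge]) -> List[Edge]:
    """Wires whose consumer is scheduled in an earlier configuration than its producer."""
    return [(u, v) for u, v in edges if group[u] > group[v]]

def registers_per_boundary(group: Dict[int, int], edges: List[Edge]) -> List[int]:
    """Distinct values that must be held in registers across each boundary
    between configuration t and configuration t + 1."""
    n_groups = max(group.values()) + 1
    live: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        for t in range(group[u], group[v]):   # boundaries this wire crosses
            live[t].add(u)                     # one register per producer value
    return [len(live[t]) for t in range(n_groups - 1)]

if __name__ == "__main__":
    group = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 2, 7: 2}
    edges = [(1, 2), (1, 3), (2, 5), (3, 4), (3, 7), (4, 6), (4, 7), (5, 6)]
    print(precedence_violations(group, edges))   # []
    print(registers_per_boundary(group, edges))  # [2, 3]
```

Wire 3 to 7 spans two boundaries, so its value occupies a register for the whole of configuration 1; long wire lifetimes are what drive the register cost of a schedule.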


Fast Configuration
• Problem of run-time reconfiguration
  – Run-time reconfigurable systems reconfigure the hardware during program execution.
  – The reconfiguration time can offset much of the performance improvement achieved by hardware acceleration.
• Configuration time
  – DISC II system
    • “Dynamic Instruction Set Computer” implemented on partially reconfigurable FPGA's.
    • M. J. Wirthlin and B. L. Hutchings, “Sequencing run-time reconfigured hardware with software,” ACM/SIGDA International Symposium on FPGAs, pp. 122-128, 1996.
    • 25%-71% of the execution time is spent on reconfiguration.
  – UCLA ATR
    • “Automatic Target Recognition” implemented with RTR.
    • W. H. Mangione-Smith et al., “Seeking solutions in configurable computing,” IEEE Computer, vol. 30, no. 12, pp. 38-43, 1997.
    • Over 98.5% of the time is reconfiguration time.


Fast Configuration
• Configuration prefetching
  – Used in cosimulation environments.
  – Configuration time is overlapped with host execution time, which hides the configuration time.
• Configuration compression
  – Used in cosimulation environments.
  – The communication time between the host processor and the FPGA can be minimized by compressing the configuration data.
  – Xilinx XC6200: one configuration data word can configure several configuration registers.
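A simple timing model shows both why reconfiguration can dominate run time (as in the 25%-71% and 98.5% figures quoted above) and how prefetching helps. The sketch assumes a hypothetical sequence of tasks, each needing its own configuration. Without prefetching, each configuration loads and then its task runs; with prefetching, configuration i+1 loads while task i executes, so each step costs the larger of the two. The times are illustrative, not measurements from DISC II or UCLA ATR.

```python
from typing import Sequence

def serial_time(config: Sequence[float], execute: Sequence[float]) -> float:
    """No prefetching: each configuration loads, then its task runs."""
    return sum(config) + sum(execute)

def prefetch_time(config: Sequence[float], execute: Sequence[float]) -> float:
    """Prefetching: configuration i+1 loads while task i executes."""
    if len(config) != len(execute) or not config:
        raise ValueError("need one configuration time per task")
    total = config[0]                            # the first load cannot be hidden
    for i in range(len(execute) - 1):
        total += max(execute[i], config[i + 1])  # overlap load with execution
    return total + execute[-1]

def reconfig_fraction(config: Sequence[float], execute: Sequence[float]) -> float:
    """Share of the serial run time spent reconfiguring."""
    return sum(config) / serial_time(config, execute)

if __name__ == "__main__":
    cfg = [4.0, 4.0, 4.0, 4.0]       # hypothetical load times (ms)
    exe = [3.0, 5.0, 2.0, 6.0]       # hypothetical execution times (ms)
    print(serial_time(cfg, exe), prefetch_time(cfg, exe), reconfig_fraction(cfg, exe))
    # 32.0 23.0 0.5
```

Prefetching can only hide a load behind an equally long execution step; when loads are much longer than execution, as in the UCLA ATR figure, compression and caching are needed as well.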


Fast Configuration
• Configuration caching
  – The communication time needed to send configuration data from the host to the target FPGA is the main reason for slow configuration.
  – A cache that stores configuration data can be placed near the FPGA to reduce this communication time.
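A minimal sketch of configuration caching, assuming an LRU (least-recently-used) replacement policy and hypothetical per-load costs: a hit loads the configuration from local memory next to the FPGA, while a miss must fetch it from the host. LRU is one common policy choice; the slide does not specify one, and the function name is illustrative.

```python
from collections import OrderedDict
from typing import Iterable

def config_load_time(requests: Iterable[str], capacity: int,
                     host_cost: float, local_cost: float) -> float:
    """Total configuration time for a request sequence with an LRU cache."""
    cache: "OrderedDict[str, None]" = OrderedDict()
    total = 0.0
    for cfg in requests:
        if cfg in cache:
            cache.move_to_end(cfg)          # hit: mark as most recently used
            total += local_cost
        else:
            total += host_cost              # miss: fetch from the host
            cache[cfg] = None
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used entry
    return total

if __name__ == "__main__":
    seq = ["A", "B", "A", "C", "A", "B"]    # hypothetical configuration requests
    print(config_load_time(seq, capacity=2, host_cost=10.0, local_cost=1.0))  # 42.0
    print(config_load_time(seq, capacity=0, host_cost=10.0, local_cost=1.0))  # 60.0
```

The benefit depends on reuse: configurations that recur within the cache capacity are served locally, while a stream of distinct configurations gains nothing.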


Commercial Emulators
• Axis
  – Xtreme™
    • Simulation/Emulation/Acceleration
    • A system of multiple PCI cards, each carrying multiple FPGA's
    • The FPGA's are connected by a mesh interconnection.
    • RCC technology is used for simulating designs.


Commercial Emulators
  – RCC (ReConfigurable Computing)
    • Consists of many computing elements (small, compact processors, each dedicated to performing one function)

[Figure: RCC system: HDL code (always blocks, gate instances, initial blocks with system tasks) on an UltraSPARC II workstation, a PCI interface and SIMD controller, and an array of Altera FLEX 10K devices]


Commercial Emulators
  – RTL language compiler
    • The RCC RTL compiler compiles HDL into RCC elements.
    • The user does not need to debug at the gate level.

[Figure: traditional RTL verification flow (RTL design, synthesis, gate-level design, debugger) vs. the RCC RTL compiler flow (RTL design, compiler, computing elements in the RCC array emulation engine, debugger)]


Commercial Emulators
  – Debugging
    • HotSwapping between the software simulation state and the RCC state.
    • HotSwapping lets the user probe RTL constructs starting from whatever simulation time the user wants to view.


Commercial Emulators
• Quickturn
  – Palladium™
    • Components
      – 1. Custom ASIC matrix to emulate circuits
      – 2. FPGA array
      – 3. Embedded logic analyzer
      – 4. External I/O interface
    • Simulation acceleration modes
      – Synthesized testbench: testbenches are synthesized into the emulator.
      – Transaction-based simulation: the simulator and accelerator are decoupled.
      – Accelerated cosimulation (cycle-level transactions): the simulator and accelerator run in lock-step.


Summary
• Multi-FPGA Architecture
  – Mesh
  – Crossbar
  – Virtual Wire, time-multiplexed interconnect
  – Problems
    • Nets with multiple ports (multi-terminal nets) may not be routable in the previous architectures.
    • FPGA logic utilization in a mesh topology is low (under 30%).
    • Crossbar architectures need additional interconnection hardware, which increases emulator cost.
• RTR Architecture