A Monte Carlo Simulation Accelerator using FPGA Devices
Final Year Project: LHW0304
Ng Kin Fung and Ng Kwok Tung
Supervisor: Professor LEONG, Heng Wai Philip
Overview
  Objective
  Background
    Software-only Implementation
    Hardware Implementation
      FPGA
      Soft-Core Micro-Processor
    Interest Rate Modeling
    Brace-Gatarek-Musiela (BGM) Model
  Motivation and Contribution
  System Design
    System Design Overview
    System Components
    System Operations
  Experiment and Result
    Resources
    Performance
    Data Transmission Overhead
  Conclusion
  Future Improvement
  Q & A Section
Objective
  What we achieved last semester
    Studied and became familiar with the related development tools
    Implemented some simple examples to gain experience in system development on FPGA with a Soft-core Micro-processor
    First ever successful port of the Microblaze system to the Celoxica RC200 development board
    Studied the performance and power consumption of the system
Objective
  This semester
    Build a Monte Carlo Simulation Accelerator using FPGA technology and a Soft-core Micro-processor
    Study the speed-up and performance
    Study the transmission overhead of the channel between the user core and the Soft-core Micro-processor
FPGA and Soft-Core Micro-Processor
Software-only implementation
  The performance is NOT satisfactory
    Sequential execution of instructions instead of parallel execution
    Slow memory access
  Lack of ability to customize hardware
    No way to save power by switching off hardware modules
  The problem needs to be solved with another approach
FPGA Technology
  More and more popular in system design
  Higher degree of parallelism
    Fewer clock cycles required
  Explicitly hardwired to perform a certain operation
    Optimized for a specific purpose: higher performance
  Enables customization of hardware modules
    Power saving
  Reconfigurable
    Enables reuse of hardware
  Able to simulate and synthesize the circuits from a high-level, program-like description
    Easy system development and system testing
    Shorter time to market: higher profit
Soft-Core Micro-Processor
  Most systems use a PC + FPGA accessed through a PCI bus
    Bottleneck for the entire system
  Use of a Soft-Core Micro-Processor
    Everything is implemented in the FPGA
    Transmission of data is within the FPGA
    A higher transmission bandwidth and lower latency
Soft-Core Micro-Processor
  Other advantages
    Easier to develop
    Retains the advantages of using FPGA
    Flexible
    Retargetable
  Conclusion
    FPGA technology + Soft-Core Micro-Processor
Interest Rate Modeling
  Importance of interest rate modeling
    Simulate market behavior with historical parameter values
    Explain interest rate movements in terms of an underlying model
    Decision making on economic policy
    Risk management
Brace-Gatarek-Musiela (BGM) Model
  One of the most popular interest rate models
  Based on the Monte Carlo method
  Looping part (most computationally expensive)
Implementing BGM Model using FPGA and Soft Core Microprocessor
  [Figure: the BGM core generates 50 paths with 9 fixed points (pt1 to pt9); vertical axis from 0 to 120]
  Path generation implemented in the FPGA in parallel style
  Post-processing calculation by Microblaze
    Average and standard error
  Fast Simplex Link bus for data transmission between the BGM core and Microblaze
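The split described above (paths generated in bulk, then an average and a standard error computed per fixed point) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the driftless lognormal step is a placeholder and not the BGM forward-rate dynamics, and the names `simulate_paths`, `post_process`, and all parameter values (`start`, `vol`, `dt`, `seed`) are hypothetical.

```python
import math
import random


def simulate_paths(n_paths=50, n_points=9, start=0.05, vol=0.2, dt=0.25, seed=1):
    """Generate n_paths sample paths, each observed at n_points fixed points.

    Assumption: a driftless lognormal step stands in for the BGM looping part.
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        value, path = start, []
        for _ in range(n_points):
            z = rng.gauss(0.0, 1.0)
            value *= math.exp(-0.5 * vol * vol * dt + vol * math.sqrt(dt) * z)
            path.append(value)
        paths.append(path)
    return paths


def mean_and_standard_error(samples):
    """Sample mean and standard error of the mean (n - 1 in the variance)."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return mean, math.sqrt(variance / n)


def post_process(paths):
    """Return one (mean, standard error) pair per fixed point."""
    return [mean_and_standard_error(column) for column in zip(*paths)]


if __name__ == "__main__":
    for i, (mean, err) in enumerate(post_process(simulate_paths()), start=1):
        print(f"pt{i}: mean={mean:.5f} stderr={err:.5f}")
```

In the system described here, the inner loop corresponds to the work done by the BGM core, while `post_process` corresponds to the calculation left to the Microblaze.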
Contribution
  Improve the performance of the system
    Implementation                     Responsibility   Performance
    Software-only                      On Market        Lowest
    FPGA + PC                          CSE Research     High
    FPGA + Soft-Core Micro-Processor   Our Task         Highest
System Design
System Design Overview
System Component
Microblaze
  A soft-core microprocessor
    Delivered as HDL source code for synthesis
    Designed in VHDL
    Specially optimized for Xilinx FPGAs
    A reduced instruction set computer (RISC)
  Speed of Microblaze across different devices (Xilinx statistics)
    Device                 Clock     Performance
    Virtex™-II Pro (-6)    150 MHz   101 D-MIPS
    Virtex-II (-5)         125 MHz   82 D-MIPS
    Virtex-E (-7)          75 MHz    49 D-MIPS
    Spartan-IIE (-6)       75 MHz    49 D-MIPS
    Spartan™-II (-4)       65 MHz    43 D-MIPS
User Core – BGM
  Connect the core designed in VHDL to the Microblaze system
    Solves the most computationally expensive task fully in hardware
    Needs to follow the signals and timing of the bus it is connected to
  A Microprocessor Peripheral Definition (MPD) file
    Defines the interface of the peripheral
    Ports, buses
  A Peripheral Analyze Order (PAO) file
    A list of the HDL files needed for synthesis, in order of compilation
Fast Simplex Link (FSL)
  32-bit wide bus
  Unidirectional point-to-point data streaming interfaces
  Control and data communication support
  FIFO-based communication
  Fast internal data and control transmission
    Peak bandwidth 300 MB/s
  [Figures from: Xilinx Fast Simplex Link Channel Product Specification DS449 (v1.1), Aug 06, 2003]
  Use the read macro microblaze_bread_datafsl(val, id) to read data from the FSL FIFO into the Microblaze
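To make the FIFO behaviour concrete, the following is a toy host-side model of one unidirectional channel carrying 32-bit words. It does not call the real FSL macros. The class `FslChannelModel`, its `depth`, and the 16.16 fixed-point packing are hypothetical choices made only for illustration, since the slides do not state the data format used by the BGM core.

```python
from collections import deque


class FslChannelModel:
    """Toy model of one unidirectional FIFO channel carrying 32-bit words."""

    def __init__(self, depth=16):  # depth is an assumed value
        self.depth = depth
        self.fifo = deque()

    def write(self, word):
        """Producer side: push one word; return False if the FIFO is full."""
        if len(self.fifo) >= self.depth:
            return False
        self.fifo.append(word & 0xFFFFFFFF)
        return True

    def read(self):
        """Consumer side: pop the oldest word, or None if the FIFO is empty."""
        return self.fifo.popleft() if self.fifo else None


def to_fixed_point(value, frac_bits=16):
    """Pack a real number into a 32-bit two's-complement fixed-point word."""
    return int(round(value * (1 << frac_bits))) & 0xFFFFFFFF


def from_fixed_point(word, frac_bits=16):
    """Unpack a 32-bit two's-complement fixed-point word into a float."""
    if word & 0x80000000:
        word -= 1 << 32
    return word / (1 << frac_bits)


if __name__ == "__main__":
    channel = FslChannelModel()
    channel.write(to_fixed_point(1.5))
    print(from_fixed_point(channel.read()))
```

The model returns `None` on an empty FIFO and `False` on a full one; these are the points where a real producer or consumer would have to wait or retry.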
On-Chip Memory, Local Memory Bus and Memory Bus Controller
  On-Chip Memory
    Storage medium for the data and instructions
    Minimizes the transmission overhead between the Microblaze and the memory
  Local Memory Bus (LMB)
    Single-cycle access to on-chip dual-port block RAM
    Performance of 125 MHz
  LMB BRAM Interface Controller
    Interface between the LMB and the bram_block peripheral
    Separate controller for data and control
On-Chip Peripheral Bus (OPB Bus)
  Connection between the main system and the peripherals
  Makes the Microblaze system more functional
  In this project
    UART
    OPB Timer
    GPIO
Universal Asynchronous Receiver-Transmitter (UART)
  Handles asynchronous serial communication
  Libgen allows the mapping of standard input and output
    Use of scanf and printf for communication with the user
OPB Timer
  Facilitates the correct measurement of the performance
    Initiate timer: XStatus XTmrCtr_Initialize
    Start timer: void XTmrCtr_Start
    Stop timer: void XTmrCtr_Stop
    Get timer value: Xuint32 XTmrCtr_GetValue
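A raw counter value only becomes a performance figure once it is converted to time. The sketch below shows that arithmetic on the host side. It assumes a 32-bit up-counter (consistent with the Xuint32 return type) that may wrap once between the two readings, and the 50 MHz clock in the usage example is a hypothetical value, not the clock of this system.

```python
COUNTER_BITS = 32  # assumed counter width, matching a 32-bit return value


def elapsed_ticks(start, stop):
    """Ticks between two readings of an up-counter, tolerating one wrap-around."""
    return (stop - start) % (1 << COUNTER_BITS)


def ticks_to_ms(ticks, clock_hz):
    """Convert a tick count to milliseconds for a given timer clock."""
    return ticks * 1000.0 / clock_hz


if __name__ == "__main__":
    ticks = elapsed_ticks(0xFFFFFFF0, 0x00000010)  # counter wrapped once
    print(ticks, ticks_to_ms(ticks, 50_000_000))   # 50 MHz is hypothetical
```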
General Purpose Input Output (GPIO)
  Problem found on the FSL bus
    Reset signal is connected to ground
    No way to reset the BGM core through the FSL bus
  Solution
    Make changes to the VHDL source code
    Use GPIO
  [Diagram: reset by FSL (Microblaze, FSL, BGM core) is not possible, marked X; reset by GPIO (Microblaze, GPIO, BGM core) is used instead]
System Operations
  [Flowchart]
    BGM core is reset
    Microblaze system start
    Timer is started
    BGM process
    Any more data?
      Yes: back to BGM process
      No: continue
    Post-processing calculation by Microblaze
    Timer is stopped
    Result is printed out
    End of Microblaze system
System Operations
  [Flowchart]
    BGM core in process of generating paths
    BGM process start
    Data transfer from BGM core to Microblaze system
    Data format transform
    Temporary storage of data
    End of Microblaze system
Experimental Results
Resources
  Selected device: 2v1000fg456-4
  Resources for BGM core alone
    Resource           Used     Total    Percentage
    Slices             6455     5120     126%
    Slice Flip Flops   5768     10240    56%
    4-input LUTs       10974    10240    107%
    Bonded IOBs        42       324      12%
    MULT18X18s         37       40       92%
    GCLKs              3        16       18%
    DCMs               1        8        12%
  Unable to place the whole system on the FPGA board
  System simulation by ModelSim
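The reason the design does not fit can be read directly off the table: any resource whose used count exceeds the device total is over capacity. A short sketch of that check, using only the numbers reported above (the names `RESOURCES`, `utilization`, and `over_capacity` are illustrative):

```python
# (used, total) per resource, copied from the resource table for the BGM core alone
RESOURCES = {
    "Slices": (6455, 5120),
    "Slice Flip Flops": (5768, 10240),
    "4-input LUTs": (10974, 10240),
    "Bonded IOBs": (42, 324),
    "MULT18X18s": (37, 40),
    "GCLKs": (3, 16),
    "DCMs": (1, 8),
}


def utilization(used, total):
    """Utilization as a percentage of the device total."""
    return 100.0 * used / total


def over_capacity(resources):
    """Names of resources whose usage exceeds what the device provides."""
    return [name for name, (used, total) in resources.items() if used > total]


if __name__ == "__main__":
    for name, (used, total) in RESOURCES.items():
        print(f"{name:18s} {utilization(used, total):6.1f}%")
    print("Over capacity:", over_capacity(RESOURCES))
```

With these figures the slices and the 4-input LUTs exceed the device totals for the BGM core alone, before the Microblaze system is added.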
Performance
  Comparison of performance for running the BGM core in FPGA and in PC (by Dr. Zhang)
    Speed-up factor: 19.87
  Comparison of performance for running the BGM core in FPGA and in PC with different numbers of paths generated (by Dr. Zhang)
    Stable performance with different path numbers
  Simulation of Microblaze system
    Total time required for generating 50 paths: 2.871 ms
    Speed-up factor: 21.94
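The two reported numbers can be related with simple arithmetic. The sketch below assumes the speed-up factor is the ratio of software time to hardware time for the same 50 paths; the slides do not state the software time itself, so the value it prints is only implied by that assumption, not a measured result.

```python
def implied_software_time_ms(hardware_ms, speedup):
    """Software time implied by speedup = software_time / hardware_time."""
    return hardware_ms * speedup


def per_path_us(total_ms, n_paths):
    """Average time per path in microseconds."""
    return total_ms * 1000.0 / n_paths


if __name__ == "__main__":
    print(per_path_us(2.871, 50))                  # average hardware time per path
    print(implied_software_time_ms(2.871, 21.94))  # implied, not measured
```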
Transmission Bandwidth
    Transmission Media   Peak Transmission Bandwidth
    Serial Port          15 KB/s
    Parallel Port        150 KB/s
    10M Ethernet         1.2 MB/s
    USB                  1.5 MB/s
    100M Ethernet        12 MB/s
    PCI Bus              100 MB/s
    FSL Bus              300 MB/s
Transmission Bandwidth
  On the FSL bus, 32 bits of data are sent in about 40000 ps
  Transmission bandwidth is around 100 MB per second
  Same order of magnitude as the peak transmission bandwidth stated in the specification
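The 100 MB per second figure follows from the measured transfer time: 32 bits is 4 bytes, and 40000 ps is 40 ns. A small sketch of the calculation, assuming 1 MB = 10^6 bytes (the function name is illustrative):

```python
def bandwidth_mb_per_s(bits_per_transfer, transfer_time_ps):
    """Bandwidth in MB/s (1 MB = 10**6 bytes) for one transfer of the given size."""
    bytes_per_transfer = bits_per_transfer / 8
    seconds = transfer_time_ps * 1e-12
    return bytes_per_transfer / seconds / 1e6


if __name__ == "__main__":
    measured = bandwidth_mb_per_s(32, 40000)
    print(measured)          # bandwidth from the simulated transfer time
    print(measured / 300.0)  # fraction of the 300 MB/s peak in the specification
```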
Conclusion
  A Monte Carlo Simulation Accelerator was implemented using FPGA technology and the Xilinx Microblaze Soft-core Micro-processor
  A speed-up factor of 21.94 compared with the software-only implementation
  Higher bandwidth and lower latency can be achieved using the FSL link between Microblaze and the BGM core
  High performance, parallel execution of instructions, reconfigurability and reusability, and short development time ...
Future Development
  Put the whole system on the FPGA board
  Implement other applications for which high performance and short development time are the major considerations
  Study the other IP cores included and make improvements to the system
Q & A