Performed by: Liran Sperling 200476216, Gal Braun 301357059. Instructor: Evgeny Fiksman


High Speed Digital Systems Laboratory

Technion - Israel Institute of Technology
Department of Electrical Engineering

GPS/INS Tracking System Using Particle Filter Model


Project Requirements

Implement the INS/GPS system with the particle filter model algorithm on an Nvidia GPU, using the OpenCL platform.

GPGPU Languages

• DirectX (1995)

• CUDA (2006)

• OpenCL (OCL, 2008)

    – Open Computing Language

    – Managed by the Khronos Group

    – Uses all computation resources in the system

    – Derived from C99

Khronos Group




Our GPU – GeForce GTX285

Price: $200 – $250

GPU Architecture



OCL Memory model




From C to OpenCL

System Description

Combined GPS/INS navigation based on the particle filter model:

• INS – Inertial Navigation System

• GPS – Global Positioning System

• Particle Filter Model – combines the GPS and INS algorithms

System Description

[Diagram: processing loop]

• Blocks: INS, GPS, Weights recalculation, Resampling, CPU

• Timing labels: every 10 msec, every 1 sec, every 1 sec

• Branch conditions: Neff > Nth, Neff < Nth
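The two rates in the diagram (a step every 10 msec and an update every 1 sec) can be sketched as a simple schedule. This is only an illustration under assumptions: it takes the 10 msec period to be the INS step and the 1 sec period to be the GPS measurement, and it assumes GPS measurements arrive exactly on an INS step. The names `INS_PERIOD_MS`, `GPS_PERIOD_MS` and `cycle_kinds` are hypothetical, not taken from the project code.

```python
INS_PERIOD_MS = 10     # assumed INS step period (the "every 10 msec" label)
GPS_PERIOD_MS = 1000   # assumed GPS measurement period (the "every 1 sec" label)


def cycle_kinds(duration_ms):
    """Yield (time_ms, has_gps) for every INS step in a run of duration_ms."""
    for t in range(INS_PERIOD_MS, duration_ms + 1, INS_PERIOD_MS):
        # A cycle also handles a GPS measurement when it falls on a 1 sec mark.
        yield t, t % GPS_PERIOD_MS == 0


# Example: a 100-second run.
cycles = list(cycle_kinds(100_000))
gps_cycles = sum(1 for _, has_gps in cycles if has_gps)
print(len(cycles), gps_cycles)
```

Under these assumptions, most cycles only propagate the particles, and one cycle in a hundred also processes a GPS measurement.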

Block Diagram

[Diagram: algorithm flow]

• Blocks: Initialization, Particle Propagation, Particle weight calc, Weight recomputation, Effective particles number calc, State vector calculation, Covariance matrix calc, Resampling, Regularization, Randomizing, Matrix Inversion

• Branch labels: No GPS measure / GPS measure; Particles number < Nth / Particles number > Nth
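The "Effective particles number calc" and "Resampling" blocks can be illustrated with a minimal sketch. It assumes the common textbook definition N_eff = 1 / Σ wᵢ² over normalized weights and uses systematic resampling as a stand-in; the slides do not say which definition or resampling scheme the project used. The function names are hypothetical.

```python
import random


def effective_particles(weights):
    """N_eff = 1 / sum(w_i^2), computed on normalized weights (assumed definition)."""
    total = sum(weights)
    normalized = [w / total for w in weights]
    return 1.0 / sum(w * w for w in normalized)


def systematic_resample(weights, rng=random):
    """Return indices of the particles to keep (systematic resampling, an assumption)."""
    n = len(weights)
    step = sum(weights) / n
    u = rng.random() * step          # single random offset in [0, step)
    indices = []
    i = 0
    cumulative = weights[0]
    for _ in range(n):
        # Advance to the particle whose cumulative weight interval contains u.
        while u >= cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        indices.append(i)
        u += step
    return indices


def needs_resampling(weights, n_threshold):
    """Resample only when the effective particle number drops below the threshold."""
    return effective_particles(weights) < n_threshold
```

With uniform weights N_eff equals the particle count, so no resampling is triggered; when a single particle carries all the weight, N_eff falls to 1 and every resampled index points at that particle.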

Why GPU and OCL?

• Many particles (30000).

• Each particle is independent of the others.

• Major parts of the algorithm can be performed in parallel.

• OpenCL parallel computing makes it possible to use the GPGPU's computing power to improve the algorithm's performance.

Calculation Optimization - Example

We would like to use our multi-core GPGPU to reduce the calculation time. How can it be done?


Example - CPU

12 5 2 13 16 8 5 41 17 27 3 9 20 34 7 12

Calculation time (time units):  0   1   2   3   4   …   15   16
Sum:                            0  12  17  19  32   …  219  231
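The sequential CPU case can be reproduced in a few lines. This is an illustrative sketch rather than the project's code: one addition is counted as one time unit, so 16 values take 16 time units.

```python
from itertools import accumulate

values = [12, 5, 2, 13, 16, 8, 5, 41, 17, 27, 3, 9, 20, 34, 7, 12]

# Running sum after each time unit: one element is added per unit.
running = list(accumulate(values))

for time_unit, partial in enumerate(running, start=1):
    print(time_unit, partial)
```

The running sums match the values shown on the slide (12, 17, 19, 32, …, 219, 231).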


Example – Naive GPU

12 5 2 13 16 8 5 41 17 27 3 9 20 34 7 12

Calculation time: 1 time unit

Sum = 529

But…

• Is the output correct?

• Can this speed really be achieved, considering memory access and synchronization?


Example – Our GPU Solution

12 5 2 13 16 8 5 41 17 27 3 9 20 34 7 12

Time unit 1: 29 32 5 22 36 42 12 53

Time unit 2: 65 74 17 75

Time unit 3: 82 149

Time unit 4: Sum = 231

Our Solution

• Takes approximately log₂ N time units (4 time units for the N = 16 values in the example).

• Utilizes the GPGPU's multiple cores to the maximum.

• For large calculations (say, 30000 * 16), the calculation time is greatly reduced compared to the linear CPU solution.
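The pairwise scheme from the example can be simulated sequentially to check the intermediate values. This is a sketch, not the project's OpenCL kernel: each list comprehension stands in for one parallel step in which every addition would run in its own work-item, and it assumes the input length is a power of two. The function name `tree_reduce_steps` is hypothetical.

```python
def tree_reduce_steps(values):
    """Return the partial-sum arrays produced after each parallel step."""
    data = list(values)
    steps = []
    while len(data) > 1:
        half = len(data) // 2
        # Element i is added to element i + half; on a GPU all of these
        # additions would happen at once, costing a single time unit.
        data = [data[i] + data[i + half] for i in range(half)]
        steps.append(data)
    return steps


values = [12, 5, 2, 13, 16, 8, 5, 41, 17, 27, 3, 9, 20, 34, 7, 12]
steps = tree_reduce_steps(values)
for time_unit, partial in enumerate(steps, start=1):
    print(time_unit, partial)
```

For the 16 values above the loop runs 4 times, which is log₂ 16, and the last step holds the total of 231. In a real kernel, a barrier between steps would be needed so that no work-item reads a partial sum before it has been written.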


Development Methodology

[Diagram: development layers, top to bottom: Final simulation, Blocks layer, Function layer]

Function layer:

• Transforming MATLAB functions into OpenCL kernels & code.

    – The major difficulty was transforming MATLAB's "natural" matrix calculations into parallel C code and kernels.

• Comparing the outputs of the MATLAB and OpenCL functions.



Block layer:

• The algorithm was divided into two major parts: the particle propagation & weights recalculation block, and the resampling block.

    – The major difficulty was synchronizing the parts of the code run by the CPU (written in C) with the parts run by the GPU (written as OpenCL kernels).

• Comparing the outputs of MATLAB and OpenCL.

    – The major difficulty was finding errors, since it meant checking the running algorithm iteration by iteration.



Final simulation:

• Running the entire algorithm over the 100-second-long data file we received.

• Comparing the results to the results provided by the C programming group.




Results – X coordinate


Results – Y coordinate


Results – Z coordinate


Results – X velocity


Results – Y velocity


Results – Z velocity


Results – Running Time: % GPU calculating time


Results – Running Time: % GPU calculating time (excluding prediction)


Results – Running Time: Number of calls to function


Results – Running Time: Number of calls to function (excluding prediction & reads from memory)


Results – Running Time: Function run time


Results – Running Time: Cycle run time (presenting the 3 types of possible cycles)

[Diagram: the block diagram shown earlier, annotated with run times: 0.58 msec, 1.80 msec, 10.44 msec, 12.24 msec]


Questions?
