Download - Introductory Parallel Applications

Transcript

Introductory Parallel Applications

Courtesy: Dr. David Walker, Cardiff University

Example1 – wave equationCourtesy: David Walker, Cardiff University

Problem – Vibrating string of length L fixed at both ends given an initial displacement.

To determine Ψ(x, t)

displacement

x=0x=L

Page 3: Introductory Parallel Applications

Vibrating String Problem

To solve ∂2Ψ/c2∂t2 - ∂2Ψ/∂x2 = 0 Subject to Ψ(0, t) = Ψ(L, t) = 0; Ψ(x,

0) = u(x) Numerical approximation: Ψi(t+∆t) = 2Ψi(t) – Ψi(t-∆t) + tau2(Ψi-1(t) -2Ψi(t) + Ψ i+1(t)) where

tau = c ∆t/ ∆x and u(x) = sinx

Page 4: Introductory Parallel Applications

Vibrating String Problem – Sequential Problemfor(i=0; i<npoints; i++){ x = 2*pi*i/(npoints-1); x = sin(x); ps[i] = old_ps[i] = x;}

for(j=0; j<nsteps; j++){ for(i=1; i<npoints-1; i++){ new_ps[i] = 2*ps[i] – old_ps[i] + tau*tau*(ps[i-1] + 2ps[i] + ps[i+1]); }

for(i=0; i<npoints; i++){ old_ps[i] = ps[i]; ps[i] = new_ps[i]; }}

Page 5: Introductory Parallel Applications

Vibrating String – Parallel Formulation

The points are divided across processes

Communication between neighboring processors

Data structure – an array for every processor with sizes local_size+2

Page 6: Introductory Parallel Applications

Vibrating String – Parallel Code/* Set up 1D topology */ MPI_Cart_create(MPI_COMM_WORLD,1,&nprocs,&periods,

reorder,&new_comm); MPI_Cart_coords (new_comm, rank, 1, &mypos) MPI_Cart_shift (new_comm, 0, -1, &right, &left);

/* Initialise array */ local_npts = npoints/nprocs; lbound = mypos*local_npts;

for(i=0;i<local_npts;i++){ x = 2.0*PI*(double)(lbound+i)/(double)(npoints-1); x = sin(x); psi[i+1] = old_psi[i+1] = x; }

Page 7: Introductory Parallel Applications

Vibrating String – Parallel Code/* Do the updates */ istart = (mypos==0) ? 2 : 1; iend = (mypos==nprocs-1) ? local_npts-1 : local_npts; for(j=0;j<NSTEPS;j++){ MPI_Sendrecv (&psi[1], 1, MPI_DOUBLE, left, 111,

&psi[local_npts+1], 1, MPI_DOUBLE, right, 111, new_comm, &status);

MPI_Sendrecv (&psi[local_npts], 1, MPI_DOUBLE, right, 111, &psi[0], 1, MPI_DOUBLE, left, 111, new_comm, &status);

for(i=istart;i<=iend;i++){ new_psi[i] = 2.0*psi[i] - old_psi[i] + tau*tau*(psi[i-1]-

2.0*psi[i]+psi[i+1]); } for(i=1;i<=local_npts;i++){ old_psi[i] = psi[i]; psi[i] = new_psi[i]; } }

Page 8: Introductory Parallel Applications

Example 2 – Laplace equation problem

Courtesy: Dr. David Walker, Cardiff University

Electric field around a conducting object in a box

To solve ∂2Ψ/∂x2 + ∂2Ψ/∂y2 = 0 Subject to Ψ(0, y) = Ψ(L1, y) = 0;

Ψ(x, 0) = Ψ(x, L2) = 0; Ψ(x, y) =1 on S(x, y)

S(x, y)

x=0 x=L1

y=0

y=L2

Page 9: Introductory Parallel Applications

Laplace equation – Numerical approximation

We consider a square problem, L1 = L2 = L

Ψi,jk = ¼(Ψi-1,j

k-1 + Ψi+1,jk-1 + Ψi,j-1

k-1 + Ψi,j+1k-1)

Page 10: Introductory Parallel Applications

Laplace Equation – Sequential codedouble phi[NXMAX][NYMAX];double old_phi[NXMAX][NYMAX];int mask[NXMAX][NYMAX];

setup_grid(phi, nptsx, nptsy, mask);

for(k=0; k<nsteps; k++){

for(i=0; i<nptsx; i++){ for(j=0; j<nptsy; j++){ old_phi[i][j] = phi[i, j]; } }

for(i=0; i<nptsx; i++){ for(j=0; j<nptsy; j++){ if(mask[i][j]) phi[i][j] = ¼*(old_phi[i][j-1] + old_phi[i][j+1] +

old_phi[i-1][[j] + old_phi[i][[j-1]); } }

}

Page 11: Introductory Parallel Applications

Laplace Equation – Sequential codesetup_grid(phi, nptsx, nptsy, mask){

for(i=0; i<nptsx; i++){ for(j=0; j<nptsy; j++){ phi[i, j] = 0.0; mask[i, j] = 1; } }

mask[0…nptsx-1][0] = mask[0..nptsx-1][nptsy-1] = mask[0][0..nptsy-1] = mask[nptsx-1][0..nptsy-1] = 0;

mask[nptsx/2, nptsx/2-1][nptsy/2, nptsy/2-1] = 0; phi[nptsx/2, nptsx/2-1][nptsy/2, nptsy/2-1] = 1;

Page 12: Introductory Parallel Applications

Laplace Equation - Parallelization

A 2-D topology of processes are considered The updates of boundary values for a given

processor may need communication from its neighboring processors.

Following are main data structures double phi[NYMAX+2][NXMAX+2]; double oldphi[NYMAX+2][NXMAX+2]; int mask[NYMAX+2][NXMAX+2];

Page 13: Introductory Parallel Applications

LAPLACE Equation - Parallelization/* Work out number of processes in each direction of the process mesh */ dims[0] = dims[1] = 0; MPI_Dims_create (nprocs, 2, dims); npy = dims[0]; npx = dims[1];

/* Set up 2D topology */ periods[0] = periods[1] = 0; MPI_Cart_create (MPI_COMM_WORLD, 2, dims, periods, 1, &new_comm); MPI_Cart_coords (new_comm, rank, 2, coords); myposy = coords[0]; myposx = coords[1];

/* Determine neighbouring processes for communication shifts */ MPI_Cart_shift (new_comm, 0, -1, &down, &up); MPI_Cart_shift (new_comm, 1, -1, &right, &left);

/* Initialise arrays */ setup_grid (phi, npx, npy, myposx, myposy, nptsx, nptsy, &local_nptsx,

&local_nptsy, mask);

Page 14: Introductory Parallel Applications

Laplace Equation - Parallelization/* Iterate to find solution */for(k=1;k<=nsteps;k++){ for(i=1;i<=local_nptsy;i++) for(j=1;j<=local_nptsx;j++) oldphi[i][j] = phi[i][j];

MPI_Sendrecv (&oldphi[1][1], local_nptsy, MPI_DOUBLE, up, 111, &oldphi[local_nptsx+1][1], local_nptsy, MPI_DOUBLE,down, 111, new_comm, &status);

MPI_Sendrecv (&oldphi[local_nptsx][1],local_nptsy,MPI_DOUBLE,down,112, &oldphi[0][1], local_nptsy, MPI_DOUBLE, up, 112, new_comm, &status);

Page 15: Introductory Parallel Applications

Laplace Equation - Parallelizationfor(i=1;i<=local_nptsx;i++) sbuf[i-1] = oldphi[i][1]; MPI_Sendrecv (sbuf, local_nptsx, MPI_DOUBLE, left, 113, rbuf, local_nptsx,

MPI_DOUBLE, right, 113, new_comm, &status); for(j=1;j<=local_nptsy;j++) oldphi[i][local_nptsy+1] = rbuf[i-1];

for(i=1;i<=local_nptsx;i++) sbuf[i-1] = oldphi[i][local_nptsy]; MPI_Sendrecv (sbuf, local_nptsx, MPI_DOUBLE, right, 114, rbuf, local_nptsx,

MPI_DOUBLE, left, 114, new_comm, &status); for(i=1;i<=local_nptsx;i++) oldphi[i][0] = rbuf[i-1];

for(i=1;i<=local_nptsx;i++) for(j=1;j<=local_nptsy;j++) if (mask[i][j]) phi[i][j] = 0.25*(oldphi[i][j-1] + oldphi[i][j+1] + oldphi[i-1][j]

+ oldphi[i+1][j]); }

Page 16: Introductory Parallel Applications

Laplace Equation - Parallelizationsetup_grid (phi, npx, npy, myposx, myposy, nptsx, nptsy, local_ptrx,

local_ptry, mask){

local_nptsx = nptsx/npx; local_nptsy = nptsy/npy; for(j=0;j<=local_nptsy+1;j++) for(i=0;i<=local_nptsx+1;i++){ phi[i][j] = 0.0; mask[i][j] = 1;}

if (myposx == 0) mask[1][1..local_nptsy] = 0;if (myposx == npx-1) mask[local_nptsx][1..local_nptsy] = 0;if (myposy == 0) mask[1..local_nptsx][1] =0;if (myposy == npy-1) mask[1..local_nptsx][local_nptsy] = 0;

for(i=1;i<=local_nptsx;i++){ global_x = local_nptsx*myposx + i - 1; if (global_x == nptsx/2 || global_x == nptsx/2-1){ for(j=1;j<=local_nptsx;j++){ global_y = local_nptsy*myposy + j - 1; if (global_y == nptsy/2 || global_y == nptsy/2-1){ mask[i][j] = 0; phi[i][j] = 1.0; } } }}

*local_ptrx = local_nptsx; *local_ptry = local_nptsy;

Page 17: Introductory Parallel Applications

Laplace Equation – Performance Analysis

For n2 grid size, N = PxQ processor mesh, tcalc time for a flop, tshift time for communication per grid point

Tseq = T(N) =

4n2tcalc

(4n2/N)tcalc + (2n/P)tshift + (2n/Q)tshift

Page 18: Introductory Parallel Applications

A more irregular problem – Molecular Dynamics Simulation

Courtesy: Dr. David Walker, Cardiff University

In the previous problems, communication requirements are known in advance

In the current Molecular Dynamics simulation problem, the amount of data that are communicated between processors are not known in advance

The communication is slightly irregular

Page 19: Introductory Parallel Applications

The Problem A domain consisting of number of particles

(molecules) Each molecule, i is exerted a force, fij by

another molecule, j The sum of all the forces, Fi = ∑jfij makes the

particles assume a new position and velocity

Particles that are r distance apart do not influence each other

Given initial velocities and positions of particles, their movements are followed for discrete time steps

Page 20: Introductory Parallel Applications

MD - Solution

The cutoff distance, r is used to reduce the time for summation from O(n2)

Domain decomposed into cells of size rxr

Particles in one cell interact with particles in the neighbouring 8 cells and particles in the same cell

Page 21: Introductory Parallel Applications

MD - SolutionData structures:An array of all particles. Each element holds <position,

velocity>A 2D array of linked lists, one for each cell. Each element of a

linked list contains pointers to particles.struct particle{ double position[2]; double velocity[2];} Particles[MAX_PARTICLES];

struct list{ particle* part; struct list* next;}*List[MAX_CELLSX][MAX_CELLSY];

Linked ListParticles

Page 22: Introductory Parallel Applications

MD – Sequential LogicInitialize Particles and Lists;

for each time step for each particle i Let cell(k, l) hold i F[i] = 0; for each particle j in this cell and neighboring 8 cells, and are

r distance from i{ F[i]+= f[i, j]; } update particle[i].{position, velocity} due to F[i]; if new position in new cell (m,n) update Lists[k,l] and

Lists[m,n]

Page 23: Introductory Parallel Applications

MD – Parallel Version

A 2D array of processors similar to Laplace

Each processor holds a set of cells

Differences:

•A processor can communicate with the diagonal neighbors

•Amount of data communicated varies over time steps

•Receiver does not know the amount of data

Page 24: Introductory Parallel Applications

MDS – parallel solution

Steps1. Communication – Each processor communicates parameters of the particles on the boundary cells to its 8 neighboring cells

Challenges – to communicate diagonal cells

2. Update – Each processor calculates new particle velocities and positions

3. Migration – Particles may migrate to cells in other processors

Other challenges:

1. Appropriate packing of data.

2. Particles may have to go through several hops during migration

Assumptions:

1. For simplicity, let us assume that particles are transported to only neighboring cells during migration

Page 25: Introductory Parallel Applications

MDS – parallel solution – 1st step

Communication of boundary dataA

a a aaaa

a a a

aaa

b b bbbb

b b b

bbb

c c cccc

c c c

ccc

d d dddd

d d d

ddd

Page 26: Introductory Parallel Applications

MDS – parallel solution – 1st step

Communication of boundary dataA

a a aaaa

a a a

aaa

b b bbbb

b b b

bbb

c c cccc

c c c

ccc

d d dddd

d d d

ddd

a a aaaa

a a a

aaa

C Cc c c

c c c

c c

b b b

d d d

A Aa a a B Bb b b

D Dd d d

Page 27: Introductory Parallel Applications

MDS – parallel solution – 1st step

Communication of boundary dataA

a a aaaa

a a a

aaa

b b bbbb

b b b

bbb

c c cccc

c c c

ccc

d d dddd

d d d

ddd

a a aaaa

a a a

aaa

C Cc c c

c c c

c c

b b b

d d d

A Aa a a B Bb b b

D Dd d d

D C

B A

Can be achieved by ?

Shift left, shift right, shift up, shift down

Page 28: Introductory Parallel Applications

MDS – parallel solution – 1st stepLeft shiftnsend = 0;for(i=0; i<local_cellsx; i++){ for each particle p in cell (i, 1){ pack position of p in sbuf nsend += 2 }}

MPI_Sendrecv(sbuf, nsend, …, left,..rbuf, max_particles*2, …, right, &status);

MPI_Getcount(status, MPI_DOUBLE, &nrecv);particles = nrecv/2;for(i=0; i<particles; i++){ read (x,y) from next 2 positions in rbuf; add (x,y) to particles[local_particle_count+i]; determine cell k, l for the particle Add it to list (k, l);}

Page 29: Introductory Parallel Applications

MDS – parallel solution – 2nd step

Update: Similar to sequential program. A processor has all the required information

for calculating Fi for all its particles Thus new position and velocity determined. If new position belongs to the same cell in

the same processor, do nothing If new position belongs to the different cell

in the same processor, update link lists for old and new cells.

Page 30: Introductory Parallel Applications

MDS – parallel solution – 3rd step If new position belongs to the different cell in a

different processor – particle migrationfor each particle p update {position, velocity} determine new cell if new cell # old cell delete p from list of old cell if(different processor) pack p into appropriate communication buffer remove p from particle array

Shift leftShift rightShift upShift down

Page 31: Introductory Parallel Applications

MDS – parallel solution – 3rd step This shifting is a bit different from the previous shifting A processor may just act as a transit point for a particle Hence particles have to be packed with careShift left:MPI_Sendrecv(leftbuf, nsend_left, …, left

rbuf, max_size*4, .., right, &status);MPI_Getcount(status, MPI_DOUBLE, &nrecv);particles = nrecv/4;

for(i=0; i<particles; i++){ read next 4 numbers in {x, y vx, vy} if(particle in this process) add particle to particle array determine cell add particle to list for the cell else put data in the appropriate comm. buffer for the next up or down

shifts}

Page 32: Introductory Parallel Applications

MDS – comments Generic solution

A particle can move to any cell Force can be from any distance

Load balancing

Page 33: Introductory Parallel Applications

A Dynamical System- WaTor Courtesy: Dr. David Walker, Cardiff

A 2-D ocean in which sharks and fish survive

2 important featuresa. Need for dynamic load distributionb. Potential conflicts due to updates by

different processors Features shared by other advanced

parallel applications

WaTor – The problem

Ocean divided into grids Each grid cell can be empty or have a

fish or a shark Grid initially populated with fishes and

sharks in a random manner Population evolves over discrete time

steps according to certain rules

Page 35: Introductory Parallel Applications

WaTor - Rules

Fish: At each time step, a fish tries to move

to a neighboring empty cell. If not empty, it stays

If a fish reaches a breeding age, when it moves, it breeds, leaving behind a fish of age 0. Fish cannot breed if it doesn’t move.

Fish never starves

Page 36: Introductory Parallel Applications

WaTor - RulesShark: At each time step, if one of the neighboring

cells has a fish, the shark moves to that cell eating the fish. If not and if one of the neighboring cells is empty, the shark moves there. Otherwise, it stays.

If a shark reaches a breeding age, when it moves, it breeds, leaving behind a shark of age 0. shark cannot breed if it doesn’t move.

Sharks eat only fish. If a shark reaches a startvation age (time steps since last eaten), it dies.

Page 37: Introductory Parallel Applications

Inputs and Data StructuresInputs: Size of the grid Distribution of sharks and fishes Shark and fish breeding ages Shark starvation ageData structures:A 2-D grid of cells struct ocean{ int type /* shark or fish or empty */ struct swimmer* occupier; }ocean[MAXX][MAXY]A linked list of swimmers struct swimmer{ int type; int x,y; int age; int last_ate; int iteration; swimmer* prev; swimmer* next; } *List;

Sequential Code Logic

•Initialize ocean array and swimmers list

•In each time step, go through the swimmers in the order in which they are stored and perform updates