CS 584


Page 1: CS 584

CS 584

Page 2: CS 584

Fast Fourier Transform

Used in many scientific applications

Transforms a periodic signal into the frequency spectrum of the signal

Page 3: CS 584

FFT

Given a sequence <X[0], X[1], …, X[n-1]>

Transform into <Y[0], Y[1], …, Y[n-1]>

Where

$$Y[i] = \sum_{k=0}^{n-1} X[k]\,\omega^{ki}, \qquad 0 \le i < n$$

and ω is a primitive n-th root of unity.

O(n^2)
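The direct evaluation can be sketched in a few lines of Python. This is only an illustration: the function name `naive_dft` is hypothetical, and the choice ω = exp(2π√-1/n) is an assumed sign convention that the slide does not state.

```python
import cmath

def naive_dft(x):
    """Evaluate Y[i] = sum_k X[k] * omega**(k*i) directly: n sums of n terms each."""
    n = len(x)
    # Assumed convention: omega is the primitive n-th root of unity exp(2*pi*j/n).
    omega = cmath.exp(2j * cmath.pi / n)
    return [sum(x[k] * omega ** (k * i) for k in range(n)) for i in range(n)]
```

The two nested loops over i and k are where the O(n^2) cost comes from.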

Page 4: CS 584

FFT

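In 1965 Cooley and Tukey showed that the transform can be evaluated in O(n log n) operations by splitting the sum into its even- and odd-indexed terms:

$$Y[i] = \sum_{k=0}^{(n/2)-1} X[2k]\,\tilde{\omega}^{ki} + \omega^{i} \sum_{k=0}^{(n/2)-1} X[2k+1]\,\tilde{\omega}^{ki}, \qquad \tilde{\omega} = \omega^{2}$$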
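The intermediate steps are not on the slide. A short derivation, assuming n is a power of 2 and writing ω̃ = ω^2, could read:

$$
\begin{aligned}
Y[i] &= \sum_{k=0}^{n-1} X[k]\,\omega^{ki} \\
     &= \sum_{k=0}^{(n/2)-1} X[2k]\,\omega^{2ki} + \sum_{k=0}^{(n/2)-1} X[2k+1]\,\omega^{(2k+1)i} \\
     &= \sum_{k=0}^{(n/2)-1} X[2k]\,\tilde{\omega}^{ki} + \omega^{i} \sum_{k=0}^{(n/2)-1} X[2k+1]\,\tilde{\omega}^{ki}
\end{aligned}
$$

Each of the two sums is an (n/2)-point transform, so the cost satisfies T(n) = 2T(n/2) + O(n), which solves to O(n log n).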

Page 5: CS 584

FFT

Procedure RecursiveFFT(X, Y, n, ω)
  if (n == 1)
    Y[0] = X[0]
  else
    RecursiveFFT(<X[0], X[2], …, X[n-2]>, <Q[0], Q[1], …, Q[n/2-1]>, n/2, ω^2);
    RecursiveFFT(<X[1], X[3], …, X[n-1]>, <T[0], T[1], …, T[n/2-1]>, n/2, ω^2);
    for i = 0 to n-1
      Y[i] = Q[i mod (n/2)] + ω^i * T[i mod (n/2)];
end

Optimization Opportunity
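A runnable version of the procedure might look like the following Python sketch. It assumes the input length is a power of 2 and that ω = exp(2π√-1/n); the name `recursive_fft` and the use of return values instead of an output argument are choices made for this illustration.

```python
import cmath

def recursive_fft(x, omega=None):
    """Recursive FFT in the shape of RecursiveFFT; len(x) must be a power of 2."""
    n = len(x)
    if omega is None:
        # Assumed convention: primitive n-th root of unity exp(2*pi*j/n).
        omega = cmath.exp(2j * cmath.pi / n)
    if n == 1:
        return [x[0]]
    q = recursive_fft(x[0::2], omega * omega)  # transform of the even-indexed elements
    t = recursive_fft(x[1::2], omega * omega)  # transform of the odd-indexed elements
    half = n // 2
    y = []
    w = 1 + 0j  # running value of omega**i
    for i in range(n):
        y.append(q[i % half] + w * t[i % half])
        w *= omega
    return y
```

Keeping a running product `w` rather than recomputing ω^i in every iteration is one possible reading of the "Optimization Opportunity" callout; the slide does not say what it refers to.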

Page 6: CS 584

FFT

[Figure: recursion tree of RecursiveFFT for n = 8. The input X[0] X[1] X[2] X[3] X[4] X[5] X[6] X[7] at the top level is split by even and odd index at the 1st level of recursion into [0] [2] [4] [6] and [1] [3] [5] [7], and again at the 2nd and 3rd levels, giving the leaf order [0] [4] [2] [6] [1] [5] [3] [7]. The partial results are combined with powers of ω on the return to the 2nd level, the 1st level, and the top level, producing Y[0] to Y[7].]

Something looks familiar?
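The leaf order [0] [4] [2] [6] [1] [5] [3] [7] is the sequence 0 to 7 with the bits of each index reversed, which is probably what the callout hints at. A small Python sketch (the helper names `bit_reverse` and `leaf_order` are hypothetical) reproduces it:

```python
def bit_reverse(i, bits):
    """Return i with its lowest `bits` binary digits in reversed order."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (i & 1)  # append the lowest bit of i to out
        i >>= 1
    return out

def leaf_order(n):
    """Input index found at each leaf position of the recursion for an n-point FFT."""
    bits = n.bit_length() - 1  # log2(n) when n is a power of 2
    return [bit_reverse(i, bits) for i in range(n)]

print(leaf_order(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```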

Page 7: CS 584

Parallelization of FFT

Parallelize by looking at the data patterns

Two algorithms
  Binary Exchange
  Matrix Transpose

Page 8: CS 584

Binary Exchange FFT

[Figure: data-flow pattern of a 16-point FFT. Inputs X[0] to X[15] on the left, outputs Y[0] to Y[15] on the right, connected through four iterations m = 0, m = 1, m = 2, m = 3.]

Page 9: CS 584

Binary Exchange FFT

Data exchange takes place between all pairs of processors whose labels differ in exactly one bit.

One element per processor
  Easy

Multiple elements per processor
  Assign contiguous blocks to processors
  Same algorithm, just exchange blocks
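The block version can be simulated sequentially. The sketch below is an illustration under stated assumptions, not the course's implementation: it uses a standard iterative butterfly rule (iteration m combines elements whose indices differ in the (m+1)-th most significant bit), assumes n and p are powers of 2 with p ≤ n, takes ω = exp(2π√-1/n), and uses the hypothetical names `binary_exchange_fft` and `bit_reverse`.

```python
import cmath

def bit_reverse(i, bits):
    """Return i with its lowest `bits` binary digits in reversed order."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (i & 1)
        i >>= 1
    return out

def binary_exchange_fft(x, p):
    """Simulate an n-point FFT on p processes holding contiguous blocks of n/p elements.

    Returns (Y, partners); partners[m] lists the (process, partner) pairs that
    exchange blocks in iteration m and is empty when the iteration is local.
    """
    n = len(x)
    r = n.bit_length() - 1   # n = 2**r
    d = p.bit_length() - 1   # p = 2**d
    omega = cmath.exp(2j * cmath.pi / n)  # assumed sign convention
    cur = list(x)
    partners = []
    for m in range(r):
        shift = r - 1 - m    # position, from the least significant end, of the bit combined now
        nxt = []
        for i in range(n):
            j = i & ~(1 << shift)  # i with that bit cleared
            k = i | (1 << shift)   # i with that bit set
            # Twiddle exponent: the m+1 leading bits of i reversed, padded with zeros.
            e = bit_reverse(i >> shift, m + 1) << shift
            nxt.append(cur[j] + cur[k] * omega ** e)
        cur = nxt
        if m < d:
            # j and k sit on processes whose labels differ in exactly one bit.
            partners.append([(q, q ^ (1 << (d - 1 - m))) for q in range(p)])
        else:
            partners.append([])
    # This butterfly order leaves the result bit-reversed; undo that for the caller.
    return [cur[bit_reverse(i, r)] for i in range(n)], partners
```

For n = 16 and p = 4 only the first two iterations report partners; the last two touch elements inside a single block, so they need no communication.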

Page 10: CS 584

Binary Exchange FFT

[Figure: 16-point FFT on four processors P0, P1, P2, P3. Each input X[0] to X[15] and output Y[0] to Y[15] is labelled with its 4-bit binary index, 0000 to 1111, with the fields d and r marked on the index bits. Iterations m = 0, m = 1, m = 2, m = 3.]

Page 11: CS 584

Binary Exchange FFT

As n increases, so does communication
  Big bandwidth requirement

Powers of ω cannot be precalculated
  ω^i is used at different times on different processors
  Duplicated computation

Page 12: CS 584

The Transpose FFT

Assume that sqrt(n) is a power of 2

The data is arranged in a sqrt(n) x sqrt(n) two-dimensional square array

Page 13: CS 584

The Transpose FFT

[Figure: the 16 elements 0 to 15 arranged row by row in a 4 x 4 array, shown in four panels: (a) Iteration m = 0, (b) Iteration m = 1, (c) Iteration m = 2, (d) Iteration m = 3.]

Page 14: CS 584

Parallelization of Transpose FFT

Notice
  First two iterations are columnwise
  Last two iterations are rowwise

Rather than do an exchange
  Transpose the matrix halfway through the algorithm
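The column-wise and row-wise split can be checked with a short Python sketch. It assumes the indices are stored row by row in the sqrt(n) x sqrt(n) array and that iteration m pairs index i with i XOR 2^(r-1-m), as in the binary exchange pattern; the function name `iteration_direction` is hypothetical.

```python
import math

def iteration_direction(n):
    """For each iteration m, report whether butterfly partners share a column or a row
    when the n indices are laid out row by row in a sqrt(n) x sqrt(n) array."""
    r = n.bit_length() - 1   # n = 2**r
    side = math.isqrt(n)     # assumes n is a perfect square
    directions = []
    for m in range(r):
        partner = 1 << (r - 1 - m)  # partner of index 0 in iteration m
        # Flipping one bit changes either the row or the column for every index,
        # so checking the pair (0, partner) is enough.
        same_column = (partner % side) == 0
        directions.append("column" if same_column else "row")
    return directions

print(iteration_direction(16))  # ['column', 'column', 'row', 'row']
```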

Page 15: CS 584

The Transpose FFT

[Figure: the 4 x 4 array of elements 0 to 15 partitioned among processors P0, P1, P2, P3, in two panels: (a) Steps in phase 1 of the transpose algorithm (before transpose); (b) Steps in phase 3 of the transpose algorithm (after transpose).]

Page 16: CS 584

The Transpose FFT

Transposition of a striped partitioned array requires all-to-all communication

Would it be less expensive to just follow through with the algorithm or do the transpose?
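Why the transpose is an all-to-all operation can be seen by listing who must send to whom. The Python sketch below assumes the array is striped by columns and that p divides the side length; `transpose_traffic` is a hypothetical helper, not part of any library.

```python
def transpose_traffic(side, p):
    """Return the set of (source, destination) process pairs that must communicate
    to transpose a side x side array striped by columns over p processes."""
    cols_per_proc = side // p  # assumes p divides side
    pairs = set()
    for row in range(side):
        for col in range(side):
            src = col // cols_per_proc  # owner of element (row, col) before the transpose
            dst = row // cols_per_proc  # owner of its new position (col, row) afterwards
            if src != dst:
                pairs.add((src, dst))
    return pairs

pairs = transpose_traffic(side=4, p=4)
print(len(pairs))  # 12 = p * (p - 1): every process sends to every other process
```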

Page 17: CS 584

Which is better?

It depends
  Architecture and amount of data play together to create tradeoffs.

Transpose algorithm is easy to generalize to higher dimensions

Page 18: CS 584

Which is better?

[Figure: plot of S (vertical axis, 0 to 45) against n (horizontal axis, 0 to 18000) with three curves: Binary exchange, 2-D transpose, 3-D transpose.]