Prewhitening - hosting.astro.cornell.edu/~cordes/A6523/Prewhitening.pdf


Prewhitening

What is Prewhitening? Prewhitening is an operation that processes a time series (or some other data sequence) to make it behave statistically like white noise. The ‘pre’ means that whitening precedes some other analysis that likely works better if the additive noise is white.

These operations can be viewed in either the time domain or the frequency domain:

1. Make the ACF of the time series appear more like a delta function.

2. Make the spectrum appear flat.

Example data sets that may require prewhitening:

1. A well behaved noise process with an additive low frequency (or polynomial) trend added to it.

2. A deterministic signal with an additive red-noise process.

Viewed in the frequency domain, prewhitening means that the dynamic range of the measured data is reduced.
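The two views of whiteness (a delta-like ACF and a flat spectrum) can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names, the random-walk stand-in for a red process, the use of first differencing as the whitening step, and the segment-averaged periodogram are all illustrative assumptions.

```python
import numpy as np

def lag1_autocorrelation(x):
    """Normalized sample autocorrelation at lag 1 (near 0 for white noise)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.sum(x[1:] * x[:-1]) / np.sum(x * x)

def spectral_dynamic_range_db(x, nseg=16):
    """Max/min ratio (in dB) of a segment-averaged periodogram, DC excluded."""
    x = np.asarray(x, dtype=float)
    # Cut the series into nseg equal segments and remove each segment's mean.
    segs = x[: len(x) // nseg * nseg].reshape(nseg, -1)
    segs = segs - segs.mean(axis=1, keepdims=True)
    power = np.mean(np.abs(np.fft.rfft(segs, axis=1)) ** 2, axis=0)[1:]
    return 10.0 * np.log10(power.max() / power.min())

rng = np.random.default_rng(0)
white = rng.standard_normal(4096)
red = np.cumsum(white)        # random walk: a simple red process (assumption)
rewhitened = np.diff(red)     # first differencing undoes the cumulative sum

for name, series in [("white", white), ("red", red), ("rewhitened", rewhitened)]:
    print(name,
          f"lag-1 ACF = {lag1_autocorrelation(series):+.3f}",
          f"dynamic range = {spectral_dynamic_range_db(series):.1f} dB")
```

In this sketch the red series shows a lag-1 autocorrelation close to one and a much larger spectral dynamic range than the white series, while the differenced series recovers the white-noise behavior.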


Why bother? Recall from our discussions of spectral analysis the issues of leakage and bias. These arise from sidelobes inherent to spectral estimation. We can minimize leakage in two ways: (1) make sidelobes smaller and (2) minimize the power that is prone to leaking into sidelobes. Spectral windows address the former while prewhitening mitigates the latter. Leakage into sidelobes also constitutes bias in spectral estimates. However, bias appears in other data analysis procedures. Consider least-squares fitting of a sinusoid to a signal of the form

x(t) = A cos(ωt + φ) + r(t) + n(t),

where n(t) is WSS white noise and r(t) is red noise with a steep power spectrum. Red noise can strongly bias fitting of a model x(t) = A cos(ωt + φ) because its power can leak across the underlying spectrum, causing a least-squares fit to give highly discrepant values of A, ω, and φ.

Prewhitening of the time series ideally would yield a transformed time series of the form

x′(t) = A′ cos(ωt + φ) + n′(t)

to which fitting a sinusoidal model will be less biased.


Procedures:

We have already seen one analysis that is related to prewhitening: the matched filter (MF). The MF doesn’t whiten the spectrum of the output but it does weight the frequency components of the measured quantity to maximize the S/N of the signal.

The signal model in this case is x(t) = a0A(t) + n(t). Recall for an arbitrary spectrum Sn(f) for additive noise that the frequency-domain MF for a signal A(t) is

$$h(f) \propto \frac{A(f)}{S_n(f)}.$$

Taking equality for simplicity, when the filter is applied to the measurements x(t), we have

$$y(f) = x(f)\,h^*(f) \propto \frac{a_0 |A(f)|^2}{S_n(f)} + \frac{n(f)\,A^*(f)}{S_n(f)}.$$

This means that the ensemble-average spectrum of the filter output is

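$$
\begin{aligned}
\langle |y(f)|^2 \rangle
&= \frac{a_0^2 |A(f)|^4}{S_n^2(f)} + \frac{\langle |n(f)|^2 \rangle\, |A(f)|^2}{S_n^2(f)} \\
&= \frac{a_0^2 |A(f)|^4}{S_n^2(f)} + \frac{S_n(f)\, |A(f)|^2}{S_n^2(f)} \\
&= \frac{a_0^2 |A(f)|^4}{S_n^2(f)} + \frac{|A(f)|^2}{S_n(f)} \\
&= \frac{|A(f)|^2}{S_n(f)} \left[ \frac{a_0^2 |A(f)|^2}{S_n(f)} + 1 \right].
\end{aligned}
$$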

Signals with trends: A common situation is where a quantity of the form a0A(t) + n(t) is superposed with a strong trend, such as a baseline variation. Similar issues arise in measurements of spectra.

Consequences of trends include:

1. Bias in estimating parameters of A(t − t0) or its spectral analog A(ν − ν0).

2. Erroneous estimates of cross correlations between two time series such as

x(t) = s1(t) + n1(t) and y(t) = s2(t) + n2(t),

where s1,2 are signals of interest and n1,2 are measurement errors. I.e., we may be interested in the correlation

$$C = \frac{1}{N_t}\sum_t s_1(t)\,s_2(t) \quad \text{or} \quad C = \frac{1}{N_t}\sum_t \left[s_1(t) - \bar{s}_1\right]\left[s_2(t) - \bar{s}_2\right],$$

where s̄1,2 = (1/Nt) Σt s1,2(t) are the sample means.

If there are trends p1,2(t) added to x(t) and y(t), the correlation Ĉ of x and y used to estimate C may be dominated completely by the trends and not by the signal parts of the measurements.

A fix: Trends can often be modeled as a polynomial of some order that can be fitted to the measurements. The order of the polynomial needs to be chosen ‘wisely.’ For a pulse or spectral line confined to some range of t or ν this is straightforward. But for a detection problem where the signal location is not known, the situation is very tricky.
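As an illustration of how trends can dominate a cross-correlation and how a polynomial fit can remove them, here is a minimal sketch. The quadratic baselines, their amplitudes, the helper names, and the choice of polynomial order 2 are hypothetical; the two underlying series are independent white noise, so their true correlation is near zero.

```python
import numpy as np

rng = np.random.default_rng(1)
nt = 512
t = np.linspace(-1.0, 1.0, nt)

# Two independent white sequences stand in for s + n (true correlation ~ 0).
x_clean = rng.standard_normal(nt)
y_clean = rng.standard_normal(nt)

# Hypothetical strong baselines p1(t), p2(t) added to each series.
x = x_clean + 20.0 * t + 10.0 * t**2
y = y_clean + 15.0 * t - 5.0 * t**2

def correlation_coefficient(a, b):
    """Mean-subtracted, normalized zero-lag cross-correlation."""
    a = a - a.mean()
    b = b - b.mean()
    return np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b))

def detrend(series, order):
    """Subtract a least-squares polynomial of the given order in t."""
    coeffs = np.polynomial.polynomial.polyfit(t, series, order)
    return series - np.polynomial.polynomial.polyval(t, coeffs)

c_raw = correlation_coefficient(x, y)
c_detrended = correlation_coefficient(detrend(x, 2), detrend(y, 2))
print(f"raw: {c_raw:.3f}   detrended: {c_detrended:.3f}")
```

Here the polynomial order matches the order of the injected trend by construction. With real data the order is unknown, which is exactly the ‘wise’ choice the text warns about.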


Prewhitening filter: Consider again x(t) = a0A(t) + n(t) and let’s trivially construct a frequency-domain filter that whitens the measurements.

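We want a filter h(t) that flattens the noise n(t) in the frequency domain. Let y(t) = x(t) ⊗ h(t), where ⊗ means convolution. All we need is

$$h(f) = \frac{1}{\sqrt{S_n(f)}}.$$

Then the ensemble spectrum of the output y(f) is

$$
\begin{aligned}
\langle |y(f)|^2 \rangle
&= \langle |x(f)|^2 \rangle\, \langle |h(f)|^2 \rangle \\
&= \frac{\langle |x(f)|^2 \rangle}{S_n(f)} \\
&= \frac{a_0^2\, \langle |A(f)|^2 \rangle}{S_n(f)} + 1.
\end{aligned}
$$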

Note how this differs from the result for a matched filter. But the result is that in the mean the spectrum of the additive noise has been flattened.
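A frequency-domain whitening filter of this kind can be sketched with FFTs. This is an illustrative example under stated assumptions: the noise spectrum (a power law of index 2 with a small floor to keep it finite at f = 0) is assumed known exactly, the colored noise is synthesized from white noise, and the filter is applied as a circular convolution.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1024
freqs = np.fft.rfftfreq(n, d=1.0)

# Assumed noise spectrum S_n(f): power law with a floor (illustrative choice).
s_n = 1.0 / (freqs**2 + 1e-4)

# Color white noise by multiplying its transform by sqrt(S_n(f)).
white = rng.standard_normal(n)
colored = np.fft.irfft(np.fft.rfft(white) * np.sqrt(s_n), n=n)

# Whitening filter h(f) = 1 / sqrt(S_n(f)); a product in frequency is a
# circular convolution y = x (*) h in time.
h_f = 1.0 / np.sqrt(s_n)
whitened = np.fft.irfft(np.fft.rfft(colored) * h_f, n=n)

print(f"variance before: {colored.var():.1f}   after: {whitened.var():.2f}")
```

Because the same S_n(f) is used to color and to whiten, the filter recovers the original white sequence to rounding error. With an estimated spectrum the flattening would hold only approximately and only in the mean.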

Prewhitening is important in both detection and estimation applications.


Prewhitening in the least-squares estimation context:

Consider our standard linear model

y = Xθ + n,

which has a least-squares solution for the parameter vector

$$\theta = \left(X^\dagger C_n^{-1} X\right)^{-1} X^\dagger C_n^{-1} y,$$

where the covariance matrix of the noise vector n is

Cn = ⟨nn†⟩.

This is also the maximum likelihood solution in the right circumstances (which are?).

As with any covariance matrix, Cn is Hermitian and positive semi-definite. This means that the quadratic form for an arbitrary vector z satisfies

z†Cnz ≥ 0.

Such matrices can always be factored according to the Cholesky decomposition:

Cn = LL†

where L is a lower-triangular matrix; e.g.

$$
L = \begin{pmatrix}
a & 0 & 0 & 0 \\
b & c & 0 & 0 \\
d & e & f & 0 \\
g & h & i & j
\end{pmatrix}.
$$
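A small numerical instance of the factorization may help. The sketch below assumes an illustrative 4 × 4 covariance with exponentially decaying correlation (this specific matrix is not from the notes) and checks the properties stated above.

```python
import numpy as np
import scipy.linalg

n = 4
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

# Assumed example covariance: correlation 0.8**|i - j| (illustrative only).
C_n = 0.8 ** lags

# Lower-triangular Cholesky factor, C_n = L L^dagger.
L = scipy.linalg.cholesky(C_n, lower=True)
print(np.round(L, 3))

# The quadratic form z^dagger C_n z is non-negative for any vector z.
z = np.random.default_rng(0).standard_normal(n)
print("quadratic form:", z @ C_n @ z)
```

Numerical Cholesky routines generally require the matrix to be strictly positive definite, which this example satisfies.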

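Utility: we can transform the model as follows using L:

$$y = L\,y_w, \qquad X = L\,X_w.$$

Substituting into the solution vector for θ and using

$$y^\dagger = (L y_w)^\dagger = y_w^\dagger L^\dagger, \qquad X^\dagger = (L X_w)^\dagger = X_w^\dagger L^\dagger, \qquad C_n^{-1} = (L L^\dagger)^{-1} = (L^\dagger)^{-1} L^{-1}$$

yields

$$
\begin{aligned}
\theta &= \left(X^\dagger C_n^{-1} X\right)^{-1} X^\dagger C_n^{-1} y \\
&= \Big(X_w^\dagger \underbrace{L^\dagger C_n^{-1} L}_{\equiv I} X_w\Big)^{-1} X_w^\dagger \underbrace{L^\dagger C_n^{-1} L}_{\equiv I}\, y_w \\
&= \left(X_w^\dagger X_w\right)^{-1} X_w^\dagger y_w.
\end{aligned}
$$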

So what? The solution is identical to the least-squares case where the noise covariance matrix is diagonal (here the identity, since ⟨nw nw†⟩ = L⁻¹ Cn (L†)⁻¹ = I); i.e. the noise vector nw = L⁻¹n has been transformed to white noise. We have whitened the data.
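The equivalence between the covariance-weighted solution and ordinary least squares on whitened data can be verified numerically. The sketch below is illustrative: the two-parameter sinusoid model with a known frequency, the AR(1)-like covariance plus a white floor, and the variable names are assumptions, and triangular solves are used in place of forming L⁻¹ explicitly.

```python
import numpy as np
import scipy.linalg

rng = np.random.default_rng(3)
n = 64
t = np.arange(n)

# Linear model y = X theta + n: cosine and sine amplitudes at a known frequency.
omega = 2.0 * np.pi / 10.23
X = np.column_stack([np.cos(omega * t), np.sin(omega * t)])
theta_true = np.array([1.0, 0.5])

# Assumed noise covariance: red (correlated) part plus a white floor.
lags = np.abs(np.subtract.outer(t, t))
C_n = 4.0 * 0.95 ** lags + 0.25 * np.eye(n)
L = scipy.linalg.cholesky(C_n, lower=True)

# One noise realization with covariance C_n, and the resulting data vector.
y = X @ theta_true + L @ rng.standard_normal(n)

# Covariance-weighted solution: (X^T C^-1 X)^-1 X^T C^-1 y.
Cinv_X = np.linalg.solve(C_n, X)
Cinv_y = np.linalg.solve(C_n, y)
theta_gls = np.linalg.solve(X.T @ Cinv_X, X.T @ Cinv_y)

# Whitened quantities X_w = L^-1 X and y_w = L^-1 y, then ordinary least squares.
X_w = scipy.linalg.solve_triangular(L, X, lower=True)
y_w = scipy.linalg.solve_triangular(L, y, lower=True)
theta_w, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)

print("weighted :", theta_gls)
print("whitened :", theta_w)
```

The two estimates agree to rounding error, which is the content of the derivation: whitening moves the covariance weighting into the data and the design matrix.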

When is this useful? An example is the fitting of a sinusoidal function amid red noise where leakage effects are important just as they are for spectral analysis. A specific example is the fitting of astrometric parameters or periodicities in radial velocity data.

What’s the catch? You need to know the covariance matrix of the noise n to do the Cholesky decomposition. This can be easier said than done!

Examples of sine wave + red and white noise

Examples were generated with a signal

y(t) = cos(2πt/P + φ) + r(t)/snr_r + w(t)/snr_w

where r, w have unit variance and are scaled by the signal-to-noise ratios snr_r and snr_w, respectively.

The covariance matrix for the combined noise n = r + w was calculated by averaging Cn = ⟨nn†⟩ over 1000 realizations.

Note that for some real situations where we have only a single time series, we would need to calculate Cn differently, e.g. from first principles, prior knowledge, etc.

In practice, realizations of r were generated and the mean subtracted. Then white noise was added to form n and then the Cholesky decomposition was done using the command

L = scipy.linalg.cholesky(Cn, lower=True)

For data vectors of length N, the lower-triangular matrix L is N × N. If the mean had been subtracted from the white noise as well, the rank of the covariance matrix would be N − 1 and the decomposition would fail.
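The procedure just described can be sketched as follows. This is not the code used for the figures: the red-noise generator (white noise shaped in the frequency domain), the function names, the shorter length N = 64, and the parameter values are assumptions chosen for illustration. The second covariance estimate subtracts the mean from the white noise as well, to show the resulting rank deficiency.

```python
import numpy as np
import scipy.linalg

def red_noise(n, spectral_index, rng):
    """Unit-variance noise with power spectrum ~ f**(-spectral_index).

    Hypothetical generator: the notes do not say how r(t) was produced.
    """
    freqs = np.fft.rfftfreq(n)
    shape = np.zeros_like(freqs)
    shape[1:] = freqs[1:] ** (-spectral_index / 2.0)   # DC bin left at zero
    r = np.fft.irfft(np.fft.rfft(rng.standard_normal(n)) * shape, n=n)
    r -= r.mean()                                      # mean subtracted
    return r / r.std()

def estimate_covariance(n, n_real, spectral_index, snr_r, snr_w, rng,
                        demean_white=False):
    """Average the outer product n n^T over realizations of r/snr_r + w/snr_w."""
    C = np.zeros((n, n))
    for _ in range(n_real):
        w = rng.standard_normal(n)
        if demean_white:
            w -= w.mean()
        noise = red_noise(n, spectral_index, rng) / snr_r + w / snr_w
        C += np.outer(noise, noise)
    return C / n_real

rng = np.random.default_rng(4)
N = 64
C_n = estimate_covariance(N, 1000, 2.0, 0.5, 1.0, rng)
L = scipy.linalg.cholesky(C_n, lower=True)

# Subtracting the mean from the white noise too leaves every realization
# orthogonal to the constant vector, so the covariance loses one rank.
C_deficient = estimate_covariance(N, 1000, 2.0, 0.5, 1.0, rng,
                                  demean_white=True)
print("rank, white mean kept    :", np.linalg.matrix_rank(C_n))
print("rank, white mean removed :", np.linalg.matrix_rank(C_deficient))
```

The sketch only reports the rank of the deficient matrix and does not attempt to factor it.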

Results in the following figures indicate that

1. Red noise with a power-law spectral index si ≲ 2 does not benefit particularly from whitening because leakage is much less.

2. What matters is the signal-to-noise ratio of the cosine relative to the noise power contained in one resolution bandwidth ∆f ∼ T⁻¹ centered on the frequency of the sinusoid. For a steep power law, only a small fraction of the total power in the red noise is in this band, whereas the flatter the spectrum, the larger this fraction is.
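The dependence of the in-band fraction on spectral index in the second point can be made concrete with a short calculation. This is a rough sketch under an idealized assumption: the red-noise power in DFT bin k is taken to be proportional to k^(−si) for k = 1 … N/2, with one bin equal to one resolution bandwidth, and leakage is ignored. The function name is hypothetical.

```python
import numpy as np

def band_fraction(n, spectral_index, k0):
    """Fraction of power-law noise power that falls in DFT bin k0.

    Assumes power proportional to k**(-spectral_index) on bins 1..n/2.
    """
    k = np.arange(1, n // 2 + 1)
    power = k ** (-float(spectral_index))
    return power[k0 - 1] / power.sum()

N = 256
k0 = round(N / 10.23)   # bin nearest the sinusoid (period 10.23 time bins)

for si in (1.0, 2.0, 3.0, 5.0):
    print(f"si = {si:.0f}: fraction in bin {k0} = {band_fraction(N, si, k0):.2e}")
```

Under this idealization the fraction falls by several orders of magnitude between si = 1 and si = 5, consistent with the statement that steep spectra put little of their power near the sinusoid.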

[Figure 1 panels: "Time Series" (Signal+Noise and Noise only vs. Time (bins)) and "Spectra" (vs. Frequency (bins)). Panel title: Cholesky whitening: N = 256, Sine+RN+WN, Si = 1.0, S/Nr = 0.01, S/Nw = 1.00]

Figure 1: Example of whitening using the Cholesky decomposition. The signal consists of a sine wave with period of 10.23 time bins with additive red and white noise. Signal-to-noise ratios of the signal relative to each kind of noise are given. Top left: original time series (red) and whitened time series (black). Bottom left: original noise (red) and whitened noise (black). Top right: power spectra of the original and whitened time series. Bottom right: power spectra of original and whitened noise sequences.

[Figure 2 panels: "Time Series" (Signal+Noise and Noise only vs. Time (bins)) and "Spectra" (vs. Frequency (bins)). Panel title: Cholesky whitening: N = 256, Sine+RN+WN, Si = 1.0, S/Nr = 0.50, S/Nw = 1.00]

Figure 2: Example of whitening using the Cholesky decomposition. The signal consists of a sine wave with period of 10.23 time bins with additive red and white noise. Signal-to-noise ratios of the signal relative to each kind of noise are given. Top left: original time series (red) and whitened time series (black). Bottom left: original noise (red) and whitened noise (black). Top right: power spectra of the original and whitened time series. Bottom right: power spectra of original and whitened noise sequences.

[Figure 3 panels: "Time Series" (Signal+Noise and Noise only vs. Time (bins)) and "Spectra" (vs. Frequency (bins)). Panel title: Cholesky whitening: N = 256, Sine+RN+WN, Si = 2.0, S/Nr = 0.01, S/Nw = 1.00]

Figure 3: Example of whitening using the Cholesky decomposition. The signal consists of a sine wave with period of 10.23 time bins with additive red and white noise. Signal-to-noise ratios of the signal relative to each kind of noise are given. Top left: original time series (red) and whitened time series (black). Bottom left: original noise (red) and whitened noise (black). Top right: power spectra of the original and whitened time series. Bottom right: power spectra of original and whitened noise sequences.

[Figure 4 panels: "Time Series" (Signal+Noise and Noise only vs. Time (bins)) and "Spectra" (vs. Frequency (bins)). Panel title: Cholesky whitening: N = 256, Sine+RN+WN, Si = 2.0, S/Nr = 0.10, S/Nw = 1.00]

Figure 4: Example of whitening using the Cholesky decomposition. The signal consists of a sine wave with period of 10.23 time bins with additive red and white noise. Signal-to-noise ratios of the signal relative to each kind of noise are given. Top left: original time series (red) and whitened time series (black). Bottom left: original noise (red) and whitened noise (black). Top right: power spectra of the original and whitened time series. Bottom right: power spectra of original and whitened noise sequences.

[Figure 5 panels: "Time Series" (Signal+Noise and Noise only vs. Time (bins)) and "Spectra" (vs. Frequency (bins)). Panel title: Cholesky whitening: N = 256, Sine+RN+WN, Si = 2.0, S/Nr = 0.50, S/Nw = 1.00]

Figure 5: Example of whitening using the Cholesky decomposition. The signal consists of a sine wave with period of 10.23 time bins with additive red and white noise. Signal-to-noise ratios of the signal relative to each kind of noise are given. Top left: original time series (red) and whitened time series (black). Bottom left: original noise (red) and whitened noise (black). Top right: power spectra of the original and whitened time series. Bottom right: power spectra of original and whitened noise sequences.

[Figure 6 panels: "Time Series" (Signal+Noise and Noise only vs. Time (bins)) and "Spectra" (vs. Frequency (bins)). Panel title: Cholesky whitening: N = 256, Sine+RN+WN, Si = 3.0, S/Nr = 0.01, S/Nw = 1.00]

Figure 6: Example of whitening using the Cholesky decomposition. The signal consists of a sine wave with period of 10.23 time bins with additive red and white noise. Signal-to-noise ratios of the signal relative to each kind of noise are given. Top left: original time series (red) and whitened time series (black). Bottom left: original noise (red) and whitened noise (black). Top right: power spectra of the original and whitened time series. Bottom right: power spectra of original and whitened noise sequences.

[Figure 7 panels: "Time Series" (Signal+Noise and Noise only vs. Time (bins)) and "Spectra" (vs. Frequency (bins)). Panel title: Cholesky whitening: N = 256, Sine+RN+WN, Si = 5.0, S/Nr = 0.01, S/Nw = 1.00]

Figure 7: Example of whitening using the Cholesky decomposition. The signal consists of a sine wave with period of 10.23 time bins with additive red and white noise. Signal-to-noise ratios of the signal relative to each kind of noise are given. Top left: original time series (red) and whitened time series (black). Bottom left: original noise (red) and whitened noise (black). Top right: power spectra of the original and whitened time series. Bottom right: power spectra of original and whitened noise sequences.

[Figure 8 panels: "Time Series" (Signal+Noise and Noise only vs. Time (bins)) and "Spectra" (vs. Frequency (bins)). Panel title: Cholesky whitening: N = 256, Sine+RN+WN, Si = 5.0, S/Nr = 0.10, S/Nw = 1.00]

Figure 8: Example of whitening using the Cholesky decomposition. The signal consists of a sine wave with period of 10.23 time bins with additive red and white noise. Signal-to-noise ratios of the signal relative to each kind of noise are given. Top left: original time series (red) and whitened time series (black). Bottom left: original noise (red) and whitened noise (black). Top right: power spectra of the original and whitened time series. Bottom right: power spectra of original and whitened noise sequences.

[Figure 9 panels: "Time Series" (Signal+Noise and Noise only vs. Time (bins)) and "Spectra" (vs. Frequency (bins)). Panel title: Cholesky whitening: N = 256, Sine+RN+WN, Si = 5.0, S/Nr = 0.50, S/Nw = 1.00]

Figure 9: Example of whitening using the Cholesky decomposition. The signal consists of a sine wave with period of 10.23 time bins with additive red and white noise. Signal-to-noise ratios of the signal relative to each kind of noise are given. Top left: original time series (red) and whitened time series (black). Bottom left: original noise (red) and whitened noise (black). Top right: power spectra of the original and whitened time series. Bottom right: power spectra of original and whitened noise sequences.

Impulse Response and Spectrum of Whitening Filter

We can think of the Cholesky decomposition as a filter that suppresses low frequencies for the purpose of estimating the parameters of a sinusoid. The filter response can be calculated from the impulse response as follows:

Construct a data vector i with elements i_j = 0 for all j except j = j0, where i_j0 = 1.

Then the impulse response is h = L⁻¹i. Expressed as a time function h_j, j = 1, ···, N, the frequency-domain response is the squared magnitude of the DFT of h_j:

$$H_k = |\tilde{h}_k|^2,$$

where h̃_k denotes the DFT of h_j.
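The recipe above can be sketched in a few lines. As an assumption for illustration, the covariance here is a simple correlation matrix ρ^|i−j| with ρ = 0.9 rather than the red-plus-white covariance used for the figures, and the impulse is placed mid-sequence to stay away from edge effects.

```python
import numpy as np
import scipy.linalg

N = 256
rho = 0.9   # assumed correlation, standing in for a measured C_n
lags = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
C_n = rho ** lags
L = scipy.linalg.cholesky(C_n, lower=True)

# Unit impulse at index j0.
j0 = N // 2
impulse = np.zeros(N)
impulse[j0] = 1.0

# Impulse response h = L^-1 i, via a triangular solve (no explicit inverse).
h = scipy.linalg.solve_triangular(L, impulse, lower=True)

# Frequency response H_k = |DFT(h)|^2 on the non-negative frequencies.
H = np.abs(np.fft.rfft(h)) ** 2
print(f"H at lowest bin: {H[0]:.4f}   H at Nyquist: {H[-1]:.2f}")
```

For this assumed covariance the whitening filter reduces to a two-point difference, so the response rises from (1 − ρ)/(1 + ρ) at zero frequency to (1 + ρ)/(1 − ρ) at the Nyquist frequency: a high-pass filter, in line with the low-frequency suppression described in the text.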

Figure 10: Example of whitening using the Cholesky decomposition along with the impulse response and its spectrum. The signal consists of a sine wave with period of 10.23 time bins with additive red and white noise. Signal-to-noise ratios of the signal relative to each kind of noise are given. Left figure: Top left: original time series (red) and whitened time series (black). Bottom left: original noise (red) and whitened noise (black). Top right: power spectra of the original and whitened time series. Bottom right: power spectra of original and whitened noise sequences. Right figure: Top panel: input impulse (red) and impulse response of the Cholesky filter. Bottom panel: spectra of the impulse and impulse response, respectively. The filter shows the suppression of frequencies below about 25 bins; this frequency is signal-to-noise ratio dependent.