Non-sampling functional approximation of linear and non-linear Bayesian Update


Alexander Litvinenko

H. G. Matthies

Institute of Scientific Computing, TU Braunschweig

Brunswick, Germany

wire@tu-bs.de

http://www.wire.tu-bs.de


Lorenz-63 Problem

Lorenz 1963 is a system of ODEs that has chaotic solutions for certain parameter values and initial conditions. [The linear BU is due to O. Pajonk.]

ẋ = σ(y − x)
ẏ = x(ρ − z) − y
ż = xy − βz

The initial state q0(ω) = (x0(ω), y0(ω), z0(ω)) is uncertain.

Solve at t0, t1, . . . , t10, noisy measurement → UPDATE; solve at t11, t12, . . . , t20, noisy measurement → UPDATE; . . .

IDEA of the Bayesian Update (BU): take qf(ω) = q0(ω).
Linear BU: qa = qf + K · (z − y)
Non-linear BU: qa = qf + H1 · (z − y) + (z − y)ᵀ · H2 · (z − y).
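As a concrete sketch of this solve/measure/update cycle, the snippet below integrates Lorenz-63 with SciPy and applies the linear BU to a Monte Carlo ensemble. The ensemble size, noise level, window length, and the use of samples instead of the PCE representation are illustrative assumptions, not the method of the talk.

import numpy as np
from scipy.integrate import solve_ivp

sigma, rho, beta = 10.0, 28.0, 8.0 / 3.0      # classical Lorenz-63 parameters

def lorenz63(t, u):
    x, y, z = u
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

rng = np.random.default_rng(0)
N, s_eps = 500, 0.5                            # ensemble size, noise std (assumed)
Q = rng.normal([1.0, 1.0, 1.0], 0.1, (N, 3))   # uncertain initial state q0(omega)

for window in range(2):                        # solve ..., measure, UPDATE, repeat
    # forecast: propagate each ensemble member over one assimilation window
    Q = np.array([solve_ivp(lorenz63, (0.0, 0.1), q).y[:, -1] for q in Q])
    z = Q[0] + rng.normal(0.0, s_eps, 3)       # synthetic noisy measurement (member 0 as stand-in truth)
    # linear BU: qa = qf + K (z - y), Kalman gain K = C (C + R)^{-1}
    C = np.cov(Q.T)
    K = C @ np.linalg.inv(C + s_eps**2 * np.eye(3))
    Q = Q + (z - Q) @ K.T                      # assimilated ensemble qa(omega)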


Setting for the identification process

General idea: we observe / measure a system whose structure we know in principle.

The system behaviour depends on some quantities (parameters) which we do not know ⇒ uncertainty.

We model (uncertainty in) our knowledge in a Bayesian setting: as a probability distribution on the parameters.

We start with what we know a priori, then perform a measurement. This gives new information with which to update our knowledge (identification).

The update in a probabilistic setting works with conditional probabilities ⇒ Bayes's theorem.

Repeated measurements lead to better identification.


Mathematical setup

Consider

A(u; q) = f ⇒ u = S(f ; q),

where S is the solution operator.

The operator depends on parameters q ∈ Q, hence the state u ∈ U is also a function of q.

Measurement operator Y with values in Y:

y = Y (q;u) = Y (q, S(f ; q)).

Examples of measurements:
(ODE) u(t) = (x(t), y(t), z(t))ᵀ, y(t) = (x(t), y(t))ᵀ
(PDE) y(ω) = ∫_{D0} u(ω, x) dx, y(ω) = ∫_{D0} |grad u(ω, x)|² dx, or u in a few points.
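In code, the abstract setup for the Lorenz-63 example might look as follows; the function names S and Y simply mirror the operators above, and observing only the (x, y) components is the ODE example from the list. This is a sketch under those assumptions, not part of the original.

import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, u, s=10.0, r=28.0, b=8.0 / 3.0):    # the known structure A(u; q) = f
    x, y, z = u
    return [s * (y - x), x * (r - z) - y, x * y - b * z]

def S(f, q, T=1.0):
    """Solution operator u = S(f; q); here the parameters q are the initial state."""
    return solve_ivp(f, (0.0, T), q).y[:, -1]

def Y(q, u):
    """Measurement operator: observe only the (x, y) components of the state u."""
    return u[:2]

q = np.array([1.0, 1.0, 1.0])
y = Y(q, S(rhs, q))                            # y = Y(q, S(f; q))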


Inverse problem

For given f, the measurement y is just a function of q. This function is usually not invertible ⇒ ill-posed problem: the measurement y does not contain enough information.

In the Bayesian framework the state of knowledge is modelled in a probabilistic way: the parameters q are uncertain and assumed random.

The Bayesian setting allows updating / sharpening of the information about q when a measurement is performed.

The problem of updating the distribution (the state of knowledge of q) becomes well-posed.

It can be applied successively; each new measurement y and forcing f (both may also be uncertain) will provide new information.


Conditional probability and expectation

With the state u ∈ U = U ⊗ S a RV, the quantity to be measured,

y(ω) = Y(q(ω), u(ω)) ∈ Y := Y ⊗ S,

is also uncertain, a random variable. A new measurement z is performed, composed from the "true" value y ∈ Y and a random error ε: z(ω) = y + ε(ω) ∈ Y.

Classically, Bayes's theorem gives the conditional probability

P(Iq|Mz) = P(Mz|Iq) P(Iq) / P(Mz);

the expectation with this posterior measure is the conditional expectation. Kolmogorov starts from the conditional expectation E(·|Mz) and obtains from it the conditional probability via P(Iq|Mz) = E(χ_{Iq}|Mz).
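A minimal discrete illustration of this formula; the two hypotheses about q and the likelihood values are assumed numbers, chosen only to make the arithmetic visible.

# Bayes's theorem P(Iq|Mz) = P(Mz|Iq) P(Iq) / P(Mz) for two hypotheses
prior = {"q_low": 0.5, "q_high": 0.5}          # P(Iq), assumed prior
likelihood = {"q_low": 0.2, "q_high": 0.7}     # P(Mz|Iq), assumed model
evidence = sum(likelihood[h] * prior[h] for h in prior)   # P(Mz) = 0.45
posterior = {h: likelihood[h] * prior[h] / evidence for h in prior}
print(posterior)                               # {'q_low': 0.22..., 'q_high': 0.77...}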


Conditional expectation

The conditional expectation is defined as the orthogonal projection onto the closed subspace L2(Ω,P,σ(z)):

E(q|σ(z)) := P_{Q∞} q = argmin_{q̂ ∈ L2(Ω,P,σ(z))} ‖q − q̂‖²_{L2}.

The subspace Q∞ := L2(Ω,P,σ(z)) represents the available information; the estimate minimises Φ(·) := ‖q − (·)‖² over Q∞.

The update, also called the assimilated value, qa(ω) := P_{Q∞} q = E(q|σ(z)), is a Q-valued RV and represents the new state of knowledge after the measurement.
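The projection can be illustrated numerically: restricted to a finite-dimensional subspace of functions of z, minimising ‖q − q̂‖² is an ordinary least-squares problem. A Monte Carlo sketch with a jointly Gaussian pair (q, z), an assumption chosen so that the exact E(q|z) is known to be affine:

import numpy as np

rng = np.random.default_rng(1)
q = rng.normal(0.0, 1.0, 100_000)              # parameter q(omega)
z = q + rng.normal(0.0, 0.5, q.size)           # noisy measurement z(omega)
# projection onto span{1, z}: solve min ||q - (a + b z)||^2
A = np.vstack([np.ones_like(z), z]).T
a, b = np.linalg.lstsq(A, q, rcond=None)[0]
# exact conditional expectation here: E(q|z) = z / (1 + 0.5^2), slope 0.8
print(a, b)                                    # close to 0.0 and 0.8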


Approximation

Minimisation is equivalent to orthogonality: find φ ∈ L0(Y,Q) such that

∀v ∈ Q∞ : 〈〈D_{qa}Φ(qa(φ)), v〉〉_{L2} = 〈〈q − qa, v〉〉_{L2} = 0,

⇔ D_φΦ := D_{qa}Φ ∘ D_φ qa = 0 with qa(φ) := φ(z).

Approximation of Q∞: take Qn ⊂ Q∞,

Qn := {ϕ ∈ Q : ϕ = ψn ∘ Y, ψn an n-th degree polynomial},

i.e. ϕ = H0 + H1 Y + · · · + Hk Y^⊗k + · · · + Hn Y^⊗n,
where Hk ∈ L^k_s(Y,Q) is symmetric and k-linear.

With qa(φ) = qa((H0, . . . , Hk, . . . , Hn)) = Σ_{k=0}^{n} Hk z^⊗k = P_{Qn} q,
orthogonality implies

∀ℓ = 0, . . . , n : D_{Hℓ}Φ(qa(H0, . . . , Hk, . . . , Hn)) = 0.


Approximation

Φ(qa) = ‖q − qa‖² = 〈q − (H0 + H1 Y + · · · + Hn Y^⊗n), q − (H0 + H1 Y + · · · + Hn Y^⊗n)〉.

D_{H0}Φ(qa(. . .)) = 0 : 〈q − (H0 + H1 Y + · · · + Hn Y^⊗n), 1〉 = 0, i.e.
H0 + H1 〈Y〉 + · · · + Hn 〈Y^⊗n〉 = 〈q〉.

D_{H1}Φ(qa(. . .)) = 0 : 〈q − (H0 + H1 Y + · · · + Hn Y^⊗n), Y〉 = 0, i.e.
H0 〈Y〉 + H1 〈Y ⊗ Y〉 + · · · + Hn 〈Y^⊗(n+1)〉 = 〈q ⊗ Y〉.

. . .

D_{Hn}Φ(qa(. . .)) = 0 : 〈q − (H0 + H1 Y + · · · + Hn Y^⊗n), Y^⊗n〉 = 0, i.e.
H0 〈Y^⊗n〉 + H1 〈Y^⊗(n+1)〉 + · · · + Hn 〈Y^⊗2n〉 = 〈q ⊗ Y^⊗n〉.


Bayesian update in components

For finite-dimensional spaces, or after discretisation, in components: let q = (q^m), Y = (Y^l), and Hk = (H_k^{m, l1...lk}); then

∀ℓ = 0, . . . , n :
〈Y^{l1} · · · Y^{lℓ}〉 H0^m + · · · + 〈Y^{l1} · · · Y^{lℓ+k}〉 H_k^{m, lℓ+1...lℓ+k} + · · · + 〈Y^{l1} · · · Y^{lℓ+n}〉 H_n^{m, lℓ+1...lℓ+n} = 〈q^m Y^{l1} · · · Y^{lℓ}〉.

The matrix does not depend on m; it is identically block diagonal.


Ingredients of the quadratic Bayesian Update

H0 + H1 〈Y〉 + H2 〈Y^⊗2〉 = 〈q〉,
H0 〈Y〉 + H1 〈Y^⊗2〉 + H2 〈Y^⊗3〉 = 〈q ⊗ Y〉,
H0 〈Y^⊗2〉 + H1 〈Y^⊗3〉 + H2 〈Y^⊗4〉 = 〈q ⊗ Y^⊗2〉,

where Y = (x(ω), y(ω), z(ω))ᵀ is expanded in a PCE, i.e. Y ∈ R^{3×|J|}; 〈Y^⊗2〉 − 〈Y〉^⊗2 is the covariance matrix, and 〈Y^⊗4〉 is a deterministic tensor of size 3 × 3 × 3 × 3. In block form:

[ 1        〈Y〉      〈Y^⊗2〉 ] [ H0 ]   [ 〈q〉         ]
[ 〈Y〉     〈Y^⊗2〉   〈Y^⊗3〉 ] [ H1 ] = [ 〈q ⊗ Y〉     ]
[ 〈Y^⊗2〉  〈Y^⊗3〉   〈Y^⊗4〉 ] [ H2 ]   [ 〈q ⊗ Y^⊗2〉  ]

The linear system is symmetric positive definite.
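A sketch of assembling and solving this system for a scalar measurement Y, so that each block is a single moment. The moments are estimated by Monte Carlo purely for illustration; in the talk they come from the PCE coefficients.

import numpy as np

def quadratic_update_coeffs(q, Y):
    """Solve the 3x3 moment system for H0, H1, H2 (scalar q and Y)."""
    m = [np.mean(Y**k) for k in range(5)]      # <Y^0>, ..., <Y^4>
    A = np.array([[m[0], m[1], m[2]],
                  [m[1], m[2], m[3]],
                  [m[2], m[3], m[4]]])         # symmetric positive definite
    rhs = np.array([np.mean(q), np.mean(q * Y), np.mean(q * Y**2)])
    return np.linalg.solve(A, rhs)

rng = np.random.default_rng(2)
q = rng.normal(1.0, 0.3, 200_000)              # parameter samples
Y = q + rng.normal(0.0, 0.1, q.size)           # noisy observation of q
H0, H1, H2 = quadratic_update_coeffs(q, Y)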


Numerical solution of the system

Lorenz63: q = (x, y, z), Y = (x, y, z) + ε.

For x (and likewise for y, z) the system above is (1 + 3 + 9) × (1 + 3 + 9) = 13 × 13. But there are linear dependences (e.g. H2^{12} = H2^{21})! Removing the linearly dependent rows and columns yields a 10 × 10 system. The solution is (H0^i, H1^{i1}, H1^{i2}, H1^{i3}, H2^{i11}, . . . , H2^{i33}), i = 1, 2, 3.

If ε = 0, then q = Y and the system can be solved analytically:

[ 1        〈q〉      〈q^⊗2〉 ] [ H0 ]   [ 〈q〉         ]
[ 〈q〉     〈q^⊗2〉   〈q^⊗3〉 ] [ H1 ] = [ 〈q ⊗ q〉     ]
[ 〈q^⊗2〉  〈q^⊗3〉   〈q^⊗4〉 ] [ H2 ]   [ 〈q ⊗ q^⊗2〉  ]

with solution H0 = 0, H1 = I, H2 = 0.
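This special case is easy to verify with the scalar quadratic_update_coeffs sketch from the previous slide: passing the same samples for q and Y (i.e. ε = 0) reproduces H0 = 0, H1 = 1, H2 = 0 exactly, because the sample moments satisfy the same identity as the exact ones.

H0, H1, H2 = quadratic_update_coeffs(q, q)     # eps = 0, so Y = q
print(H0, H1, H2)                              # 0.0, 1.0, 0.0 up to round-off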


Polynomial Chaos Expansion

Let Y(x, θ), θ = (θ1, . . . , θM, . . .), be approximated by its polynomial chaos expansion:

Y(x, θ) = Σ_{β ∈ J_{m,p}} Hβ(θ) Yβ(x),    Yβ(x) = (1/β!) ∫_Θ Hβ(θ) Y(x, θ) P(dθ).

In the Lorenz-63 model the matrix of PCE coefficients is Y ∈ R^{3×|J_{m,p}|}, with |J_{m,p}| = (m+p)!/(m! p!).

E.g. the multi-index set for Y ⊗ Y will be J_{m,2p} and for Y^⊗4 it will be J_{m,4p}.

If (Y − Z) is needed for uncorrelated Y (on J_{m1,p}) and Z (on J_{m2,p}), then one must build J_{m1+m2,p}.
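Since |J_{m,p}| = (m+p)!/(m! p!) is a binomial coefficient, the growth caused by forming products such as Y ⊗ Y (degree 2p) or Y^⊗4 (degree 4p) is easy to quantify; m = 3 and p = 4 below are assumed values.

from math import comb

def card_J(m, p):
    return comb(m + p, p)                      # |J_{m,p}| = (m+p)!/(m! p!)

m, p = 3, 4
print(card_J(m, p), card_J(m, 2 * p), card_J(m, 4 * p))   # 35, 165, 969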


MAIN RESULT 1: Special cases

For n = 0 (constant functions) ⇒ qa = H0 = 〈q〉 (= E(q)).

For n = 1 the approximation is with affine functions:

H0 + H1 〈z〉 = 〈q〉
H0 〈z〉 + H1 〈z ⊗ z〉 = 〈q ⊗ z〉

⇒ (remember that [cov_qz] = 〈q ⊗ z〉 − 〈q〉 ⊗ 〈z〉)

H0 = 〈q〉 − H1 〈z〉
H1 (〈z ⊗ z〉 − 〈z〉 ⊗ 〈z〉) = 〈q ⊗ z〉 − 〈q〉 ⊗ 〈z〉

⇒ H1 = [cov_qz][cov_zz]⁻¹ (Kalman's solution),

H0 = 〈q〉 − [cov_qz][cov_zz]⁻¹ 〈z〉,

and finally

qa = H0 + H1 z = 〈q〉 + [cov_qz][cov_zz]⁻¹ (z − 〈z〉).
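The n = 1 case in code: a small sketch of Kalman's solution for vector-valued q and z, with sample covariances standing in for [cov_qz] and [cov_zz] (an illustrative assumption; the talk computes these from PCE coefficients).

import numpy as np

def linear_bu(Q, Z, z_obs):
    """Return <q> + [cov_qz][cov_zz]^{-1} (z_obs - <z>); Q, Z are (N, d) samples."""
    d = Q.shape[1]
    cov = np.cov(Q.T, Z.T)                     # joint (2d x 2d) covariance
    cov_qz, cov_zz = cov[:d, d:], cov[d:, d:]
    K = cov_qz @ np.linalg.inv(cov_zz)         # H1, Kalman's solution
    return Q.mean(0) + K @ (z_obs - Z.mean(0))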


Measurement error

Let ε ∼ N(0, σ²I) be Gaussian noise. Since the RVs Y and ε are uncorrelated, we obtain

〈Y + ε〉 = 〈Y〉 + 〈ε〉 = 〈Y〉
〈(Y + ε)^⊗2〉 = 〈Y^⊗2〉 + 2〈Y〉〈ε〉 + 〈ε^⊗2〉 = 〈Y^⊗2〉 + σ²I
〈(Y + ε)^⊗3〉 = 〈Y^⊗3〉 + 3〈Y〉〈ε^⊗2〉
〈(Y + ε)^⊗4〉 = 〈Y^⊗4〉 + 6〈Y^⊗2〉〈ε^⊗2〉 + 〈ε^⊗4〉

(since 〈ε^⊗(2k+1)〉 = 0).

One must be careful when computing the right-hand sides 〈q ⊗ Y〉, 〈q ⊗ Y^⊗2〉, . . . , since the RVs q and Y belong to different, overlapping spaces.
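A quick numerical check of the second-moment rule 〈(Y + ε)^⊗2〉 = 〈Y^⊗2〉 + σ²I; the sample size, σ, and the correlation structure of Y are assumptions, and the agreement is up to O(N^{-1/2}) sampling error.

import numpy as np

rng = np.random.default_rng(3)
N, d, s = 1_000_000, 3, 0.5
M = np.array([[1.0, 0.3, 0.0], [0.0, 1.0, 0.3], [0.0, 0.0, 1.0]])
Y = rng.normal(0.0, 1.0, (N, d)) @ M           # correlated vector Y
eps = rng.normal(0.0, s, (N, d))               # eps ~ N(0, s^2 I), independent of Y
lhs = (Y + eps).T @ (Y + eps) / N              # <(Y + eps) tensor (Y + eps)>
rhs = Y.T @ Y / N + s**2 * np.eye(d)           # <Y tensor Y> + s^2 I
print(np.max(np.abs(lhs - rhs)))               # small (sampling error only)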


MAIN RESULT 2: Final Step of Non-linear BU

After H0, H1, H2 are computed, evaluate

qa = H0 + H1 (z − Y) + H2 (z − Y)^⊗2,

where z = z(ω) = y(ω) + ε(ω) and Y are the noisy measurements.
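A direct transcription of this evaluation for a d-dimensional state, with H2 treated as a third-order tensor contracted twice with the innovation; the coefficient shapes are assumptions consistent with the moment system above.

import numpy as np

def nlbu_apply(H0, H1, H2, z, Y):
    """qa = H0 + H1 (z - Y) + H2 (z - Y)^{tensor 2} for vectors z and Y."""
    r = z - Y                                  # innovation z - Y
    return H0 + H1 @ r + np.einsum("ijk,j,k->i", H2, r, r)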


Numerics

Figure 1: One update in the linearised Lorenz-63.


Figure 2: Linear (left) and non-linear (right) Bayesian updates in Lorenz-63.


Conclusion about NLBU

• + It is a functional, sampling-free approximation.

• + It gives hope for solving inverse problems.

• − The stochastic dimension grows very fast.

• − It is very "expensive" to compute and requires efficient tensor arithmetic.


Literature

1. O. Pajonk, B. V. Rosic, A. Litvinenko, and H. G. Matthies, A Deterministic Filter for Non-Gaussian Bayesian Estimation, Physica D: Nonlinear Phenomena, Vol. 241(7), pp. 775-788, 2012.

2. B. V. Rosic, A. Litvinenko, O. Pajonk, and H. G. Matthies, Sampling-Free Linear Bayesian Update of Polynomial Chaos Representations, Journal of Computational Physics, Vol. 231(17), pp. 5761-5787, 2012.
