Post on 05-Apr-2022
11.5 MAP Estimator

Recall that the "hit-or-miss" cost function gave the MAP estimator: it maximizes the a posteriori PDF.

Q: Given that the MMSE estimator is "the most natural" one… why would we consider the MAP estimator?

A: If x and θ are not jointly Gaussian, finding the MMSE estimate requires integration to compute the conditional mean. MAP avoids this computational problem: it does not require that integration.

So we trade the "natural criterion" for "computational ease."

What else do you gain? More flexibility in choosing the prior PDF.
Notation and Form for MAP

Notation: \hat{\theta}_{MAP} maximizes the posterior PDF:

    \hat{\theta}_{MAP} = \arg\max_{\theta} p(\theta \mid \mathbf{x})

"arg max" extracts the value of θ that achieves the maximum.

Equivalent form (via Bayes' rule):

    \hat{\theta}_{MAP} = \arg\max_{\theta} \left[ p(\mathbf{x} \mid \theta)\, p(\theta) \right]

Proof: Use

    p(\theta \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \theta)\, p(\theta)}{p(\mathbf{x})}

so that

    \hat{\theta}_{MAP} = \arg\max_{\theta} \frac{p(\mathbf{x} \mid \theta)\, p(\theta)}{p(\mathbf{x})} = \arg\max_{\theta} \left[ p(\mathbf{x} \mid \theta)\, p(\theta) \right]

because p(x) does not depend on θ.
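To make the equivalent form concrete, here is a minimal sketch (using a hypothetical scalar model, not from the book's slides: x[n] = A + w[n] with Gaussian noise and a Gaussian prior on A, with made-up parameter values) that finds the MAP estimate by brute-force grid search over p(x|θ)p(θ). In this Gaussian case the posterior is Gaussian, so the MAP estimate coincides with the closed-form posterior mean and the two can be cross-checked:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model: x[n] = A + w[n], w[n] ~ N(0, sigma2), prior A ~ N(mu_A, sigmaA2)
N, sigma2, mu_A, sigmaA2 = 20, 1.0, 0.0, 4.0
A_true = rng.normal(mu_A, np.sqrt(sigmaA2))
x = A_true + rng.normal(0.0, np.sqrt(sigma2), N)

# MAP by grid search: maximize log[p(x|A) p(A)] = log p(x|A) + log p(A)
grid = np.linspace(-10, 10, 40001)
log_lik = -0.5 * np.sum((x[:, None] - grid[None, :]) ** 2, axis=0) / sigma2
log_prior = -0.5 * (grid - mu_A) ** 2 / sigmaA2
A_map_grid = grid[np.argmax(log_lik + log_prior)]

# Gaussian case: MAP coincides with the posterior mean, which has a closed form
alpha = sigmaA2 / (sigmaA2 + sigma2 / N)
A_map_closed = alpha * x.mean() + (1 - alpha) * mu_A

print(A_map_grid, A_map_closed)  # agree to within the grid spacing
```

Working in log-posterior rather than the posterior itself avoids numerical underflow and drops the p(x) normalizer, exactly as the proof above allows.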
Vector MAP   < Not as straightforward as the vector extension for MMSE >

The obvious extension leads to problems. Choose \hat{\theta}_i to minimize

    \mathcal{R}(\hat{\theta}_i) = E\{ C(\theta_i - \hat{\theta}_i) \}     (expectation over p(\mathbf{x}, \theta_i))

which gives

    \hat{\theta}_i = \arg\max_{\theta_i} p(\theta_i \mid \mathbf{x})

where p(\theta_i \mid \mathbf{x}) is the 1-D marginal conditioned on x — and we need to integrate to get it! For example,

    p(\theta_1 \mid \mathbf{x}) = \int \cdots \int p(\boldsymbol{\theta} \mid \mathbf{x})\, d\theta_2 \cdots d\theta_p

Problem: the whole point of MAP was to avoid the integration needed for MMSE!

Is there a way around this? Can we find an integration-free vector MAP?
Circular Hit-or-Miss Cost Function   (Not in Book)

First look at the p-dimensional cost function behind this "troubling" version of a vector MAP: it consists of p individual applications of the 1-D "hit-or-miss" cost. For p = 2 it is a square of half-width δ in the (ε₁, ε₂) error plane:

    C(\varepsilon_1, \varepsilon_2) = \begin{cases} 0, & (\varepsilon_1, \varepsilon_2) \text{ in square} \\ 1, & (\varepsilon_1, \varepsilon_2) \text{ not in square} \end{cases}

The corners of the square "let too much in" ⇒ use a circle!

    C(\boldsymbol{\varepsilon}) = \begin{cases} 0, & \|\boldsymbol{\varepsilon}\| < \delta \\ 1, & \|\boldsymbol{\varepsilon}\| \ge \delta \end{cases}

This actually seems more natural than the "square" cost function!
MAP Estimate Using the Circular Hit-or-Miss   (Back to Book)

So… what vector Bayesian estimator comes from using this circular hit-or-miss cost function? One can show that it is the following "vector MAP":

    \hat{\boldsymbol{\theta}}_{MAP} = \arg\max_{\boldsymbol{\theta}} p(\boldsymbol{\theta} \mid \mathbf{x})

This does not require integration! That is, find the maximum of the joint conditional PDF over all θ_i, conditioned on x.
How Do These Vector MAP Versions Compare?

In general: they are NOT the same!

Example: p = 2. [Figure: the joint posterior p(θ₁, θ₂ | x) is piecewise constant, taking the values 1/6, 1/3, 1/6 over regions of the plane with θ₁ ∈ [1, 5] and θ₂ ∈ [0, 2].]

The vector MAP using the circular hit-or-miss is:

    \hat{\boldsymbol{\theta}} = [2.5 \;\; 0.5]^T

To find the vector MAP using element-wise maximization, use the marginals. [Figure: p(θ₁ | x) takes the values 1/6 and 1/3 over θ₁ ∈ [1, 5]; p(θ₂ | x) takes the values 1/3 and 2/3 over θ₂ ∈ [0, 2].] These give:

    \hat{\boldsymbol{\theta}} = [2.5 \;\; 1.5]^T
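The disagreement between the two vector MAPs can be reproduced numerically. Below is a small sketch using a hypothetical discrete posterior on a grid (chosen by hand to mimic the p = 2 example above, not the book's exact PDF): the joint mode sits at (2.5, 0.5), while the pair of marginal modes is (2.5, 1.5):

```python
import numpy as np

# Hypothetical discrete posterior p(theta1, theta2 | x), constructed so that
# the JOINT mode and the pair of MARGINAL modes disagree.
theta1 = np.array([1.5, 2.5, 3.5])
theta2 = np.array([0.5, 1.5])
P = np.array([[0.075, 0.20],
              [0.30,  0.15],
              [0.075, 0.20]])   # rows: theta1, cols: theta2; entries sum to 1

# Vector MAP from the circular hit-or-miss cost: maximize the JOINT posterior
i, j = np.unravel_index(np.argmax(P), P.shape)
joint_map = (theta1[i], theta2[j])

# Element-wise MAP: maximize each 1-D MARGINAL posterior separately
marg1 = P.sum(axis=1)   # p(theta1 | x)
marg2 = P.sum(axis=0)   # p(theta2 | x)
elementwise_map = (theta1[np.argmax(marg1)], theta2[np.argmax(marg2)])

print(joint_map)        # (2.5, 0.5)
print(elementwise_map)  # (2.5, 1.5)
```

The joint mode lands in a tall but narrow cell, while each marginal accumulates mass across a whole row or column — which is exactly why the two criteria can disagree.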
"Bayesian MLE"

Recall: as we keep getting good data, p(θ|x) becomes more concentrated as a function of θ. But since

    \hat{\theta}_{MAP} = \arg\max_{\theta} p(\theta \mid \mathbf{x}) = \arg\max_{\theta} \left[ p(\mathbf{x} \mid \theta)\, p(\theta) \right]

p(x|θ) should also become more concentrated as a function of θ.

[Figure: p(x|θ)p(θ) plotted vs. θ.]

- Note that the prior PDF is nearly constant where p(x|θ) is non-zero.
- This becomes truer as N → ∞ and p(x|θ) gets more concentrated.

    \arg\max_{\theta} \left[ p(\mathbf{x} \mid \theta)\, p(\theta) \right] \approx \arg\max_{\theta} p(\mathbf{x} \mid \theta)

The left side is the MAP estimate; the right side is the "Bayesian MLE," which uses the conditional PDF rather than the parameterized PDF.
11.6 Performance Characterization

The performance of Bayesian estimators is characterized by looking at the estimation error:

    \varepsilon = \theta - \hat{\theta}

which is random through θ (due to the a priori PDF) and through \hat{\theta} (due to x).

Performance is characterized by the error's PDF p(ε). We'll focus on its mean and variance; if ε is Gaussian, these tell the whole story. That will be the case for the Bayesian linear model (see Thm. 10.3). We'll also concentrate on the MMSE estimator.
Performance of Scalar MMSE Estimator

The estimator is a function of x:

    \hat{\theta} = E\{\theta \mid \mathbf{x}\} = \int \theta\, p(\theta \mid \mathbf{x})\, d\theta

So the estimation error is a function of two RVs:

    \varepsilon = \theta - E\{\theta \mid \mathbf{x}\} = f(\mathbf{x}, \theta)

General result for a function of two RVs, Z = f(X, Y):

    E\{Z\} = \int\!\!\int f(x, y)\, p_{XY}(x, y)\, dx\, dy

    \mathrm{var}\{Z\} = E\{(Z - E\{Z\})^2\} = \int\!\!\int \left( f(x, y) - E\{Z\} \right)^2 p_{XY}(x, y)\, dx\, dy
So… applying the mean result gives:

    E\{\varepsilon\} = E_{\mathbf{x},\theta}\{\theta - E\{\theta \mid \mathbf{x}\}\}
                     = E_{\mathbf{x}}\left\{ E_{\theta|\mathbf{x}}\{\theta - E\{\theta \mid \mathbf{x}\}\} \right\}
                     = E_{\mathbf{x}}\left\{ E_{\theta|\mathbf{x}}\{\theta\} - E_{\theta|\mathbf{x}}\{E\{\theta \mid \mathbf{x}\}\} \right\}
                     = E_{\mathbf{x}}\left\{ E\{\theta \mid \mathbf{x}\} - E\{\theta \mid \mathbf{x}\} \right\}
                     = E_{\mathbf{x}}\{0\} = 0

(See the chart on "Decomposing Joint Expectations" in "Notes on 2 RVs." Pass E_{θ|x} through the terms; note that E_{θ|x}{θ} and E{θ|x} are two notations for the same thing.)

The key inner step uses the fact that E{θ|x} does not depend on θ:

    E_{\theta|\mathbf{x}}\{E\{\theta \mid \mathbf{x}\}\} = \int E\{\theta \mid \mathbf{x}\}\, p(\theta \mid \mathbf{x})\, d\theta
                                                         = E\{\theta \mid \mathbf{x}\} \int p(\theta \mid \mathbf{x})\, d\theta = E\{\theta \mid \mathbf{x}\}

    E\{\varepsilon\} = 0

i.e., the mean of the estimation error (over data & parameter) is zero!
And… applying the variance result gives (using E{ε} = 0):

    \mathrm{var}\{\varepsilon\} = E\{(\varepsilon - E\{\varepsilon\})^2\} = E_{\mathbf{x},\theta}\{\varepsilon^2\}
                                = \int\!\!\int (\theta - \hat{\theta})^2\, p(\mathbf{x}, \theta)\, d\mathbf{x}\, d\theta
                                = Bmse(\hat{\theta})

So… the MMSE estimation error has:
- mean = 0
- var = Bmse

So when we minimize the Bmse we are minimizing the variance of the estimate.

If ε is Gaussian then \varepsilon \sim N\!\left(0, Bmse(\hat{\theta})\right).
Ex. 11.6: DC Level in WGN with Gaussian Prior

We saw that

    Bmse(\hat{A}) = \frac{1}{N/\sigma^2 + 1/\sigma_A^2}

with

    \hat{A} = \frac{\sigma_A^2}{\sigma_A^2 + \sigma^2/N}\, \bar{x} + \frac{\sigma^2/N}{\sigma_A^2 + \sigma^2/N}\, \mu_A

where both weights are constants. So \hat{A} is Gaussian, because it is a linear combination of the jointly Gaussian data samples (if X is Gaussian, then Y = aX + b is also Gaussian). Therefore

    \varepsilon \sim N\!\left(0,\; \frac{1}{N/\sigma^2 + 1/\sigma_A^2}\right)

Note: as N gets large, this PDF collapses around 0. The estimate is "consistent in the Bayesian sense."

Bayesian consistency: for large N,

    \hat{A} \approx A     (regardless of the realization of A!)
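The two claims above (zero-mean error, variance equal to the Bmse) are easy to verify by Monte Carlo. A minimal sketch for Ex. 11.6, with made-up parameter values: draw A from its prior, draw the sample mean given A, form the MMSE estimate, and look at the error statistics over many trials:

```python
import numpy as np

rng = np.random.default_rng(1)

# Ex. 11.6 setup: x[n] = A + w[n], A ~ N(mu_A, sigmaA2), w[n] ~ N(0, sigma2)
N, sigma2, mu_A, sigmaA2 = 10, 1.0, 2.0, 0.5
trials = 200_000

A = rng.normal(mu_A, np.sqrt(sigmaA2), trials)           # draw the parameter
xbar = A + rng.normal(0.0, np.sqrt(sigma2 / N), trials)  # xbar | A ~ N(A, sigma2/N)

alpha = sigmaA2 / (sigmaA2 + sigma2 / N)
A_hat = alpha * xbar + (1 - alpha) * mu_A                # MMSE estimate
eps = A - A_hat                                          # estimation error

bmse = 1.0 / (N / sigma2 + 1.0 / sigmaA2)
print(eps.mean())        # ~ 0: the error is zero mean over data AND parameter
print(eps.var(), bmse)   # empirical error variance ~ Bmse
```

Note that only the sufficient statistic x̄ needs to be simulated, since the estimate depends on the data only through it.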
Performance of Vector MMSE Estimator

Vector estimation error: \boldsymbol{\varepsilon} = \boldsymbol{\theta} - \hat{\boldsymbol{\theta}}. The mean result is obvious:

    E\{\boldsymbol{\varepsilon}\} = \mathbf{0}

We must extend the variance result. Some new notation — the "Bayesian mean square error matrix":

    \mathbf{C}_{\varepsilon} = \mathrm{cov}\{\boldsymbol{\varepsilon}\} = E_{\mathbf{x},\boldsymbol{\theta}}\{\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^T\} \triangleq \mathbf{M}_{\hat{\boldsymbol{\theta}}}

Look some more at this (see the chart on "Decomposing Joint Expectations"):

    \mathbf{M}_{\hat{\boldsymbol{\theta}}} = E_{\mathbf{x},\boldsymbol{\theta}}\left\{ [\boldsymbol{\theta} - E\{\boldsymbol{\theta}|\mathbf{x}\}][\boldsymbol{\theta} - E\{\boldsymbol{\theta}|\mathbf{x}\}]^T \right\}
                                           = E_{\mathbf{x}}\left\{ E_{\boldsymbol{\theta}|\mathbf{x}}\left\{ [\boldsymbol{\theta} - E\{\boldsymbol{\theta}|\mathbf{x}\}][\boldsymbol{\theta} - E\{\boldsymbol{\theta}|\mathbf{x}\}]^T \right\} \right\}
                                           = E_{\mathbf{x}}\{\mathbf{C}_{\theta|\mathbf{x}}\}

General vector results:

    E\{\boldsymbol{\varepsilon}\} = \mathbf{0}        \mathbf{C}_{\varepsilon} = \mathbf{M}_{\hat{\boldsymbol{\theta}}} = E_{\mathbf{x}}\{\mathbf{C}_{\theta|\mathbf{x}}\}

In general \mathbf{C}_{\theta|\mathbf{x}} is a function of x.
The Diagonal Elements of \mathbf{M}_{\hat{\boldsymbol{\theta}}} are the Bmse's of the Estimates

Why do we call the error covariance the "Bayesian MSE matrix"? To see this:

    \left[ E\{\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^T\} \right]_{ii}
        = \int\!\!\int \cdots \int [\theta_i - E\{\theta_i|\mathbf{x}\}]^2\, p(\mathbf{x}, \boldsymbol{\theta})\, d\mathbf{x}\, d\theta_1 \cdots d\theta_p
        = \int\!\!\int [\theta_i - E\{\theta_i|\mathbf{x}\}]^2\, p(\mathbf{x}, \theta_i)\, d\mathbf{x}\, d\theta_i
        = Bmse(\hat{\theta}_i)

where the middle step integrates over all the other parameters — "marginalizing" the PDF.
Perf. of MMSE Est. for the Jointly Gaussian Case

Let the data vector x and the parameter vector θ be jointly Gaussian. Nothing new to say about the mean result:

    E\{\boldsymbol{\varepsilon}\} = \mathbf{0}

Now… look at the error covariance (i.e., the Bayesian MSE matrix). Recall the general result:

    \mathbf{C}_{\varepsilon} = \mathbf{M}_{\hat{\boldsymbol{\theta}}} = E_{\mathbf{x}}\{\mathbf{C}_{\theta|\mathbf{x}}\}

Thm. 10.2 says that for jointly Gaussian vectors \mathbf{C}_{\theta|\mathbf{x}} does NOT depend on x, so

    \mathbf{C}_{\varepsilon} = \mathbf{M}_{\hat{\boldsymbol{\theta}}} = E_{\mathbf{x}}\{\mathbf{C}_{\theta|\mathbf{x}}\} = \mathbf{C}_{\theta|\mathbf{x}}

Thm. 10.2 also gives the form:

    \mathbf{C}_{\varepsilon} = \mathbf{M}_{\hat{\boldsymbol{\theta}}} = \mathbf{C}_{\theta|\mathbf{x}} = \mathbf{C}_{\theta} - \mathbf{C}_{\theta\mathbf{x}} \mathbf{C}_{\mathbf{x}}^{-1} \mathbf{C}_{\mathbf{x}\theta}
Perf. of MMSE Est. for the Bayesian Linear Model

Recall the model:

    \mathbf{x} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w},    \boldsymbol{\theta} \sim N(\boldsymbol{\mu}_{\theta}, \mathbf{C}_{\theta}),    \mathbf{w} \sim N(\mathbf{0}, \mathbf{C}_w)

Nothing new to say about the mean result:

    E\{\boldsymbol{\varepsilon}\} = \mathbf{0}

Now… for the error covariance: this is nothing more than a special case of the jointly Gaussian case we just saw,

    \mathbf{C}_{\varepsilon} = \mathbf{M}_{\hat{\boldsymbol{\theta}}} = \mathbf{C}_{\theta|\mathbf{x}} = \mathbf{C}_{\theta} - \mathbf{C}_{\theta\mathbf{x}} \mathbf{C}_{\mathbf{x}}^{-1} \mathbf{C}_{\mathbf{x}\theta}

with the evaluations for the Bayesian linear model:

    \mathbf{C}_{\theta\mathbf{x}} = \mathbf{C}_{\theta}\mathbf{H}^T,    \mathbf{C}_{\mathbf{x}} = \mathbf{H}\mathbf{C}_{\theta}\mathbf{H}^T + \mathbf{C}_w

giving

    \mathbf{C}_{\varepsilon} = \mathbf{M}_{\hat{\boldsymbol{\theta}}} = \mathbf{C}_{\theta} - \mathbf{C}_{\theta}\mathbf{H}^T \left( \mathbf{H}\mathbf{C}_{\theta}\mathbf{H}^T + \mathbf{C}_w \right)^{-1} \mathbf{H}\mathbf{C}_{\theta}
                              = \left( \mathbf{C}_{\theta}^{-1} + \mathbf{H}^T \mathbf{C}_w^{-1} \mathbf{H} \right)^{-1}

Alternate form… see (10.33).
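The two forms of the error covariance above are algebraically equivalent (via the matrix inversion lemma), and that equivalence is easy to confirm numerically. A minimal sketch with hypothetical dimensions and randomly generated (positive-definite) covariances:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical Bayesian linear model dimensions: x (N x 1) = H (N x p) theta + w
N, p = 6, 3
H = rng.normal(size=(N, p))
A = rng.normal(size=(p, p)); C_theta = A @ A.T + np.eye(p)   # SPD prior covariance
B = rng.normal(size=(N, N)); C_w = B @ B.T + np.eye(N)       # SPD noise covariance

# Form 1: C_theta - C_theta H^T (H C_theta H^T + C_w)^{-1} H C_theta
Cx = H @ C_theta @ H.T + C_w
M1 = C_theta - C_theta @ H.T @ np.linalg.solve(Cx, H @ C_theta)

# Form 2 (the alternate form of (10.33)): (C_theta^{-1} + H^T C_w^{-1} H)^{-1}
M2 = np.linalg.inv(np.linalg.inv(C_theta) + H.T @ np.linalg.inv(C_w) @ H)

print(np.allclose(M1, M2))  # True: the two forms agree
```

In practice the second form is preferable when p << N, since it inverts a p x p matrix instead of an N x N one.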
Summary of MMSE Est. Error Results

1. For all cases, the estimation error is zero mean: E\{\boldsymbol{\varepsilon}\} = \mathbf{0}

2. Error covariance for three "nested" cases:

General case:

    \mathbf{C}_{\varepsilon} = \mathbf{M}_{\hat{\boldsymbol{\theta}}} = E_{\mathbf{x}}\{\mathbf{C}_{\theta|\mathbf{x}}\},    \left[ \mathbf{M}_{\hat{\boldsymbol{\theta}}} \right]_{ii} = Bmse(\hat{\theta}_i)

Jointly Gaussian:

    \mathbf{C}_{\varepsilon} = \mathbf{M}_{\hat{\boldsymbol{\theta}}} = \mathbf{C}_{\theta|\mathbf{x}} = \mathbf{C}_{\theta} - \mathbf{C}_{\theta\mathbf{x}} \mathbf{C}_{\mathbf{x}}^{-1} \mathbf{C}_{\mathbf{x}\theta}

Bayesian linear (jointly Gaussian & linear observation):

    \mathbf{C}_{\varepsilon} = \mathbf{M}_{\hat{\boldsymbol{\theta}}} = \mathbf{C}_{\theta} - \mathbf{C}_{\theta}\mathbf{H}^T \left( \mathbf{H}\mathbf{C}_{\theta}\mathbf{H}^T + \mathbf{C}_w \right)^{-1} \mathbf{H}\mathbf{C}_{\theta} = \left( \mathbf{C}_{\theta}^{-1} + \mathbf{H}^T \mathbf{C}_w^{-1} \mathbf{H} \right)^{-1}
Main Bayesian Approaches

MMSE — "squared" cost function (in general: nonlinear estimate)
    Estimate:   \hat{\boldsymbol{\theta}} = E\{\boldsymbol{\theta} \mid \mathbf{x}\}
    Err. Cov.:  \mathbf{M}_{\hat{\boldsymbol{\theta}}} = E_{\mathbf{x}}\{\mathbf{C}_{\theta|\mathbf{x}}\}
    Hard to implement… numerical integration.

MAP — "hit-or-miss" cost function
    Estimate:   \hat{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}} p(\boldsymbol{\theta} \mid \mathbf{x})
    Easy to implement… performance analysis is challenging.

Jointly Gaussian x and θ (yields linear estimate)
    Estimate:   \hat{\boldsymbol{\theta}} = E\{\boldsymbol{\theta}\} + \mathbf{C}_{\theta\mathbf{x}} \mathbf{C}_{\mathbf{x}}^{-1} (\mathbf{x} - E\{\mathbf{x}\})
    Err. Cov.:  \mathbf{M}_{\hat{\boldsymbol{\theta}}} = \mathbf{C}_{\theta} - \mathbf{C}_{\theta\mathbf{x}} \mathbf{C}_{\mathbf{x}}^{-1} \mathbf{C}_{\mathbf{x}\theta}
    Easier to implement… but \mathbf{C}_{\theta\mathbf{x}} can be hard to find.

Bayesian linear model (yields linear estimate)
    Estimate:   \hat{\boldsymbol{\theta}} = \boldsymbol{\mu}_{\theta} + \mathbf{C}_{\theta}\mathbf{H}^T \left( \mathbf{H}\mathbf{C}_{\theta}\mathbf{H}^T + \mathbf{C}_w \right)^{-1} (\mathbf{x} - \mathbf{H}\boldsymbol{\mu}_{\theta})
    Err. Cov.:  \mathbf{M}_{\hat{\boldsymbol{\theta}}} = \mathbf{C}_{\theta} - \mathbf{C}_{\theta}\mathbf{H}^T \left( \mathbf{H}\mathbf{C}_{\theta}\mathbf{H}^T + \mathbf{C}_w \right)^{-1} \mathbf{H}\mathbf{C}_{\theta}
    "Easy" to implement… only need an accurate model: \mathbf{C}_{\theta}, \mathbf{C}_w, \mathbf{H}.
11.7 Example: Bayesian Deconvolution

This example shows the power of Bayesian approaches over classical methods in signal estimation problems (i.e., estimating the signal rather than some parameters).

[Figure: s(t) → h(t) → Σ ← w(t) → x(t)]

- s(t): modeled as a zero-mean WSS Gaussian process with known ACF R_s(τ)
- h(t): assumed known
- w(t): Gaussian bandlimited white noise with known variance
- Measured data: samples of x(t)

Goal: observe x(t) and estimate s(t). Note: at the output, s(t) is smeared and noisy. So… model this as a D-T system.
Sampled-Data Formulation

    [ x[0]   ]   [ h[0]     0        0      ⋯    0      ] [ s[0]   ]   [ w[0]   ]
    [ x[1]   ] = [ h[1]     h[0]     0      ⋯    0      ] [ s[1]   ] + [ w[1]   ]
    [   ⋮    ]   [   ⋮                ⋱                 ] [   ⋮    ]   [   ⋮    ]
    [ x[N−1] ]   [ h[N−1]  h[N−2]    ⋯     h[N−n]       ] [ s[n−1] ]   [ w[N−1] ]

- Measured data vector x
- Known observation matrix H
- Signal vector s to estimate
- AWGN w: C_w = σ²I

We have modeled s(t) as a zero-mean WSS process with known ACF… so s[n] is a D-T WSS process with known ACF R_s[m]… so the vector s has a known covariance matrix (Toeplitz & symmetric) given by:

    C_s = [ R_s[0]     R_s[1]    R_s[2]    ⋯   R_s[n−1] ]
          [ R_s[1]     R_s[0]    R_s[1]    ⋯   R_s[n−2] ]
          [ R_s[2]     R_s[1]    R_s[0]    ⋯      ⋮     ]
          [   ⋮                     ⋱                   ]
          [ R_s[n−1]      ⋯                ⋯   R_s[0]   ]

The model for the prior PDF is then s ~ N(0, C_s); s and w are independent.
MMSE Solution for Deconvolution

We have the case of the Bayesian linear model… so:

    \hat{\mathbf{s}} = \mathbf{C}_s \mathbf{H}^T \left( \mathbf{H}\mathbf{C}_s\mathbf{H}^T + \sigma^2\mathbf{I} \right)^{-1} \mathbf{x}

Note that this is a linear estimate; this matrix is called "the Wiener filter."

The performance of the filter is characterized by:

    \mathbf{C}_{\varepsilon} = \mathbf{M}_{\hat{\mathbf{s}}} = \left( \mathbf{C}_s^{-1} + \mathbf{H}^T\mathbf{H}/\sigma^2 \right)^{-1}
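A minimal sketch of this Wiener deconvolution, with a hypothetical short impulse response and an exponentially-shaped signal ACF R_s[k] = P·ρ^|k| (all parameter values made up for illustration, not from the book):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical setup: short smoothing impulse response, exponential signal ACF
N = 50
h = np.array([1.0, 0.8, 0.5, 0.2])                    # assumed-known h[n]
idx = np.subtract.outer(np.arange(N), np.arange(N))   # i - j
H = np.where((idx >= 0) & (idx < len(h)), h[idx.clip(0, len(h) - 1)], 0.0)
rho, P = 0.9, 1.0
Cs = P * rho ** np.abs(idx)                           # Toeplitz, symmetric
sigma2 = 0.1

# Draw s ~ N(0, Cs) and form the observed data x = H s + w
s = np.linalg.cholesky(Cs) @ rng.normal(size=N)
x = H @ s + rng.normal(0.0, np.sqrt(sigma2), N)

# Wiener (MMSE) deconvolution: s_hat = Cs H^T (H Cs H^T + sigma2 I)^{-1} x
G = H @ Cs @ H.T + sigma2 * np.eye(N)
s_hat = Cs @ H.T @ np.linalg.solve(G, x)

# Error covariance M = Cs - Cs H^T (H Cs H^T + sigma2 I)^{-1} H Cs;
# tr(M)/N predicts the average squared error over realizations
M = Cs - Cs @ H.T @ np.linalg.solve(G, H @ Cs)
print(np.mean((s - s_hat) ** 2), np.trace(M) / N)
```

Using `np.linalg.solve` instead of forming the inverse explicitly is the standard numerically safer choice here.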
Sub-Example: No Inverse Filtering, Noise Only

Direct observation of s with H = I… x = s + w.

[Figure: s(t) → Σ ← w(t) → x(t)]

Goal: observe x(t) and "de-noise" s(t). Note: at the output, s(t) with noise.

    \hat{\mathbf{s}} = \mathbf{C}_s \left( \mathbf{C}_s + \sigma^2\mathbf{I} \right)^{-1} \mathbf{x},        \mathbf{C}_{\varepsilon} = \mathbf{M}_{\hat{\mathbf{s}}} = \left( \mathbf{C}_s^{-1} + \mathbf{I}/\sigma^2 \right)^{-1}

Note the dimensionality problem: the number of "parameters" equals the number of observations. Classical methods fail — they give \hat{\mathbf{s}} = \mathbf{x} — but Bayesian methods can solve it!

For insight… consider the "single sample" case:

    \hat{s}[0] = \frac{R_s[0]}{R_s[0] + \sigma^2}\, x[0] = \frac{\eta}{\eta + 1}\, x[0],    \eta = \frac{R_s[0]}{\sigma^2}   (SNR)

High SNR: \hat{s}[0] \approx x[0] (data driven). Low SNR: \hat{s}[0] \approx 0 (prior PDF driven).
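The single-sample shrinkage behavior is a one-liner to check. A small sketch (the helper name `denoise_one` and the numeric values are made up for illustration):

```python
# Single-sample de-noising: s_hat[0] = eta/(1+eta) * x[0], eta = Rs[0]/sigma2 (SNR)
def denoise_one(x0, Rs0, sigma2):
    eta = Rs0 / sigma2
    return (eta / (1.0 + eta)) * x0

print(denoise_one(2.0, 100.0, 1.0))  # high SNR: close to x[0] = 2.0 (data driven)
print(denoise_one(2.0, 0.01, 1.0))   # low SNR: close to 0 (prior PDF driven)
```

The estimator linearly interpolates between the prior mean (0) and the observation, with the SNR setting the blend.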
Sub-Sub-Example: Specific Signal Model

Direct observation of s with H = I… x = s + w. But here… the signal follows a specific random signal model:

    s[n] = -a_1 s[n-1] + u[n]

where u[n] is a white Gaussian "driving process." This is a 1st-order "auto-regressive" model: AR(1). Such a random signal has an ACF & PSD of

    R_s[k] = \frac{\sigma_u^2}{1 - a_1^2} (-a_1)^{|k|}

    P_s(f) = \frac{\sigma_u^2}{\left| 1 + a_1 e^{-j2\pi f} \right|^2}

See Figures 11.9 & 11.10 in the textbook.
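The AR(1) ACF formula can be checked against a long simulated realization. A minimal sketch (parameter values are made up; the recursion and ACF match the model above):

```python
import numpy as np

rng = np.random.default_rng(4)

# AR(1): s[n] = -a1 s[n-1] + u[n], u[n] white Gaussian with variance sigma_u2
a1, sigma_u2 = -0.8, 1.0        # pole at -a1 = 0.8: a lowpass, correlated signal
n = 200_000
u = rng.normal(0.0, np.sqrt(sigma_u2), n)
s = np.empty(n)
s[0] = u[0]
for i in range(1, n):
    s[i] = -a1 * s[i - 1] + u[i]
s = s[1000:]                    # discard transient so s is approximately stationary

# Theoretical ACF: Rs[k] = sigma_u2 / (1 - a1^2) * (-a1)^|k|
Rs0_theory = sigma_u2 / (1 - a1 ** 2)
Rs1_theory = Rs0_theory * (-a1)

print(np.mean(s * s), Rs0_theory)           # empirical Rs[0] vs. theory
print(np.mean(s[1:] * s[:-1]), Rs1_theory)  # empirical Rs[1] vs. theory
```

Discarding the initial transient matters: the recursion is started from s[0] = u[0], not from the stationary distribution.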