EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special...
-
date post
22-Dec-2015 -
Category
Documents
-
view
254 -
download
1
Transcript of EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special...
![Page 1: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/1.jpg)
EM algorithm
LING 572
Fei Xia
03/02/06
![Page 2: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/2.jpg)
Outline
• The EM algorithm
• EM for PM models
• Three special cases– Inside-outside algorithm– Forward-backward algorithm– IBM models for MT
![Page 3: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/3.jpg)
The EM algorithm
![Page 4: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/4.jpg)
Basic setting in EM
• X is a set of data points: observed data• Θ is a parameter vector.• EM is a method to find θML where
• Calculating P(X | θ) directly is hard.• Calculating P(X,Y|θ) is much simpler,
where Y is “hidden” data (or “missing” data).
)|(logmaxarg
)(maxarg
XP
LML
![Page 5: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/5.jpg)
The basic EM strategy
• Z = (X, Y)– Z: complete data (“augmented data”)– X: observed data (“incomplete” data)– Y: hidden data (“missing” data)
• Given a fixed x, there could be many possible y’s.– Ex: given a sentence x, there could be many state
sequences in an HMM that generates x.
![Page 6: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/6.jpg)
Examples of EM
HMM PCFG MT Coin toss
X
(observed)
sentences sentences Parallel data Head-tail sequences
Y (hidden) State sequences
Parse trees Word alignment
Coin id sequences
θ aij
bijk
P(ABC) t(f|e)
d(aj|j, l, m), …
p1, p2, λ
Algorithm Forward-backward
Inside-outside
IBM Models N/A
![Page 7: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/7.jpg)
The log-likelihood function
• L is a function of θ, while holding X constant: )|()()|( XPLXL
)|,(log
)|(log
)|(log
)|(log)(log)(
1
1
1
yxP
xP
xP
XPLl
iy
n
i
i
n
i
n
ii
![Page 8: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/8.jpg)
The iterative approach for MLE
)|,(logmaxarg
)(maxarg
)(maxarg
1
yxp
l
L
n
i yi
ML
,....,...,, 10 tIn many cases, we cannot find the solution directly.
An alternative is to find a sequence:
....)(...)()( 10 tlll s.t.
![Page 9: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/9.jpg)
])|,(
)|,([log
])|,(
)|,([log
)|,(
)|,(),|(log
)|,(
)|,(
)|',(
)|,(log
)|,(
)|,(
)|',(
)|,(log
)|',(
)|,(log
)|,(
)|,(
log
)|,(log)|,(log
)|(log)|(log)()(
1),|(
1),|(
1
'1
'1
'1
1
11
ti
in
ixyP
ti
in
ixyP
ti
itn
i yi
ti
i
yt
yi
ti
n
i
ti
ti
yt
yi
in
i
yt
yi
in
i
t
yi
yin
i
t
yi
n
iyi
n
i
tt
yxP
yxPE
yxP
yxPE
yxP
yxPxyP
yxP
yxP
yxP
yxP
yxP
yxP
yxP
yxP
yxP
yxP
yxP
yxP
yxPyxP
XPXPll
ti
ti
Jensen’s inequality
![Page 10: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/10.jpg)
Jensen’s inequality
])([()](([, xgEfxgfEthenconvexisfif
)])([log()]([log( xpExpE
])([()](([, xgEfxgfEthenconcaveisfif
log is a concave function
![Page 11: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/11.jpg)
Maximizing the lower bound
)]|,([logmaxarg
)|,(log),|(maxarg
)|,(
)|,(log),|(maxarg
])|,(
)|,([logmaxarg
1),|(
1
1
1),|(
)1(
yxPE
yxPxyP
yxP
yxPxyP
yxp
yxpE
i
n
ixyP
it
i
n
i y
ti
iti
n
i y
ti
in
ixyP
t
ti
ti
The Q function
![Page 12: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/12.jpg)
The Q-function
• Define the Q-function (a function of θ):
– Y is a random vector.– X=(x1, x2, …, xn) is a constant (vector).– Θt is the current parameter estimate and is a constant (vector).– Θ is the normal variable (vector) that we wish to adjust.
• The Q-function is the expected value of the complete data log-likelihood P(X,Y|θ) with respect to Y given X and θt.
)|,(log),|(
)]|,([log)|,(log),|(
)]|,([log],|)|,([log),(
1
1),|(
),|(
yxPxyP
yxPEYXPXYP
YXPEXYXPEQ
it
n
i yi
n
iixyP
Y
t
XYP
tt
ti
t
![Page 13: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/13.jpg)
The inner loop of the EM algorithm
• E-step: calculate
• M-step: find
),(maxarg)1( tt Q
)|,(log),|(),(1
yxPxyPQ it
n
i yi
t
![Page 14: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/14.jpg)
L(θ) is non-decreasing at each iteration
• The EM algorithm will produce a sequence
• It can be proved that
,....,...,, 10 t
....)(...)()( 10 tlll
![Page 15: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/15.jpg)
The inner loop of the Generalized EM algorithm (GEM)
• E-step: calculate
• M-step: find
),(maxarg)1( tt Q
)|,(log),|(),(1
yxPxyPQ it
n
i yi
t
),(),( 1 tttt QQ
![Page 16: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/16.jpg)
Recap of the EM algorithm
![Page 17: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/17.jpg)
Idea #1: find θ that maximizes the likelihood of training data
)|(logmaxarg
)(maxarg
XP
LML
![Page 18: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/18.jpg)
Idea #2: find the θt sequence
No analytical solution iterative approach, find
s.t.
,....,...,, 10 t
....)(...)()( 10 tlll
![Page 19: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/19.jpg)
Idea #3: find θt+1 that maximizes a tight lower bound of )()( tll
a tight lower bound
])|,(
)|,([log)()(
1),|( t
i
in
ixyP
t
yxP
yxPEll t
i
![Page 20: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/20.jpg)
Idea #4: find θt+1 that maximizes the Q function
)]|,([logmaxarg
])|,(
)|,([logmaxarg
1),|(
1),|(
)1(
yxPE
yxp
yxpE
i
n
ixyP
ti
in
ixyP
t
ti
ti
Lower bound of )()( tll
The Q function
![Page 21: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/21.jpg)
The EM algorithm
• Start with initial estimate, θ0
• Repeat until convergence– E-step: calculate
– M-step: find
),(maxarg)1( tt Q
)|,(log),|(),(1
yxPxyPQ it
n
i yi
t
![Page 22: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/22.jpg)
Important classes of EM problem
• Products of multinomial (PM) models
• Exponential families
• Gaussian mixture
• …
![Page 23: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/23.jpg)
The EM algorithm for PM models
![Page 24: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/24.jpg)
PM models
mi
yxici
i
yxici
i
yxici pppyxp ),,(),,(),,(
1
...)|,(
1 ji
ip
Where is a partition of all the parameters, and for any j
),...,( 1 m
![Page 25: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/25.jpg)
HMM is a PM
kji
j
kw
ik
jsis
ji
wss
yxsscj
w
i
yxsscji
ssp
ssp
yxp
,,
),,(
),,(
)(
)(
)|,(
,
1
1
kijk
jij
b
a
![Page 26: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/26.jpg)
PCFG
• PCFG: each sample point (x,y):– x is a sentence– y is a possible parse tree for that sentence.
)|()|,(1
ii
n
ii AAPyxP
)|(
)|(
)|(
)|,(
VPsleepsVPP
NPJimNPP
SVPNPSP
yxP
![Page 27: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/27.jpg)
PCFG is a PM
,
),,()(
)|,(
A
yxAcAp
yxp
A
Ap 1)(
![Page 28: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/28.jpg)
Q-function for PM
)log),,(),|((...)log),,(),|((
)log),|((...)log),|((
))(log(),|(
)|,(log),|(
),(
11
),,(
1
),,(
1
),,(
1
1
1
1
jij
tn
i yiji
j
tn
i yi
yxjC
jj
tn
i yi
yxjC
jj
tn
i yi
yxjC
k jj
tn
i yi
it
n
i yi
t
pyxjCxyPpyxjCxyP
pxyPpxyP
pxyP
yxPxyP
Q
k
i
k
i
k
),(1tQ
![Page 29: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/29.jpg)
Maximizing the Q function
jj
it
n
i yi
t pyxjCxyPQ log),,(),|(),(11
1
11
j
jp
Maximize
Subject to the constraint
11
)log),,(),|((),(ˆ1
1j
jjj
it
n
i yi
t ppyxjCxyPQ
Use Lagrange multipliers
![Page 30: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/30.jpg)
Optimal solution
11
)log),,(),|((),(ˆ1
1j
jjj
it
n
i yi
t ppyxjCxyPQ
0)/),,(),|((),(ˆ
1
1
ji
tn
i yi
j
t
pyxjCxypp
Q
),,(),|(1
yxjCxyp
pi
tn
i yi
j
Normalization factor
Expected count
![Page 31: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/31.jpg)
PM Models
r is rth parameter in the model. Each parameter is the member of some multinomial distribution.
Count(x,y, r) is the number of times that is seen in the expression for P(x, y | θ)
r
![Page 32: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/32.jpg)
The EM algorithm for PM Models
• Calculate expected counts
• Update parameters
![Page 33: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/33.jpg)
PCFG example
• Calculate expected counts
• Update parameters
![Page 34: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/34.jpg)
The EM algorithm for PM models
// for each iteration
// for each training example xi
// for each possible y
// for each parameter
// for each parameter
![Page 35: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/35.jpg)
Inside-outside algorithm
![Page 36: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/36.jpg)
Inner loop of the Inside-outside algorithm
Given an input sequence and1. Calculate inside probability:
• Base case• Recursive case:
2. Calculate outside probability:• Base case:
• Recursive case:
)(),( kj
j wNPkk
),1(),()(),(,
1
qddpNNNPqp srsr
sr
q
pd
jj
otherwise
jifmj 0
11),1(
)1,()(),(
),1()(),(),(
,
1
1
, 1
peNNNPqe
eqNNNPepqp
gjgf
gf
p
ef
ggjf
gf
m
qefj
![Page 37: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/37.jpg)
Inside-outside algorithm (cont)
)(
),1(),()(),(
)|,(
1
1
1 1
1
m
q
pdsr
srjj
m
p
m
q
msrjj
wP
qddpNNNPqp
wNNNNP
)(
),(),(),()|,(
1
11
m
m
h
khjj
mjkj
wP
wwhhhhwusedisNwNP
3. Collect the counts
4. Normalize and update the parameters
km
jkjm
jkj
kj
k
kjkj
r sm
srjjm
srjj
r s
srj
srjsrj
wusedisNwNP
wusedisNwNP
wNCnt
wNCntwNP
wNNNNP
wNNNNP
NNNCnt
NNNCntNNNP
)|,(
)|,(
)(
)()(
)|,(
)|,(
)(
)()(
1
1
1
1
![Page 38: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/38.jpg)
Expected counts for PCFG rules
),|,(
),|,(
),,(*),|(
),,(*),|()(
1
11 1
11
msrjj
msrjj
pq
m
p
m
q
srj
Trmm
srj
Y
srj
wNNNNP
wNNNNP
NNNwTrcountwTrP
NNNYXcountXYPNNNcount
This is the formula if we have only one sentence.Add an outside sum if X contains multiple sentences.
![Page 39: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/39.jpg)
Expected counts (cont)
),|,(
),,(*),|(
),,(*),|()(
11
11
mjkj
m
h
Tr
kjmm
kj
Y
kj
wusedisNwNP
wNTrwcountwTrP
wNYXcountXYPwNcount
![Page 40: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/40.jpg)
Relation to EM
• PCFG is a PM Model
• Inside-outside algorithm is a special case of the EM algorithm for PM Models.
• X (observed data): each data point is a sentence w1m.
• Y (hidden data): parse tree Tr.
• Θ (parameters):
)(
)(
kj
srj
wNP
NNNP
![Page 41: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/41.jpg)
Forward-backward algorithm
![Page 42: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/42.jpg)
The inner loop for forward-backward algorithm
Given an input sequence and1. Calculate forward probability:
• Base case• Recursive case:
2. Calculate backward probability:• Base case:• Recursive case:
3. Calculate expected counts:
4. Update the parameters:
),,,,( BAKS
ii )1(
tijoiji
ij batt )()1(
1)1( Ti
tijoijj
ji batt )1()(
N
mmm
jijoijiij
tt
tbatt t
1
)()(
)1()()(
T
tij
N
j
T
tij
ij
t
ta
11
1
)(
)(
T
tij
T
tijkt
ijk
t
twob
1
1
)(
)(),(
![Page 43: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/43.jpg)
Expected counts
)(
),|,(
),,(*),|(
),,(*),|()(
1
111
1111
1
t
OjXiXP
ssXOcountOXP
ssYXcountXYPsscount
T
tij
Tt
T
tt
jX
iTTTT
Yjiji
T
![Page 44: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/44.jpg)
Expected counts (cont)
),()(
),(*),|,(
),,(*),|(
),,(*),|()(
1
111
1111
1
kk
T
tij
kkTt
T
tt
jX
w
iTTTT
j
w
iY
j
w
i
wOt
wOOjXiXP
ssXOcountOXP
ssYXcountXYPsscount
T
k
kk
![Page 45: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/45.jpg)
Relation to EM
• HMM is a PM Model
• Forward-backward algorithm is a special case of the EM algorithm for PM Models.
• X (observed data): each data point is an O1T.
• Y (hidden data): state sequence X1T.
• Θ (parameters): aij, bijk, πi.
![Page 46: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/46.jpg)
IBM models for MT
![Page 47: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/47.jpg)
Expected counts for (f, e) pairs
• Let Ct(f, e) be the fractional count of (f, e) pair in the training data.
)),(),(*),|((),(||
1,ja
F
jj
FE a
eeffFEaPefCt
Alignment prob Actual count of times e and f are linked in (E,F) by alignment a
FVx
exCt
efCteft
),(
),()|(
![Page 48: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/48.jpg)
Relation to EM
• IBM models are PM Models.
• The EM algorithm used in IBM models is a special case of the EM algorithm for PM Models.
• X (observed data): each data point is a sentence pair (F, E).
• Y (hidden data): word alignment a.• Θ (parameters): t(f|e), d(i | j, m, n), etc..
![Page 49: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/49.jpg)
Summary
• The EM algorithm– An iterative approach – L(θ) is non-decreasing at each iteration– Optimal solution in M-step exists for many classes of
problems.
• The EM algorithm for PM models– Simpler formulae– Three special cases
• Inside-outside algorithm• Forward-backward algorithm• IBM Models for MT
![Page 50: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/50.jpg)
Relations among the algorithms
The generalized EM
The EM algorithm
PM Gaussian MixInside-OutsideForward-backwardIBM models
![Page 51: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/51.jpg)
Strengths of EM
• Numerical stability: in every iteration of the EM algorithm, it increases the likelihood of the observed data.
• The EM handles parameter constraints gracefully.
![Page 52: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/52.jpg)
Problems with EM
• Convergence can be very slow on some problems and is intimately related to the amount of missing information.
• It guarantees to improve the probability of the training corpus, which is different from reducing the errors directly.
• It cannot guarantee to reach global maxima (it could get struck at the local maxima, saddle points, etc)
The initial estimate is important.
![Page 53: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/53.jpg)
Additional slides
![Page 54: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/54.jpg)
Lower bound lemma
)()ˆ( 0xgxg
If
Then
)()()ˆ()ˆ( 00 xgxfxfxg Proof :
![Page 55: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.](https://reader033.fdocument.pub/reader033/viewer/2022061511/56649d7f5503460f94a62ac2/html5/thumbnails/55.jpg)
L(θ) is non-decreasing
])|,(
)|,([log)()(
1),|( t
i
in
ixyP
t
yxP
yxPEll t
i
)()()( tllg
])|,(
)|,([log)(
1),|( t
i
in
ixyP yxP
yxPEf t
i
)(maxarg
0)()()()(1
f
fgandfgt
tt
0)()( 1 tt gg
)()( 1 tt ll
Let
We have
(By lower bound lemma)