
Topic Model (Text Mining)

Yueshen Xu, [email protected]

Middleware, CCNT, ZJU

6/11/2014

Text Mining & NLP & ML

Outline

Basic Concepts

Application and Background

Famous Researchers

Language Model

Vector Space Model (VSM)

Term Frequency-Inverse Document Frequency (TF-IDF)

Latent Semantic Indexing (LSI)

Probabilistic Latent Semantic Indexing (pLSA)

Expectation-Maximization Algorithm (EM) & Maximum-Likelihood Estimation (MLE)


Outline

Latent Dirichlet Allocation (LDA)

Conjugate Prior

Poisson Distribution

Variational Distribution and Variational Inference (VD & VI)

Markov Chain Monte Carlo (MCMC)

Metropolis-Hastings Sampling (MH)

Gibbs Sampling and GS for LDA

Bayesian Theory vs. Probability Theory


Concepts

Latent Semantic Analysis

Topic Model

Text Mining

Natural Language Processing

Computational Linguistics

Information Retrieval

Dimension Reduction

Expectation-Maximization(EM)


(Concept map: LSA/Topic Model lies at the intersection of Information Retrieval, Computational Linguistics, Natural Language Processing, Text Mining, Data Mining, Machine Learning, Machine Translation, Dimension Reduction, and EM.)

Aim: find the topic that a word or a document belongs to

Latent Factor Model


Application

LFM has been a fundamental technique in modern search engines, recommender systems, tag extraction, blog clustering, Twitter topic mining, news (text) summarization, etc.

Search Engine: PageRank asks "How important is this web page?"; LFM asks "How relevant is this web page?" and "How relevant is the user's query to one document?"

Recommender System; Opinion Extraction; Spam Detection; Tag Extraction; Text Summarization; Abstract Generation; Twitter Topic Mining

Example text: "Steven Jobs has left us for about two years… Apple's price will fall down…"

Famous Researchers

David Blei, Princeton, LDA

Chengxiang Zhai, UIUC, Presidential Early Career Award

W. Bruce Croft, UMass, Language Model

Bing Liu, UIC, Opinion Mining

John D. Lafferty, CMU, CRF & IBM

Thomas Hofmann, Brown, pLSA

Andrew McCallum, UMass, CRF & IBM

Susan Dumais, Microsoft, LSI

Language Model

Unigram Language Model == Zero-order Markov Chain

Bigram Language Model == First-order Markov Chain

N-gram Language Model == (N-1)-order Markov Chain

Mixture-unigram Language Model

Unigram: $p(s \mid M) = \prod_{w_i \in s} p(w_i \mid M)$

Bag of Words (BoW): no order, no grammar, only multiplicity

Bigram: $p(s \mid M) = \prod_{w_i \in s} p(w_i \mid w_{i-1}, M)$

Mixture-unigram: $p(\boldsymbol{w}) = \sum_{z} p(z) \prod_{n=1}^{N} p(w_n \mid z)$

(Plate diagrams: a word plate of size $N$ nested in a document plate of size $M$; the mixture-unigram model adds a per-document topic variable $z$.)
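To make the unigram/BoW estimate concrete, here is a minimal Python sketch, assuming a toy whitespace-tokenized corpus and plain maximum-likelihood counts (both are illustrative assumptions, not part of the slides):

```python
from collections import Counter

# Toy corpus; tokens are whitespace-separated (illustrative assumption).
corpus = ["the cat sat on the mat", "the dog sat on the log"]

tokens = [w for doc in corpus for w in doc.split()]
counts = Counter(tokens)
total = sum(counts.values())

def p_unigram(word):
    """MLE unigram probability p(w | M) from raw counts."""
    return counts[word] / total

def p_sentence(sentence):
    """Bag-of-words sentence probability: product of unigram probabilities."""
    p = 1.0
    for w in sentence.split():
        p *= p_unigram(w)
    return p

print(p_sentence("the cat sat"))
```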

Vector Space Model

A document is represented as a vector of identifiers

Identifier

Boolean: 0, 1

Term Count: how many times …

Term Frequency: how frequent … in this document

TF-IDF: how important … in the corpus (most used)

Relevance Ranking

First used in SMART (Gerard Salton, Cornell)

Document and query vectors: $d_j = (w_{1j}, w_{2j}, \ldots, w_{tj})$, $q = (w_{1q}, w_{2q}, \ldots, w_{tq})$

Cosine similarity: $\cos\theta = \dfrac{d_j \cdot q}{\lVert d_j \rVert \, \lVert q \rVert}$

(Gerard Salton Award, SIGIR)
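A minimal sketch of VSM relevance ranking with cosine similarity; the toy vocabulary and raw term counts are assumptions for the example:

```python
import math

def cosine(d, q):
    """Cosine similarity between a document vector d and a query vector q."""
    dot = sum(a * b for a, b in zip(d, q))
    norm_d = math.sqrt(sum(a * a for a in d))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_d * norm_q) if norm_d and norm_q else 0.0

# Term-count vectors over a toy vocabulary [topic, model, text, mining]
doc = [3, 2, 1, 0]
query = [1, 1, 0, 0]
print(cosine(doc, query))
```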

TF-IDF

Mixture language model: a linear combination of a certain distribution (e.g., Gaussian); better performance

TF: Term Frequency

IDF: Inverse Document Frequency

TF-IDF

$tf_{ij} = \dfrac{n_{ij}}{\sum_k n_{kj}}$ (term $i$, document $j$; $n_{ij}$ is the count of $i$ in $j$): how important … in this document

$idf_i = \log \dfrac{N}{1 + |\{d \in D : t_i \in d\}|}$ ($N$ documents in the corpus): how important … in this corpus

$\text{tf-idf}(t_i, d_j, D) = tf_{ij} \times idf_i$
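A small sketch of the TF-IDF weighting above; the toy corpus is an assumption for the example, and the +1 smoothing in the denominator follows the slide's formula:

```python
import math

# Toy corpus of tokenized documents (illustrative assumption).
docs = [["topic", "model", "text"], ["text", "mining", "text"], ["topic", "mining"]]
N = len(docs)

def tf(term, doc):
    """Term frequency: count of term in doc, normalized by document length."""
    return doc.count(term) / len(doc)

def idf(term):
    """Inverse document frequency with the +1 smoothing used on the slide."""
    df = sum(1 for d in docs if term in d)
    return math.log(N / (1 + df))

def tf_idf(term, doc):
    return tf(term, doc) * idf(term)

print(tf_idf("text", docs[1]))
```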

Latent Semantic Indexing

Challenge

Compare documents in the same concept space

Compare documents across languages

Synonymy, ex: buy - purchase, user - consumer

Polysemy, ex: book - book, draw - draw

Key Idea

Dimensionality reduction of word-document co-occurrence matrix

Construction of latent semantic space

Defects of VSM: VSM relates words to documents directly; LSI relates words to documents through concepts (Word → Concept → Document)

(Concept = aspect = topic = latent factor)

Singular Value Decomposition

LSI ~= SVD

U, V: orthogonal matrices

Σ: the diagonal matrix of the singular values of N

$N = U \Sigma V^T$

$N$: the $t \times d$ term-document matrix (entries: count, frequency, or TF-IDF); $U$: $t \times m$; $\Sigma$: $m \times m$; $V^T$: $m \times d$

Truncated SVD keeps the $k$ largest singular values, $k < m$ (or $k \ll m$): $N \approx U_k \Sigma_k V_k^T$, with $U_k$: $t \times k$, $\Sigma_k$: $k \times k$, $V_k^T$: $k \times d$

(words: exchangeability)
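A minimal LSI-style truncated SVD sketch with NumPy; the toy term-document matrix and k = 2 are assumptions for the example:

```python
import numpy as np

# Toy term-document matrix N (t terms x d documents); entries could be
# counts, frequencies, or TF-IDF weights (raw counts here for illustration).
N = np.array([[2, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 2, 3, 1],
              [0, 0, 1, 2]], dtype=float)

U, s, Vt = np.linalg.svd(N, full_matrices=False)

k = 2                                           # number of latent "concepts", k << m
N_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]     # rank-k reconstruction of N

doc_vectors = np.diag(s[:k]) @ Vt[:k, :]        # documents in the latent space
term_vectors = U[:, :k]                         # terms in the latent space
print(np.round(N_k, 2))
```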

Singular Value Decomposition

The K-largest singular values

Distinguish the variance between words and documents to the greatest extent

Discarding the lowest dimensions

Reduce noise

Fill the matrix

Predict & Lower computational complexity

Enlarge the distinctiveness

Decomposition

Concept, semantic, topic (aspect)

(Probabilistic) Matrix Factorization / Factorization Model: analytic solution of SVD; unsupervised learning

Probabilistic Latent Semantic Indexing

pLSI Model

(Graphical model: documents $d_1, d_2, \ldots, d_M$, latent topics $z_1, z_2, \ldots, z_K$, and words $w_1, w_2, \ldots, w_N$, connected through $p(d)$, $p(z \mid d)$, and $p(w \mid z)$.)

Assumption

Pairs (d, w) are assumed to be generated independently

Conditioned on z, w is generated independently of d

Words in a document are exchangeable

Documents are exchangeable

Latent topics z are independent

Generative Process/Model

$p(d, w) = p(d)\, p(w \mid d) = p(d) \sum_{z \in Z} p(w, z \mid d) = p(d) \sum_{z \in Z} p(z \mid d)\, p(w \mid z)$

$p(z \mid d)$ and $p(w \mid z)$ are multinomial distributions ($p(w \mid z)$ is global, $p(z \mid d)$ is local)

Like one layer of a 'deep neural network'

Probabilistic Latent Semantic Indexing

(Plate diagram, asymmetric formulation: $d \to z \to w$, with $w$ in a word plate of size $N$ nested in a document plate of size $M$.)

$p(w \mid d) = \sum_{z \in Z} p(z \mid d)\, p(w \mid z)$

(Plate diagram, symmetric formulation: $z$ generates both $d$ and $w$.)

$p(d, w) = \sum_{z \in Z} p(w, d \mid z)\, p(z) = \sum_{z \in Z} p(w \mid z)\, p(d \mid z)\, p(z)$

These are two ways to formulate pLSA; they are equivalent under Bayes' rule but lead to two different inference processes.

Probabilistic graphical model; $d$: exchangeability; Directed Acyclic Graph (DAG)

Expectation-Maximization

EM is a general algorithm for maximum-likelihood estimation (MLE) where the data are 'incomplete' or contain latent variables: pLSA, GMM, HMM, … (cross-domain)

Deduction Process

θ: the parameter to be estimated; θ0: initialized randomly; θn: the current value; θn+1: the next value

Objective: maximize $L(\theta)$, with $L(\theta_{n+1}) \ge L(\theta_n)$ at each iteration

$L(\theta) = \log p(X \mid \theta)$;  $L_c(\theta) = \log p(X, H \mid \theta)$  ($H$: latent variable)

$L_c(\theta) = \log p(X, H \mid \theta) = \log p(X \mid \theta) + \log p(H \mid X, \theta) = L(\theta) + \log p(H \mid X, \theta)$

$L(\theta) - L(\theta_n) = L_c(\theta) - L_c(\theta_n) - \log \dfrac{p(H \mid X, \theta)}{p(H \mid X, \theta_n)}$

Expectation-Maximization

$L(\theta) - L(\theta_n) = \sum_H p(H \mid X, \theta_n)\,\big[L_c(\theta) - L_c(\theta_n)\big] - \sum_H p(H \mid X, \theta_n)\, \log \dfrac{p(H \mid X, \theta)}{p(H \mid X, \theta_n)}$

The last term is a Kullback-Leibler divergence (relative entropy), $KL\big(p(H \mid X, \theta_n) \,\|\, p(H \mid X, \theta)\big)$, which is non-negative, so

$L(\theta) - L(\theta_n) \ge \sum_H p(H \mid X, \theta_n)\, L_c(\theta) - \sum_H p(H \mid X, \theta_n)\, L_c(\theta_n)$  (lower bound)

Q-function: $Q(\theta; \theta_n) = E_{p(H \mid X, \theta_n)}[L_c(\theta)] = \sum_H p(H \mid X, \theta_n)\, L_c(\theta)$

E-step (expectation): compute $Q$

M-step (maximization): re-estimate $\theta$ by maximizing $Q$; repeat until convergence

How is EM used in pLSA?

EM in pLSA

Likelihood function: $L = \sum_{i=1}^{N}\sum_{j=1}^{M} n(d_i, w_j)\, \log p(w_j \mid d_i)$

Q-function (the posterior $p(z_k \mid d_i, w_j)$ is evaluated at the current parameter values, which are random at initialization):

$Q(\theta; \theta_n) = E_{p(H \mid X, \theta_n)}[L_c(\theta)] = \sum_{i=1}^{N}\sum_{j=1}^{M} n(d_i, w_j) \sum_{k=1}^{K} p(z_k \mid d_i, w_j)\, \log\big(p(w_j \mid z_k)\, p(z_k \mid d_i)\big)$

Constraints:

1. $\sum_{j=1}^{M} p(w_j \mid z_k) = 1$

2. $\sum_{k=1}^{K} p(z_k \mid d_i) = 1$

Lagrange multipliers: $H = E[L_c] + \sum_{k=1}^{K} \tau_k \Big(1 - \sum_{j=1}^{M} p(w_j \mid z_k)\Big) + \sum_{i=1}^{N} \rho_i \Big(1 - \sum_{k=1}^{K} p(z_k \mid d_i)\Big)$

Set the partial derivative with respect to each independent variable to zero.

M-step:

$p(w_j \mid z_k) = \dfrac{\sum_{i=1}^{N} n(d_i, w_j)\, p(z_k \mid d_i, w_j)}{\sum_{m=1}^{M}\sum_{i=1}^{N} n(d_i, w_m)\, p(z_k \mid d_i, w_m)}$

$p(z_k \mid d_i) = \dfrac{\sum_{j=1}^{M} n(d_i, w_j)\, p(z_k \mid d_i, w_j)}{n(d_i)}$

E-step (Bayes' rule, with the associative and distributive laws):

$p(z_k \mid d_i, w_j) = \dfrac{p(w_j \mid z_k)\, p(z_k \mid d_i)\, p(d_i)}{p(d_i)\sum_{l=1}^{K} p(w_j \mid z_l)\, p(z_l \mid d_i)} = \dfrac{p(w_j \mid z_k)\, p(z_k \mid d_i)}{\sum_{l=1}^{K} p(w_j \mid z_l)\, p(z_l \mid d_i)}$

Bayesian Theory vs. Probability Theory

Estimate $\theta$ through the posterior vs. estimate $\theta$ through maximization of the likelihood

Bayesian theory: prior vs. probability theory: statistics

When the number of samples $\to \infty$, Bayesian theory == probability theory

Parameter estimation: $p(\theta \mid D) \propto p(D \mid \theta)\, p(\theta)$. What should $p(\theta)$ be? A conjugate prior of the likelihood is helpful, but its usefulness is limited. Otherwise?

Non-parametric Bayesian methods (complicated); kernel methods: I just know a little...

VSM → CF → MF → pLSA → LDA → non-parametric Bayesian → deep learning

Latent Dirichlet Allocation

Latent Dirichlet Allocation (LDA): David M. Blei, Andrew Y. Ng, Michael I. Jordan, Journal of Machine Learning Research, 2003, cited > 3000

Hierarchical Bayesian model; Bayesian pLSI

(Plate diagram: $\alpha \to \theta \to z \to w$, with $w$ in a word plate of size $N$ nested in a document plate of size $M$, and $\beta$ pointing to $w$; $N$ and $M$ are the repetition counts of the plates.)

Generative process of a document d in a corpus according to LDA:

Choose $N \sim \mathrm{Poisson}(\xi)$; why?

For each document $d = \{w_1, w_2, \ldots, w_N\}$:

Choose $\theta \sim \mathrm{Dir}(\alpha)$; why?

For each of the N words $w_n$ in d:

a) Choose a topic $z_n \sim \mathrm{Multinomial}(\theta)$; why?

b) Choose a word $w_n$ from $p(w_n \mid z_n, \beta)$, a multinomial probability conditioned on $z_n$; why?

(David Blei: ACM-Infosys Award)

Latent Dirichlet Allocation

LDA(Cont.)

(Plate diagram of smoothed LDA: $\alpha \to \theta \to z \to w$ in plates $N$ and $M$, plus $K$ topic-word distributions $\varphi$ with prior $\beta$.)

Generative process of a document d in LDA:

Choose $N \sim \mathrm{Poisson}(\xi)$; not important

For each document $d = \{w_1, w_2, \ldots, w_N\}$:

Choose $\theta \sim \mathrm{Dir}(\alpha)$; $\theta = (\theta_1, \theta_2, \ldots, \theta_K)$, $|\theta| = K$, K is fixed, $\sum_{k=1}^{K}\theta_k = 1$; Dirichlet and Multinomial → conjugate prior

For each of the N words $w_n$ in d:

a) Choose a topic $z_n \sim \mathrm{Multinomial}(\theta)$

b) Choose a word $w_n$ from $p(w_n \mid z_n, \beta)$, a multinomial probability conditioned on $z_n$

One word, one topic; one document, multiple topics

$\theta = (\theta_1, \theta_2, \ldots, \theta_K)$; $z = (z_1, z_2, \ldots, z_K)$; for each word $w_n$ there is a $z_n$

pLSA: the number of $p(z \mid d)$ parameters is linear in the number of documents → overfitting; the Dirichlet prior acts as regularization

M + K Dirichlet-Multinomial pairs
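A small sketch of this generative story; the toy vocabulary, K, the α and β values, and the Poisson mean are assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["topic", "model", "text", "mining", "prior"]
K, V = 2, len(vocab)
alpha = np.full(K, 0.5)                          # Dirichlet prior over topic mixtures
beta = rng.dirichlet(np.full(V, 0.5), size=K)    # K topic-word multinomials

def generate_document(xi=8.0):
    """Generate one document following LDA's generative process."""
    N = max(1, rng.poisson(xi))                  # document length ~ Poisson(xi)
    theta = rng.dirichlet(alpha)                 # theta ~ Dir(alpha)
    words = []
    for _ in range(N):
        z = rng.choice(K, p=theta)               # z_n ~ Multinomial(theta)
        w = rng.choice(V, p=beta[z])             # w_n ~ Multinomial(beta_{z_n})
        words.append(vocab[w])
    return words

print(generate_document())
```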


Conjugate Prior & Distributions

Conjugate prior: if the posterior $p(\theta \mid x)$ is in the same family as the prior $p(\theta)$, the prior and posterior are called conjugate distributions, and the prior is called a conjugate prior of the likelihood $p(x \mid \theta)$: $p(\theta \mid x) \propto p(x \mid \theta)\, p(\theta)$

Distributions

Binomial Distribution ←→ Beta Distribution

Multinomial Distribution ←→ Dirichlet Distribution

Binomial & Beta Distribution

Binomial: $\mathrm{Bin}(m \mid N, \theta) = \binom{N}{m}\, \theta^m (1-\theta)^{N-m}$: the likelihood, with $\binom{N}{m} = \dfrac{N!}{(N-m)!\, m!}$

$\mathrm{Beta}(\theta \mid a, b) = \dfrac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, \theta^{a-1}(1-\theta)^{b-1}$, where $\Gamma(a) = \int_0^{\infty} t^{a-1} e^{-t}\, dt$

Why do the prior and posterior need to be conjugate distributions?

Conjugate Prior & Distributions

$p(\theta \mid m, l, a, b) \propto \mathrm{Bin}(m \mid N, \theta)\, \mathrm{Beta}(\theta \mid a, b) \propto \binom{m+l}{m}\, \theta^{m}(1-\theta)^{l}\, \dfrac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, \theta^{a-1}(1-\theta)^{b-1}$

$p(\theta \mid m, l, a, b) = \dfrac{\Gamma(m+a+l+b)}{\Gamma(m+a)\,\Gamma(l+b)}\, \theta^{m+a-1}(1-\theta)^{l+b-1}$: a Beta distribution! → parameter estimation

Multinomial & Dirichlet Distribution

$x$ / $\vec{x}$ is multivariate, e.g. $x = (0, 0, 1, 0, 0, 0)$: the event $x_3$ happens

The probability distribution of $x$ in only one event: $p(x \mid \theta) = \prod_{k=1}^{K} \theta_k^{x_k}$, $\theta = (\theta_1, \theta_2, \ldots, \theta_K)$

Conjugate Prior & Distributions

Multinomial & Dirichlet Distribution (Cont.)

$\mathrm{Mult}(m_1, m_2, \ldots, m_K \mid \theta, N) = \binom{N}{m_1}\binom{N-m_1}{m_2}\binom{N-m_1-m_2}{m_3}\cdots\binom{N-\sum_{k=1}^{K-1}m_k}{m_K} \prod_{k=1}^{K}\theta_k^{m_k} = \dfrac{N!}{m_1!\, m_2! \cdots m_K!}\prod_{k=1}^{K}\theta_k^{m_k}$: the likelihood function of $\theta$

Mult is the exact probability distribution of $p(z_k \mid d_j)$ and $p(w_j \mid z_k)$

In Bayesian theory, we need to find a conjugate prior of $\theta$ for Mult, where $0 < \theta_k < 1$ and $\sum_{k=1}^{K}\theta_k = 1$

Dirichlet distribution: $\mathrm{Dir}(\theta \mid \vec{\alpha}) = \dfrac{\Gamma(\alpha_0)}{\Gamma(\alpha_1)\cdots\Gamma(\alpha_K)} \prod_{k=1}^{K}\theta_k^{\alpha_k - 1}$, where $\vec{\alpha}$ is a vector and $\alpha_0 = \sum_k \alpha_k$

Hyper-parameter: a parameter in the probability distribution function (pdf)

Conjugate Prior & Distributions

Multinomial & Dirichlet Distribution (Cont.)

$p(\theta \mid \boldsymbol{m}, \boldsymbol{\alpha}) \propto p(\boldsymbol{m} \mid \theta)\, p(\theta \mid \boldsymbol{\alpha}) \propto \prod_{k=1}^{K}\theta_k^{\alpha_k + m_k - 1}$: a Dirichlet?

$p(\theta \mid \boldsymbol{m}, \boldsymbol{\alpha}) = \mathrm{Dir}(\theta \mid \boldsymbol{m} + \boldsymbol{\alpha}) = \dfrac{\Gamma(\alpha_0 + N)}{\Gamma(\alpha_1 + m_1)\cdots\Gamma(\alpha_K + m_K)} \prod_{k=1}^{K}\theta_k^{\alpha_k + m_k - 1}$: a Dirichlet!

Why? The Gamma function $\Gamma$ is a mysterious function.

If $p \sim \mathrm{Beta}(t \mid \alpha, \beta)$: $E[p] = \int_0^1 t \cdot \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, t^{\alpha-1}(1-t)^{\beta-1}\, dt = \dfrac{\alpha}{\alpha+\beta}$

If $p \sim \mathrm{Dir}(\theta \mid \vec{\alpha})$: $E[p] = \Big(\dfrac{\alpha_1}{\sum_{i=1}^{K}\alpha_i}, \dfrac{\alpha_2}{\sum_{i=1}^{K}\alpha_i}, \ldots, \dfrac{\alpha_K}{\sum_{i=1}^{K}\alpha_i}\Big)$
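A small numerical sketch of the Dirichlet-multinomial conjugate update and its expectation; the prior pseudo-counts and the observed counts are assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

alpha = np.array([1.0, 2.0, 3.0])      # Dirichlet prior hyper-parameters
m = np.array([5, 0, 2])                # observed multinomial counts m_k

# Conjugacy: the posterior is again a Dirichlet with parameters alpha + m
posterior = alpha + m
print("posterior mean:", posterior / posterior.sum())   # E[theta_k] = (alpha_k + m_k) / sum

# Monte Carlo check of the same expectation
samples = rng.dirichlet(posterior, size=100_000)
print("sample mean:   ", samples.mean(axis=0))
```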

Poisson Distribution

Why Poisson distribution?

The number of births per hour during a given day; the number of particles emitted by a radioactive source in a given time; the number of cases of a disease in different towns

For $\mathrm{Bin}(n, p)$, when n is large and p is small: $p(X = k) \approx \dfrac{\xi^k e^{-\xi}}{k!}$, with $\xi \approx np$

$\mathrm{Gamma}(x \mid \alpha) = \dfrac{x^{\alpha-1} e^{-x}}{\Gamma(\alpha)}$;  $\mathrm{Gamma}(x \mid \alpha = k+1) = \dfrac{x^{k} e^{-x}}{k!}$ (since $\Gamma(k+1) = k!$)

(Poisson is discrete; Gamma is continuous)

Poisson distribution: $p(k \mid \xi) = \dfrac{\xi^k e^{-\xi}}{k!}$

Many experimental situations occur in which we observe the counts of events within a set unit of time, area, volume, length, etc.
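A quick numerical check of the binomial-to-Poisson limit; n, p, and k are arbitrary illustrative values:

```python
from math import comb, exp, factorial

n, p, k = 1000, 0.003, 5             # large n, small p
xi = n * p                           # xi is approximately n * p

binom = comb(n, k) * p**k * (1 - p)**(n - k)
poisson = xi**k * exp(-xi) / factorial(k)
print(binom, poisson)                # the two values are close
```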

Solution for LDA

LDA (Cont.)

$\alpha$, $\beta$: corpus-level parameters

$\theta$: document-level variable

$z$, $w$: word-level variables

Conditionally independent hierarchical model; parametric Bayes model

(Topic-word probability matrix: rows $z_1, z_2, \ldots, z_K$, columns $w_1, w_2, \ldots, w_n$, entries $p_{kn}$.)

$p(\theta, \boldsymbol{z}, \boldsymbol{w} \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)$  (with $p(z_i \mid \theta) = \theta_i$)

Solving process: marginalize (a multiple integral):

$p(\boldsymbol{w} \mid \alpha, \beta) = \int p(\theta \mid \alpha) \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)\, d\theta$

$p(D \mid \alpha, \beta) = \prod_{d=1}^{M} \int p(\theta_d \mid \alpha) \prod_{n=1}^{N_d} \sum_{z_{dn}} p(z_{dn} \mid \theta_d)\, p(w_{dn} \mid z_{dn}, \beta)\, d\theta_d$

Solution for LDA

The most significant generative model in the machine learning community in the recent ten years

$p(\boldsymbol{w} \mid \alpha, \beta) = \int p(\theta \mid \alpha) \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)\, d\theta$, rewritten in terms of the model parameters:

$p(\boldsymbol{w} \mid \alpha, \beta) = \dfrac{\Gamma(\sum_i \alpha_i)}{\prod_i \Gamma(\alpha_i)} \int \Big(\prod_{i=1}^{k}\theta_i^{\alpha_i - 1}\Big) \Big(\prod_{n=1}^{N}\sum_{i=1}^{k}\prod_{j=1}^{V}(\theta_i \beta_{ij})^{w_n^j}\Big)\, d\theta$

$\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_K)$; $\beta \in \mathbb{R}^{K \times V}$: what we need to solve for

Variational Inference (deterministic inference) vs. Gibbs Sampling (stochastic inference)

Why variational inference? To simplify the dependency structure. Why sampling? To approximate the statistical properties of the population with those of the samples.

Variational Inference

Variational Inference (inference through a variational distribution), VI

VI aims to use an approximating distribution that has a simpler dependency structure than that of the exact posterior distribution

$P(H \mid D) \approx Q(H)$: the true posterior distribution is approximated by a variational distribution

Dissimilarity between P and Q? Kullback-Leibler divergence:

$KL(Q \,\|\, P) = \int Q(H)\, \log\dfrac{Q(H)}{P(H \mid D)}\, dH = \int Q(H)\, \log\dfrac{Q(H)}{P(H, D)}\, dH + \log P(D)$

$L \overset{\mathrm{def}}{=} \int Q(H)\, \log P(H, D)\, dH - \int Q(H)\, \log Q(H)\, dH = \langle \log P(H, D)\rangle_{Q(H)} + \mathcal{H}(Q)$  ($\mathcal{H}(Q)$: the entropy of Q)

Variational Inference

$P(H \mid D) = p(\theta, \boldsymbol{z} \mid \boldsymbol{w}, \alpha, \beta)$; the variational distribution is $Q(H) = q(\theta, \boldsymbol{z} \mid \gamma, \phi) = q(\theta \mid \gamma)\, q(\boldsymbol{z} \mid \phi) = q(\theta \mid \gamma)\prod_{n=1}^{N} q(z_n \mid \phi_n)$  ($\theta$ and $\boldsymbol{z}$ are treated as approximately independent to facilitate computation)

$(\gamma^*, \phi^*) = \arg\min D\big(q(\theta, \boldsymbol{z} \mid \gamma, \phi)\, \|\, p(\theta, \boldsymbol{z} \mid \boldsymbol{w}, \alpha, \beta)\big)$: but we do not know the exact analytical form of this KL divergence

$\log p(\boldsymbol{w} \mid \alpha, \beta) = \log\int\sum_{\boldsymbol{z}} p(\theta, \boldsymbol{z}, \boldsymbol{w} \mid \alpha, \beta)\, d\theta = \log\int\sum_{\boldsymbol{z}} p(\theta, \boldsymbol{z}, \boldsymbol{w} \mid \alpha, \beta)\, \dfrac{q(\theta, \boldsymbol{z})}{q(\theta, \boldsymbol{z})}\, d\theta \ge \int\sum_{\boldsymbol{z}} q(\theta, \boldsymbol{z})\, \log\dfrac{p(\theta, \boldsymbol{z}, \boldsymbol{w} \mid \alpha, \beta)}{q(\theta, \boldsymbol{z})}\, d\theta = E_q[\log p(\theta, \boldsymbol{z}, \boldsymbol{w} \mid \alpha, \beta)] - E_q[\log q(\theta, \boldsymbol{z})] = L(\gamma, \phi; \alpha, \beta)$

$\log p(\boldsymbol{w} \mid \alpha, \beta) = L(\gamma, \phi; \alpha, \beta) + KL$, so minimizing the KL divergence is the same as maximizing L

Variational Inference

$L(\gamma, \phi; \alpha, \beta) = E_q[\log p(\theta \mid \alpha)] + E_q[\log p(\boldsymbol{z} \mid \theta)] + E_q[\log p(\boldsymbol{w} \mid \boldsymbol{z}, \beta)] - E_q[\log q(\theta)] - E_q[\log q(\boldsymbol{z})]$

$E_q[\log p(\theta \mid \alpha)] = \sum_{i=1}^{K}(\alpha_i - 1)\, E_q[\log\theta_i] + \log\Gamma\Big(\sum_{i=1}^{K}\alpha_i\Big) - \sum_{i=1}^{K}\log\Gamma(\alpha_i)$

$E_q[\log\theta_i] = \psi(\gamma_i) - \psi\Big(\sum_{j=1}^{K}\gamma_j\Big)$

$E_q[\log p(\boldsymbol{z} \mid \theta)] = \sum_{n=1}^{N}\sum_{i=1}^{K} E_q[z_{ni}]\, E_q[\log\theta_i] = \sum_{n=1}^{N}\sum_{i=1}^{K}\phi_{ni}\Big(\psi(\gamma_i) - \psi\big(\textstyle\sum_{j=1}^{K}\gamma_j\big)\Big)$

$E_q[\log p(\boldsymbol{w} \mid \boldsymbol{z}, \beta)] = \sum_{n=1}^{N}\sum_{i=1}^{K}\sum_{j=1}^{V} E_q[z_{ni}]\, w_n^j \log\beta_{ij} = \sum_{n=1}^{N}\sum_{i=1}^{K}\sum_{j=1}^{V}\phi_{ni}\, w_n^j \log\beta_{ij}$

Variational Inference

$E_q[\log q(\theta \mid \gamma)]$ has the same form as $E_q[\log p(\theta \mid \alpha)]$

$E_q[\log q(\boldsymbol{z} \mid \phi)] = E_q\Big[\sum_{n=1}^{N}\sum_{i=1}^{K} z_{ni}\log\phi_{ni}\Big] = \sum_{n=1}^{N}\sum_{i=1}^{K}\phi_{ni}\log\phi_{ni}$

Maximize L with respect to $\phi_{ni}$ (Lagrange multiplier for $\sum_j \phi_{nj} = 1$):

$L_{[\phi_{ni}]} = \phi_{ni}\Big(\psi(\gamma_i) - \psi\big(\textstyle\sum_{j=1}^{K}\gamma_j\big)\Big) + \phi_{ni}\log\beta_{ij} - \phi_{ni}\log\phi_{ni} + \lambda\Big(\sum_{j=1}^{K}\phi_{nj} - 1\Big)$

Taking the derivative with respect to $\phi_{ni}$: $\dfrac{\partial L}{\partial \phi_{ni}} = \psi(\gamma_i) - \psi\Big(\sum_{j=1}^{K}\gamma_j\Big) + \log\beta_{ij} - \log\phi_{ni} - 1 + \lambda = 0$

$\Rightarrow\; \phi_{ni} \propto \beta_{ij}\, \exp\Big(\psi(\gamma_i) - \psi\big(\textstyle\sum_{j=1}^{K}\gamma_j\big)\Big)$
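A minimal sketch of the resulting coordinate-ascent updates for one document, assuming the companion update $\gamma_i = \alpha_i + \sum_n \phi_{ni}$ from the original LDA paper; the function name and initialization are illustrative:

```python
import numpy as np
from scipy.special import digamma

def variational_e_step(doc, alpha, beta, iters=50):
    """Coordinate ascent on (phi, gamma) for a single document.

    doc:   list of word indices w_n
    alpha: (K,) Dirichlet hyper-parameters
    beta:  (K, V) topic-word probabilities
    """
    K, N = len(alpha), len(doc)
    phi = np.full((N, K), 1.0 / K)               # q(z_n = i) = phi_ni
    gamma = alpha + N / K                        # initial variational Dirichlet
    for _ in range(iters):
        # phi_ni proportional to beta_{i, w_n} * exp(psi(gamma_i) - psi(sum_j gamma_j))
        expo = np.exp(digamma(gamma) - digamma(gamma.sum()))
        phi = beta[:, doc].T * expo              # shape (N, K)
        phi /= phi.sum(axis=1, keepdims=True)
        # gamma_i = alpha_i + sum_n phi_ni       (companion update, assumed from the paper)
        gamma = alpha + phi.sum(axis=0)
    return phi, gamma
```

In the variational EM loop described on the next slide, this per-document E-step would be alternated with an M-step over α and β.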

Variational Inference

You can refer to the original paper for more details.

Variational EM Algorithm

Aim: $(\alpha^*, \beta^*) = \arg\max \prod_{d=1}^{M} p(\boldsymbol{w}_d \mid \alpha, \beta)$

Initialize $\alpha$, $\beta$

E-step: for each document, compute the variational parameters $\gamma^*$, $\phi^*$ through variational inference to approximate the likelihood

M-step: maximize the resulting likelihood lower bound with respect to $\alpha$, $\beta$

Repeat until convergence

Markov Chain Monte Carlo

MCMC basics: Markov chain (first-order), stationary distribution; the foundation of Gibbs sampling

General: $P(X_{t+n} = x \mid X_1, X_2, \ldots, X_t) = P(X_{t+n} = x \mid X_t)$

First-order: $P(X_{t+1} = x \mid X_1, X_2, \ldots, X_t) = P(X_{t+1} = x \mid X_t)$

One-step transition probability matrix

$P = \begin{pmatrix} p(1 \mid 1) & p(2 \mid 1) & \cdots & p(|S| \mid 1) \\ p(1 \mid 2) & p(2 \mid 2) & \cdots & p(|S| \mid 2) \\ \vdots & \vdots & & \vdots \\ p(1 \mid |S|) & p(2 \mid |S|) & \cdots & p(|S| \mid |S|) \end{pmatrix}$, governing the step $X_m \to X_{m+1}$

Markov Chain Monte Carlo

Markov Chain

Initialization probability: $\pi_0 = \{\pi_0(1), \pi_0(2), \ldots, \pi_0(|S|)\}$

$\pi_n = \pi_{n-1}P = \pi_{n-2}P^2 = \cdots = \pi_0 P^n$: the Chapman-Kolmogorov equation

Convergence theorem: under the premise of connectivity of P, $\lim_{n\to\infty} P_{ij}^n = \pi(j)$, with $\pi(j) = \sum_{i=1}^{|S|}\pi(i)\, P_{ij}$

$\lim_{n\to\infty}\pi_0 P^n = \begin{pmatrix}\pi(1) & \cdots & \pi(|S|)\\ \vdots & & \vdots\\ \pi(1) & \cdots & \pi(|S|)\end{pmatrix}$, $\pi = \{\pi(1), \pi(2), \ldots, \pi(j), \ldots, \pi(|S|)\}$: the stationary distribution

$X_0 \sim \pi_0(x) \to X_1 \sim \pi_1(x) \to \cdots \to X_n \sim \pi(x) \to X_{n+1} \sim \pi(x) \to X_{n+2} \sim \pi(x) \to \cdots$: after convergence, samples follow the stationary distribution

Markov Chain Monte Carlo

MCMC Sampling

We should construct a relationship between $\pi(x)$ and the MC transition process → the detailed balance condition

Detailed balance condition: in a common MC, if for $\pi(x)$ and the transition matrix P we have $\pi(i)\, P_{ij} = \pi(j)\, P_{ji}$ for all i, j, then $\pi(x)$ is the stationary distribution of this MC (a sufficient condition)

Proof: $\sum_{i=1}^{\infty}\pi(i)\, P_{ij} = \sum_{i=1}^{\infty}\pi(j)\, P_{ji} = \pi(j) \;\Rightarrow\; \pi P = \pi$, so $\pi$ is the solution of the equation $\pi P = \pi$. Done.

For a common MC with proposal $q(i, j)$ (also written $q(j \mid i)$ or $q(i \to j)$), and for any probability distribution $p(x)$ (the dimension of x is arbitrary), transform it by introducing acceptance probabilities $\alpha$ so that

$p(i)\, q(i, j)\, \alpha(i, j) = p(j)\, q(j, i)\, \alpha(j, i)$, with $\alpha(i, j) = p(j)\, q(j, i)$ and $\alpha(j, i) = p(i)\, q(i, j)$

The products $Q'(i, j) = q(i, j)\, \alpha(i, j)$ and $Q'(j, i) = q(j, i)\, \alpha(j, i)$ then satisfy detailed balance.

Markov Chain Monte Carlo

MCMC Sampling(cont.)

Step 1: initialize $X_0 = x_0$

Step 2: for t = 0, 1, 2, ...: with $X_t = x_t$, sample y from $q(x \mid x_t)$ (y in the domain of definition); sample u from Uniform[0, 1]; if $u < \alpha(x_t, y) = p(y)\, q(x_t \mid y)$, accept the transfer $x_t \to y$, i.e. $X_{t+1} = y$; else $X_{t+1} = x_t$

Metropolis-Hastings Sampling

Step 1: initialize $X_0 = x_0$

Step 2: for t = 0, 1, 2, ..., n, n+1, n+2, ... (the first n iterations form the burn-in period before convergence): with $X_t = x_t$, sample y from $q(x \mid x_t)$ (y in the domain of definition);

sample u from Uniform[0, 1]; if $u < \alpha(x_t, y) = \min\Big\{\dfrac{p(y)\, q(x_t \mid y)}{p(x_t)\, q(y \mid x_t)},\, 1\Big\}$, accept $x_t \to y$, i.e. $X_{t+1} = y$; else $X_{t+1} = x_t$

MH sampling is not well suited to high-dimensional variables.
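A minimal Metropolis-Hastings sketch; the target density, the symmetric Gaussian random-walk proposal, the step size, and the burn-in length are assumptions for the example:

```python
import math
import random

random.seed(0)

def p(x):
    """Unnormalized target density: a mixture of two Gaussian bumps."""
    return math.exp(-0.5 * (x - 2.0) ** 2) + 0.5 * math.exp(-0.5 * (x + 2.0) ** 2)

def metropolis_hastings(n_samples=10_000, step=1.0, burn_in=1_000):
    x = 0.0
    samples = []
    for t in range(n_samples + burn_in):
        y = x + random.gauss(0.0, step)          # symmetric proposal q(y|x)
        alpha = min(p(y) / p(x), 1.0)            # q cancels for a symmetric proposal
        if random.random() < alpha:
            x = y                                # accept the move
        if t >= burn_in:                         # discard the burn-in period
            samples.append(x)
    return samples

samples = metropolis_hastings()
print(sum(samples) / len(samples))
```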

Gibbs Sampling (two dimensions, starting from (x1, y1))

For two points A(x1, y1) and B(x1, y2) on the same vertical line:

$p(x_1, y_1)\, p(y_2 \mid x_1) = p(x_1)\, p(y_1 \mid x_1)\, p(y_2 \mid x_1)$

$p(x_1, y_2)\, p(y_1 \mid x_1) = p(x_1)\, p(y_2 \mid x_1)\, p(y_1 \mid x_1)$

$\Rightarrow\; p(x_1, y_1)\, p(y_2 \mid x_1) = p(x_1, y_2)\, p(y_1 \mid x_1)$, i.e. $p(A)\, p(y_2 \mid x_1) = p(B)\, p(y_1 \mid x_1)$

Similarly, for A(x1, y1) and C(x2, y1) on the same horizontal line: $p(A)\, p(x_2 \mid y_1) = p(C)\, p(x_1 \mid y_1)$  (D(x2, y2) is the fourth corner point)

Gibbs Sampling

Gibbs Sampling(Cont.)

We can accordingly construct the transition probability matrix Q:

$Q(A \to B) = p(y_B \mid x_1)$, if $x_A = x_B = x_1$

$Q(A \to C) = p(x_C \mid y_1)$, if $y_A = y_C = y_1$

$Q(A \to D) = 0$, otherwise

Detailed balance condition: $p(X)\, Q(X \to Y) = p(Y)\, Q(Y \to X)$ holds

Gibbs sampling (in two dimensions):

Step 1: initialize $X_0 = x_0$, $Y_0 = y_0$

Step 2: for t = 0, 1, 2, ...:

1. $y_{t+1} \sim p(y \mid x_t)$

2. $x_{t+1} \sim p(x \mid y_{t+1})$

Gibbs Sampling

Gibbs sampling (in n dimensions):

Step 1: initialize $X_0 = \vec{x}_0 = \{x_i : i = 1, 2, \ldots, n\}$

Step 2: for t = 0, 1, 2, ...:

1. $x_1^{(t+1)} \sim p(x_1 \mid x_2^{(t)}, x_3^{(t)}, \ldots, x_n^{(t)})$

2. $x_2^{(t+1)} \sim p(x_2 \mid x_1^{(t+1)}, x_3^{(t)}, \ldots, x_n^{(t)})$

3. ...

4. $x_j^{(t+1)} \sim p(x_j \mid x_1^{(t+1)}, \ldots, x_{j-1}^{(t+1)}, x_{j+1}^{(t)}, \ldots, x_n^{(t)})$

5. ...

6. $x_n^{(t+1)} \sim p(x_n \mid x_1^{(t+1)}, x_2^{(t+1)}, \ldots, x_{n-1}^{(t+1)})$

(Each coordinate conditions on the newest values: components already updated in sweep t+1 and the remaining components from sweep t.)
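A minimal Gibbs-sampling sketch for a standard bivariate Gaussian, where both full conditionals are known in closed form; the correlation value and sample count are assumptions for the example:

```python
import random

random.seed(0)
rho = 0.8                     # correlation of a standard bivariate Gaussian

def gibbs(n_samples=5_000):
    x, y = 0.0, 0.0
    samples = []
    for _ in range(n_samples):
        # Full conditionals of a standard bivariate normal with correlation rho:
        # y | x ~ N(rho * x, 1 - rho^2),  x | y ~ N(rho * y, 1 - rho^2)
        y = random.gauss(rho * x, (1 - rho ** 2) ** 0.5)
        x = random.gauss(rho * y, (1 - rho ** 2) ** 0.5)
        samples.append((x, y))
    return samples

pairs = gibbs()
print(sum(x * y for x, y in pairs) / len(pairs))   # roughly rho after convergence
```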

Gibbs Sampling for LDA

Gibbs Sampling in LDA

$\mathrm{Dir}(\vec{p} \mid \vec{\alpha}) = \dfrac{1}{\Delta(\vec{\alpha})}\prod_{k=1}^{V} p_k^{\alpha_k - 1}$, where $\Delta(\vec{\alpha})$ is the normalization factor: $\Delta(\vec{\alpha}) = \int \prod_{k=1}^{V} p_k^{\alpha_k - 1}\, d\vec{p}$

$p(\vec{z}_m \mid \vec{\alpha}) = \int p(\vec{z}_m \mid \vec{\theta})\, p(\vec{\theta} \mid \vec{\alpha})\, d\vec{\theta} = \int \prod_{k=1}^{V}\theta_k^{n_k}\, \mathrm{Dir}(\vec{\theta} \mid \vec{\alpha})\, d\vec{\theta} = \int \prod_{k=1}^{V}\theta_k^{n_k}\, \dfrac{1}{\Delta(\vec{\alpha})}\prod_{k=1}^{V}\theta_k^{\alpha_k - 1}\, d\vec{\theta} = \dfrac{1}{\Delta(\vec{\alpha})}\int \prod_{k=1}^{V}\theta_k^{n_k + \alpha_k - 1}\, d\vec{\theta} = \dfrac{\Delta(\vec{n}_m + \vec{\alpha})}{\Delta(\vec{\alpha})}$

$p(\vec{z} \mid \vec{\alpha}) = \prod_{m=1}^{M} p(\vec{z}_m \mid \vec{\alpha}) = \prod_{m=1}^{M}\dfrac{\Delta(\vec{n}_m + \vec{\alpha})}{\Delta(\vec{\alpha})} \;\longrightarrow\; p(\vec{w}, \vec{z} \mid \vec{\alpha}, \vec{\beta}) = \prod_{k=1}^{K}\dfrac{\Delta(\vec{n}_k + \vec{\beta})}{\Delta(\vec{\beta})}\prod_{m=1}^{M}\dfrac{\Delta(\vec{n}_m + \vec{\alpha})}{\Delta(\vec{\alpha})}$

Gibbs Sampling for LDA

Gibbs Sampling in LDA

Full conditionals: $p(\vec{\theta}_m \mid \vec{z}_{\neg i}, \vec{w}_{\neg i}) = \mathrm{Dir}(\vec{\theta}_m \mid \vec{n}_{m, \neg i} + \vec{\alpha})$,  $p(\vec{\varphi}_k \mid \vec{z}_{\neg i}, \vec{w}_{\neg i}) = \mathrm{Dir}(\vec{\varphi}_k \mid \vec{n}_{k, \neg i} + \vec{\beta})$

$p(z_i = k \mid \vec{z}_{\neg i}, \vec{w}_{\neg i}) \propto p(z_i = k, w_i = t, \vec{\theta}_m, \vec{\varphi}_k \mid \vec{z}_{\neg i}, \vec{w}_{\neg i}) = E[\theta_{mk}] \cdot E[\varphi_{kt}] = \hat{\theta}_{mk} \cdot \hat{\varphi}_{kt}$

$\hat{\theta}_{mk} = \dfrac{n_{m,\neg i}^{(k)} + \alpha_k}{\sum_{k=1}^{K}\big(n_{m,\neg i}^{(k)} + \alpha_k\big)}$,  $\hat{\varphi}_{kt} = \dfrac{n_{k,\neg i}^{(t)} + \beta_t}{\sum_{t=1}^{V}\big(n_{k,\neg i}^{(t)} + \beta_t\big)}$

$p(z_i = k \mid \vec{z}_{\neg i}, \vec{w}) \propto \dfrac{n_{m,\neg i}^{(k)} + \alpha_k}{\sum_{k=1}^{K}\big(n_{m,\neg i}^{(k)} + \alpha_k\big)} \times \dfrac{n_{k,\neg i}^{(t)} + \beta_t}{\sum_{t=1}^{V}\big(n_{k,\neg i}^{(t)} + \beta_t\big)}$

$z_i^{(t+1)} \sim p(z_i = k \mid \vec{z}_{\neg i}, \vec{w})$, resampled for every word position $i$
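To connect the sampling formula to code, here is a minimal collapsed Gibbs sampler sketch for LDA; the toy corpus, symmetric α and β, K, and the number of sweeps are assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

docs = [[0, 1, 1, 2], [2, 3, 3, 4], [0, 2, 4, 4]]   # word ids per document (toy corpus)
V, K = 5, 2
alpha, beta = 0.5, 0.1                               # symmetric hyper-parameters

n_mk = np.zeros((len(docs), K))      # topic counts per document
n_kt = np.zeros((K, V))              # word counts per topic
n_k = np.zeros(K)                    # total words per topic
z = []                               # current topic assignment of every token

for m, doc in enumerate(docs):       # random initialization of z
    z.append([])
    for t in doc:
        k = rng.integers(K)
        z[m].append(k)
        n_mk[m, k] += 1; n_kt[k, t] += 1; n_k[k] += 1

for _ in range(200):                 # Gibbs sweeps
    for m, doc in enumerate(docs):
        for i, t in enumerate(doc):
            k = z[m][i]              # remove token i from the counts (the "not-i" statistics)
            n_mk[m, k] -= 1; n_kt[k, t] -= 1; n_k[k] -= 1
            # p(z_i = k | z_rest, w) proportional to (n_mk + alpha) * (n_kt + beta) / (n_k + V*beta)
            p = (n_mk[m] + alpha) * (n_kt[:, t] + beta) / (n_k + V * beta)
            k = rng.choice(K, p=p / p.sum())
            z[m][i] = k              # add the token back with its new topic
            n_mk[m, k] += 1; n_kt[k, t] += 1; n_k[k] += 1

theta = (n_mk + alpha) / (n_mk + alpha).sum(axis=1, keepdims=True)
print(np.round(theta, 2))            # estimated document-topic proportions
```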

Q&A
