
Basic Probability Theory for PRML

Yung-Kyun Noh (노영균), Seoul National University

Pattern Recognition and Machine Learning Winter School

2016. 1. 20 (Wed.)

Contents

• Probability / probability density
• Conditional probability (density)
• Marginal probability (density)
• Joint probability (density)
• Inference and classification
• Gaussian processes

Probability

• Mapping from a random variable to a number

[Figure: events Event 1, Event 2, Event 3, …, Event N mapped onto the interval from 0 to 1]

Probability

X : random variable
𝒳 : set of outputs of the random variable
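In standard notation, the two defining properties of a probability over a discrete outcome set are:

\[
0 \le P(x) \le 1 \quad \text{for all } x \in \mathcal{X}, \qquad \sum_{x \in \mathcal{X}} P(x) = 1.
\]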

Probability and Probability Density

For a continuous random variable over ℝ^D, a probability is assigned to a region R by integrating a density:

\[
\text{Probability} = \int_R p(x)\, dx
\]

Probability and Probability Density

• An event is defined infinitesimally: dx is a set of infinitesimal events, and p(x) dx is the probability of the infinitesimal event around x.

[Figure: two regions R1 and R2 inside the data space ℝ^D]
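A density is characterized by the analogous properties; note that p(x) itself may exceed 1, since only its integral is constrained:

\[
p(x) \ge 0, \qquad \int_{\mathbb{R}^D} p(x)\, dx = 1, \qquad P(x \in R) = \int_R p(x)\, dx.
\]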

Can you explain the meaning of these functions?

• Compare the probability P(x ∈ R), a number assigned to a region, with the density p(x), a function that must be integrated over a region to yield a probability.

Supervised Learning (Prediction)

• Method:
  – Learn from labeled examples, then classify unseen data.

[Based on the assumption of regularity]

[Figure: labeled training images (car, car, plane, ship, horse, horse, horse, car) feed a learned classifier, which then labels a new unseen image as a horse]

Representation of Data

• Each datum is one point in a data space, written as a vector of elements:

x = [1, 2, 5, 10, …]^T

[Figure: data points in the data space]
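A minimal sketch of this representation in Python (NumPy assumed; the feature values echo the slide's example vector, and the extra points are hypothetical):

```python
import numpy as np

# One datum = one point in the data space R^D, stored as a vector of elements
x = np.array([1.0, 2.0, 5.0, 10.0])   # the slide's example [1, 2, 5, 10, ...]^T

# A dataset of N such points stacks into an (N x D) array
X = np.stack([x, x + 0.5, x - 0.3])    # three hypothetical points in the same space
print(X.shape)                         # (3, 4)
```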

Classification

Bayes Optimal Classifier

• Our ultimate goal is not zero error, but the (optimal) Bayes error.

[Figure: two overlapping class-conditional densities; the overlap region is the Bayes error. Figure credit: Masashi Sugiyama]
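In the standard formulation, the Bayes optimal classifier picks the class with the largest posterior, and the Bayes error is the remaining overlap between classes:

\[
\hat{y}(x) = \arg\max_y \, p(y \mid x), \qquad \text{Bayes error} = \mathbb{E}_x\!\left[\, 1 - \max_y p(y \mid x) \,\right].
\]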

Model on Each Class

• Model: class-conditional density as a Gaussian
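In symbols, the model assigns each class k its own Gaussian, which Bayes' rule then converts into the posterior used for classification:

\[
p(x \mid y = k) = \mathcal{N}(x \mid \mu_k, \Sigma_k), \qquad p(y = k \mid x) \propto p(x \mid y = k)\, p(y = k).
\]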

Optimal Regression

• Minimizing mean squared error

Minimize
\[
\mathbb{E}\big[(y - f(x))^2\big]
\]
Minimized when f(x) = \mathbb{E}[y \mid x].
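The standard one-line justification: the expected squared error splits into a part controlled by f and an irreducible noise part,

\[
\mathbb{E}\big[(y - f(x))^2\big] = \mathbb{E}_x\big[(f(x) - \mathbb{E}[y \mid x])^2\big] + \mathbb{E}_x\big[\operatorname{Var}(y \mid x)\big],
\]

so the first term vanishes exactly at f(x) = \mathbb{E}[y \mid x].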

Model for Regression

• Obtain a regression function f(x; D) from data D.
• Choose a model for which the expected squared error \mathbb{E}_{\mathcal{D}}\,\mathbb{E}_{x,y}\big[(y - f(x; \mathcal{D}))^2\big] is minimized.
  – Minimized for f(x) = \mathbb{E}[y \mid x]
• Bias-variance tradeoff: variance + bias² (see the decomposition below).
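Writing \bar{f}(x) = \mathbb{E}_{\mathcal{D}}[f(x; \mathcal{D})] for the prediction averaged over datasets, the decomposition the slide names is:

\[
\mathbb{E}_{\mathcal{D}}\big[(f(x; \mathcal{D}) - \mathbb{E}[y \mid x])^2\big]
= \underbrace{\mathbb{E}_{\mathcal{D}}\big[(f(x; \mathcal{D}) - \bar{f}(x))^2\big]}_{\text{variance}}
+ \underbrace{\big(\bar{f}(x) - \mathbb{E}[y \mid x]\big)^2}_{\text{bias}^2}.
\]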

Several Rules
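At this point a probability course typically states the sum rule, the product rule, and Bayes' rule; presumably these are the rules meant here:

\[
p(x) = \sum_y p(x, y), \qquad p(x, y) = p(y \mid x)\, p(x), \qquad p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)}.
\]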

For More Than Two Random Variables

• For three disjoint sets for a random variable X and another three disjoint sets for a random variable Y:

[Figure: a 3 × 3 table of joint probabilities; all entries sum to 1]

Conditional Probability
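In standard notation, conditioning renormalizes the joint probability by the probability of the conditioning event:

\[
P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)}, \qquad \sum_x P(X = x \mid Y = y) = 1.
\]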

Conditional Probability Density

[Figure: conditioning on z = c slices the joint density over the domain ℝ^D; renormalizing the slice over the new domain gives the conditional densities for y = 1 and y = 2]
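In formulas, the slice at z = c, renormalized so it integrates to one over the new domain, is:

\[
p(x \mid z = c) = \frac{p(x, z = c)}{p(z = c)}, \qquad p(z = c) = \int p(x, z = c)\, dx.
\]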


Marginal Probability Density

[Figure: summing (integrating) the joint density over one variable collapses it onto the remaining axis]
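The corresponding formula integrates (or, for a discrete variable, sums) the joint density over the variable being removed:

\[
p(x) = \int p(x, y)\, dy \qquad \text{or} \qquad p(x) = \sum_y p(x, y).
\]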

Marginal Probability Density and Conditional Probability Density in Machine Learning

[Figure: the joint PDF over input and class is sliced into the two conditional PDFs for classes 1 and 2]

Marginal Probability Density and Conditional Probability Density in Machine Learning

[Figure: summing the joint PDF over the class variable yields the marginal density of the input]
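These pieces fit together in classification: the marginal of the input is a mixture of the class-conditionals, and Bayes' rule recovers the posterior used for prediction:

\[
p(x) = \sum_y p(x \mid y)\, p(y), \qquad p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)}.
\]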

Benefits of Using High Dimensionalities

• Feature 1 and Feature 2 have correlation: classes that overlap along each single feature can become separable when both features are used together.

[Figure: scatter plot over the Feature 1–Feature 2 plane]

Curse of Dimensionality

– To achieve the same density as N = 100 for one variable, we need N = 100^D for D variables.
– Conversely, when we have 60,000 data points for a 10-dimensional space, the density is the same as about 3 data points in a 1-dimensional space (3^10 = 59,049 ≈ 60,000).
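A minimal sketch of this arithmetic in Python (the function name is my own):

```python
def samples_for_same_density(n_per_dim: int, d: int) -> int:
    """Samples needed in d dimensions to match the density of n_per_dim points in 1-D."""
    return n_per_dim ** d

print(samples_for_same_density(100, 1))    # 100
print(samples_for_same_density(100, 3))    # 1,000,000
print(round(60000 ** (1 / 10)))            # ~3 points per axis for 60,000 points in 10-D
```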

GAUSSIAN DENSITY FUNCTION

Gaussian Random Variable

• The principal axes of the density contours are the eigenvector directions of the covariance Σ.

[Figure: elliptical contour of a 2-D Gaussian; the principal axis lengths are proportional to \sqrt{\lambda_1} and \sqrt{\lambda_2}, where \lambda_1, \lambda_2 are the eigenvalues of Σ]
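For reference, the multivariate Gaussian density and the eigendecomposition behind the principal axes are:

\[
\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right), \qquad \Sigma u_i = \lambda_i u_i.
\]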

Gaussian Random Variable – Projection

• The projection onto any direction is Gaussian.
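Concretely, for x ~ N(μ, Σ) and any direction w, the projected variable is a one-dimensional Gaussian:

\[
w^\top x \sim \mathcal{N}\big(w^\top \mu,\; w^\top \Sigma\, w\big).
\]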

Gaussian Random Variable – Marginal
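The standard result (PRML §2.3): partitioning x into (x_a, x_b) with matching blocks of μ and Σ, the marginal simply keeps the corresponding block,

\[
p(x_a) = \int p(x_a, x_b)\, dx_b = \mathcal{N}(x_a \mid \mu_a, \Sigma_{aa}).
\]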

Gaussian Random Variable – Conditional
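With the same partition, the conditional is again Gaussian, with a shifted mean and a reduced (Schur complement) covariance:

\[
p(x_a \mid x_b) = \mathcal{N}\!\big(x_a \,\big|\, \mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(x_b - \mu_b),\; \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}\big).
\]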

PARAMETER ESTIMATION

Motivation – Parameter Estimation

• Parameter estimation is an optimization problem.

p̂(x) : estimated probability density function, in other words, the density function that fits the data best

Maximum Likelihood Estimation

• Parameter estimation as an optimization problem:
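For i.i.d. data X = {x_1, …, x_N}, the optimization reads:

\[
\hat{\theta}_{\mathrm{ML}} = \arg\max_{\theta}\, p(X \mid \theta) = \arg\max_{\theta} \sum_{n=1}^{N} \ln p(x_n \mid \theta).
\]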

Maximum Likelihood for Gaussian

• With optimal parameters satisfying the zero-gradient condition on the next slide:

The empirical mean and empirical covariance are the maximum likelihood solutions.
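Explicitly, the maximum likelihood solutions for the Gaussian are:

\[
\hat{\mu} = \frac{1}{N}\sum_{n=1}^{N} x_n, \qquad \hat{\Sigma} = \frac{1}{N}\sum_{n=1}^{N} (x_n - \hat{\mu})(x_n - \hat{\mu})^\top.
\]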

Maximum Likelihood for Gaussian

\[
\nabla_{\theta} \ln p(X \mid \theta) = \vec{0}, \qquad \theta = \mu, \Sigma
\]
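A minimal numerical check in Python (NumPy assumed; the true mean and covariance are made-up values):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([1.0, -2.0], [[2.0, 0.6], [0.6, 1.0]], size=5000)

mu_hat = X.mean(axis=0)                              # empirical mean = ML solution
Sigma_hat = (X - mu_hat).T @ (X - mu_hat) / len(X)   # empirical covariance (1/N form)
print(mu_hat)      # close to [1, -2]
print(Sigma_hat)   # close to [[2, 0.6], [0.6, 1]]
```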

Maximum A Posteriori (MAP) Estimation

• MAP estimation maximizes the posterior over the parameters.
• Likelihood (model): p(X | θ)
• Prior: p(θ)
• Bayes rule: p(θ | X) = p(X | θ) p(θ) / p(X)
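Since p(X) does not depend on θ, MAP estimation maximizes the log-likelihood plus the log prior:

\[
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta}\, p(\theta \mid X) = \arg\max_{\theta}\, \big[\ln p(X \mid \theta) + \ln p(\theta)\big].
\]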

Maximum A Posteriori (MAP) Estimation for Gaussian

• Let the prior over the mean be Gaussian.
• The posterior can be calculated using Bayes' rule.


Maximum A Posteriori (MAP) Estimation for Gaussian

• Posterior density
  – Caution: this is the posterior of the parameter μ, not the density function of the data x.
• The MAP estimate of μ = the mean of the posterior = μ_N (the posterior is Gaussian, so its mode and mean coincide).
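For the one-dimensional case with known variance σ² and prior N(μ | μ₀, σ₀²), the standard result (PRML §2.3.6) is:

\[
p(\mu \mid X) = \mathcal{N}(\mu \mid \mu_N, \sigma_N^2), \qquad
\mu_N = \frac{\sigma^2}{N\sigma_0^2 + \sigma^2}\,\mu_0 + \frac{N\sigma_0^2}{N\sigma_0^2 + \sigma^2}\,\bar{x}, \qquad
\frac{1}{\sigma_N^2} = \frac{1}{\sigma_0^2} + \frac{N}{\sigma^2}.
\]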

MLE vs. MAP

• For Gaussian
  – When N is just a few (say N = 5), the prior term is dominant, keeping the MAP estimate near the prior mean.

MLE vs. MAP

• For Gaussian
  – When we have a few outliers, the likelihood term is dominant (we learn from the data), so the estimate is pulled toward the outliers and the prior pulls it back only weakly.


Bayesian Integration

• The final standard method of prediction is to use Bayesian inference instead of estimating a parameter point.
  – Do not insert the point estimate directly, but marginalize.

Uncertainty of x:
\[
p(x \mid X) = \mathcal{N}\big(x \mid \mu_N,\; \sigma^2 + \sigma_N^2\big)
\]
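The marginalization behind this result:

\[
p(x \mid X) = \int p(x \mid \theta)\, p(\theta \mid X)\, d\theta,
\]

which for the Gaussian case above adds the posterior uncertainty σ_N² of the mean to the observation noise σ².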

Conjugate Priors

• Given a likelihood pdf p(X | θ), a conjugate prior p(θ) is one for which the posterior p(θ | X) has the same form as the prior.

[Diagram: prior × likelihood → posterior; the two distributions (prior and posterior) have the same form]

Conjugate Priors

Likelihood     Prior       Posterior
Gaussian       Gaussian    Gaussian
Gaussian       Gamma       Gamma
Multinomial    Dirichlet   Dirichlet
Binomial       Beta        Beta
Poisson        Gamma       Gamma

Kullback-Leibler Divergence

p̂(x) : empirical density function
p(x | θ) : model density function
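The divergence from the empirical density to the model is:

\[
D_{\mathrm{KL}}\big(\hat{p}\,\|\,p_{\theta}\big) = \int \hat{p}(x)\, \ln \frac{\hat{p}(x)}{p(x \mid \theta)}\, dx \;\ge\; 0.
\]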

Kullback-Leibler Divergence

Likelihood: (1/N) Σₙ ln p(xₙ | θ)
KL divergence: D_KL(p̂ ∥ p_θ) equals the negative of this average log-likelihood up to a constant in θ, so maximizing the likelihood is the same as minimizing the KL divergence from the empirical density to the model.

Kullback-Leibler Divergence

• A model with a too-complex function will capture the noise (overfit).

Consistency and Bayes Error

• Objective: minimizing the expected error

ℓ : loss function
D : a realization of N data

[Figure: error versus model complexity, from smaller to larger; the expected error stays above the Bayes error while the error for training data (train error) keeps falling. Example: a linear classifier with regularization]
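In symbols (using the names ℓ and D above):

\[
\text{expected error} = \mathbb{E}_{x, y}\big[\ell\big(f(x; \mathcal{D}),\, y\big)\big] \;\ge\; \text{Bayes error}.
\]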

Consistency and Bayes Error

• Consistent learner with many data

[Figure: error versus model complexity, from smaller to larger; with many data the train error approaches the minimum error of the consistent learner, which may still lie above the Bayes error. Example: a linear classifier with regularization]

• Consistency does not necessarily imply achieving the Bayes error!

GAUSSIAN PROCESS REGRESSION

Prediction Using Correlation

Gaussian Process

[Figure credit: Christopher Bishop]

Gaussian Process as an Infinite Dimensional Gaussian
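Formally, a Gaussian process makes any finite set of function values jointly Gaussian, with the covariance given by a kernel:

\[
\big(f(x_1), \ldots, f(x_N)\big)^\top \sim \mathcal{N}(0, K), \qquad K_{ij} = k(x_i, x_j).
\]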

Gaussian Process Regression

For given training data D = (X, y) and a test input x:

\[
y(x; \mathcal{D}) = \mathbf{k}^\top K^{-1} \mathbf{y}
\]

[Figure credit: C. E. Rasmussen & C. K. I. Williams]
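A minimal sketch of this predictive mean in Python (NumPy assumed; the RBF kernel, data, and jitter term are my own illustrative choices, not from the slides):

```python
import numpy as np

def rbf(a, b, ell=1.0):
    """Squared-exponential kernel k(a, b) = exp(-(a - b)^2 / (2 ell^2)) on 1-D inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

# Training data D = (X, y) and test inputs x_*
X = np.array([-2.0, -1.0, 0.0, 1.5])
y = np.sin(X)
x_star = np.linspace(-3.0, 3.0, 7)

K = rbf(X, X) + 1e-6 * np.eye(len(X))   # K_ij = k(x_i, x_j), jittered for stability
k = rbf(X, x_star)                       # k_i = k(x_i, x_*)

mean = k.T @ np.linalg.solve(K, y)       # predictive mean  y(x; D) = k^T K^{-1} y
print(mean)
```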

Summary

• What we did:
  – Probability and probability density
  – Conditional density, marginalized density
  – Model construction
  – Gaussian model
  – Parameter estimation
  – Gaussian process

• What we didn't do:
  – Multinomial distribution and Dirichlet distribution
  – Convergence of estimation
  – Generative model vs. discriminative model

THANK YOU

Yung-Kyun Noh
nohyung@snu.ac.kr