PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML...

56
Seoul National University PRML을 위한 기초 확률 이론 (Basic Probability Theory for PRML) Yung-Kyun Noh (노영균) Seoul National University 패턴인식 및 기계학습 겨울학교 (Pattern Recognition and Machine Learning Winter School) 2016. 1. 20 (Wed.)

Transcript of PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML...

Page 1: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

PRML을위한기초확률이론(Basic Probability Theory for PRML)

Yung-Kyun Noh (노영균)Seoul National University

패턴인식및기계학습겨울학교(Pattern Recognition and Machine Learning Winter School)

2016. 1. 20 (Wed.)

Page 2: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Contents• Probability / Probability density

• Conditional probability (density)

• Marginal probability (density)• Joint probability (density)• Inference and classification• Gaussian Processes

2

Page 3: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

• Mapping from a random variable to a number

Probability

3

…0 1

Event 1 Event 2 Event 3 Event N…

Page 4: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Probability

4

if

: random variable : set of outputs of random variables

Page 5: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Probability and Probability Density

5

RD

Probability =

Page 6: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Probability and Probability Density

6

Event is defined infinitesimally:

R1

DR2

: set of infinitesimal events

Page 7: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Can you explain the meaning of these functions?

7

Compare with

Page 8: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Supervised Learning (Prediction)

8

• Method: – Learning from

examples and can classify an unseen data

label

classifierNew unseen data (a horse)

[Based on the assumption of regularity]

car car

planeship

horse

horse horsecar

Learn

Page 9: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

=[1, 2, 5, 10, …]T

Representation of Data

9

• Each datum is one point in a data space

Data space

elements

Page 10: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Classification

10

Page 11: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Bayes Optimal Classifier• Our ultimate goal is not a zero error.

11

(Optimal) Bayes error

Figure credit: Masashi Sugiyama

Page 12: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Model on Each Class

• Model: Class-conditional density as a Gaussian

12

Page 13: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Optimal Regression• Minimizing mean square error

13

Minimize

Minimized when

Page 14: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Model for Regression• Obtain regression function

from data

• Choose a model where the following expectation is minimized:

– Minimized for

• Bias-Variance tradeoff

14

Variance Bias^2

Page 15: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Several Rules

15

Page 16: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

For More Than Two Random Variables• For three disjoint sets for a

random variable and another three disjoint sets for a random variable :

16

1

Page 17: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Conditional Probability

17

1

Page 18: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Conditional Probability Density

18

D

z = c

New domain

y = 1

y = 2

Page 19: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Conditional Probability Density

19

Page 20: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Marginal Probability Density

20

sum

Page 21: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Marginal Probability Density and Conditional Probability Density in Machine Learning

21

Conditional PDF Conditional PDF

Joint PDF

2

1

Page 22: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Marginal Probability Density and Conditional Probability Density in Machine Learning

22

Joint PDF

sum

Page 23: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Benefits of Using High Dimensionalities• Feature 1 and Feature 2 have correlation

23

Feature 1

Feature 2

Page 24: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Curse of Dimensionality– To achieve same density as N = 100 for 1-

variable– We need N = 100D for D variables

– Conversely, when we have 60,000 data for 10-dimensional space, the density is the same as 3 data in 1-dimensional space.

24

Page 25: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

GAUSSIAN DENSITY FUNCTION

25

Page 26: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Gaussian Random Variable

26

Principal axes are the eigenvector directions of

p¸1p

¸2

: eigenvalues

Page 27: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Gaussian Random Variable - Projection

27

Projection to any direction is Gaussian.

Page 28: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Gaussian Random Variable – Marginal

28

Page 29: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Gaussian Random Variable – Conditional

29

Page 30: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

PARAMETER ESTIMATION

30

Page 31: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Motivation – Parameter Estimation• Parameter estimation is an optimization

problem

31

: estimated probability density function,in other words, density function that fits data the most

Page 32: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Maximum Likelihood Estimation• Parameter estimation is an optimization

problem

32

Page 33: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Maximum Likelihood for Gaussian

• With optimal parameters satisfying

33

Empirical mean and empirical covariance are the maximum likelihood solutions.

Page 34: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Maximum Likelihood for Gaussian

34

rμ ln p(X jμ) = ~0 μ = ¹; §

Page 35: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Maximum A Posteriori (MAP) Estimation

• MAP estimation

• Likelihood (Model): • Prior:• Bayes rule:

35

cf)

Page 36: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Maximum A Posteriori (MAP) Estimation for Gaussian

• Let the prior

• The posterior can be calculated using

36

Page 37: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Maximum A Posteriori (MAP) Estimation for Gaussian

37

Page 38: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Maximum A Posteriori (MAP) Estimation for Gaussian• Posterior density

– Caution: Posterior of , not the density function of

• MAP of = Mean of =

38

Page 39: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

MLE vs. MAP• For Gaussian

– When N is just a few (say N = 5),

39

Dominant

Page 40: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

MLE vs. MAP• For Gaussian

– When we have a few outliers

40

Dominant (learn from )

Page 41: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

MLE vs. MAP

41

Page 42: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Bayesian Integration• The final standard method of prediction is to

use Bayesian inference instead of estimating the parameter point.– Do not insert directly, but

marginalize.

42

Uncertainty of = N (¹n; ¾2 + ¾2n)

Page 43: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Conjugate Priors• Given a likelihood pdf, , posterior

has the same form as the prior .

43

Prior PosteriorLikelihood

Two distributionsHave the same form

Page 44: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Conjugate Priors

44

Gaussian GaussianGaussian

GammaGammaGaussian

Dirichlet DirichletMultinomial

Beta BetaBinomial

Gamma GammaPoisson

Page 45: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Kullback-Leibler Divergence

45

: Empirical density function: Model density function

Page 46: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Kullback-Leibler Divergence

46

Likelihood:

KL Divergence:

Page 47: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Kullback-Leibler Divergence

47

Model with complex function will capture the noise.

Page 48: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Consistency and Bayes Error• Objective: minimizing expected error

48

Bayes error

0

Error

Smaller Larger

Error for training data (train error)

(For example, a linear classifier with regularization)

: loss function: A realization of N data

Page 49: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Consistency and Bayes Error• Consistent learner with many data

49

Bayes error

0

Error

Smaller Larger

train error

Minimum error of consistent learner

Consistency does not necessarily imply achieving the Bayes error!!!

(For example, a linear classifier with regularization)

Page 50: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

GAUSSIAN PROCESS REGRESSION

50

Page 51: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Prediction Using Correlation

51

Page 52: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Gaussian Process

52

Figure credit: Christopher Bishop

Page 53: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Gaussian Process as an Infinite Dimensional Gaussian

53

Page 54: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Gaussian Process Regression

54

Figure credit: C. E. Rasmussen & C. K. I. Williams

For given

y(x;D) = k>K¡1y

Page 55: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

Summary• What we did:

– Probability and probability density– Conditional density, marginalized density– Model construction– Gaussian model– Parameter estimation– Gaussian process

• What we didn’t do:– Multinomial distribution and Dirichlet distribution– Convergence of estimation– Generative model vs. Discriminative model

55

Page 56: PRML을위한기초확률이론 (Basic Probability Theory for PRML) · PDF filePRML 을위한기초 ... 1. 20 (Wed.) Seoul National University Contents • Probability / Probability

Seoul National University

THANK YOUYung-Kyun [email protected]

56