PATTERNAN RECOGNITIONCOGNI ION AND …feihu.eng.ua.edu/NSF_TUES/w8_1and2.pdfPATTERNAN...

31
5/17/2012 1 A N COGNI ION PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 1: INTRODUCTION Example Handwritten Digit Recognition

Transcript of PATTERNAN RECOGNITIONCOGNI ION AND …feihu.eng.ua.edu/NSF_TUES/w8_1and2.pdfPATTERNAN...

5/17/2012

1

A N COGNI IONPATTERN RECOGNITION AND MACHINE LEARNINGCHAPTER 1: INTRODUCTION

Example

Handwritten Digit Recognition

5/17/2012

2

Polynomial Curve Fitting

Sum‐of‐Squares Error Function

5/17/2012

3

0th Order Polynomial

1st Order Polynomial

5/17/2012

4

3rd Order Polynomial

9th Order Polynomial

5/17/2012

5

Over‐fitting

Root‐Mean‐Square (RMS) Error:

Polynomial Coefficients   

5/17/2012

6

Data Set Size: 

9th Order Polynomial

Data Set Size: 

9th Order Polynomial

5/17/2012

7

Regularization

Penalize large coefficient values

Regularization: 

5/17/2012

8

Regularization: 

Regularization:           vs. 

5/17/2012

9

Polynomial Coefficients   

Probability Theory

Apples and Oranges

5/17/2012

10

Probability Theory

Marginal ProbabilityMarginal Probability

Conditional ProbabilityJoint Probability yy

Probability Theory

Sum RuleSum Rule

Product Rule

5/17/2012

11

The Rules of Probability

Sum Rule

Product Rule

Bayes’ Theorem

posterior likelihood × priorposterior  likelihood × prior

5/17/2012

12

Probability Densities

Transformed Densities

JFMS5

Slide 24

JFMS5 This figure was taken from Solution 1.4 in the web-edition of the solutions manual for PRML, available at http://research.microsoft.com/~cmbishop/PRML. A more thorough explanation of what the figure shows is provided in the text of the solution.Markus Svensén, 11/14/2007

5/17/2012

13

Expectations

Conditional Expectation(discrete)

i iApproximate Expectation(discrete and continuous)

Variances and Covariances

5/17/2012

14

The Gaussian Distribution

Gaussian Mean and Variance

5/17/2012

15

The Multivariate Gaussian

Gaussian Parameter Estimation

Likelihood functionLikelihood function

5/17/2012

16

Maximum (Log) Likelihood

Properties of          and 

5/17/2012

17

Curve Fitting Re‐visited

Maximum Likelihood

Determine b minimi in s m of sq ares errorDetermine            by minimizing sum‐of‐squares error,             .

5/17/2012

18

Predictive Distribution

MAP: A Step towards Bayes

Determine               by minimizing regularized sum‐of‐squares error,             .

5/17/2012

19

Bayesian Curve Fitting

Bayesian Predictive Distribution

5/17/2012

20

Model Selection

Cross‐Validation

Curse of Dimensionality

5/17/2012

21

Curse of Dimensionality

Polynomial curve fitting, M = 3

Gaussian Densities in higher dimensions

Decision Theory

Inference step

Determine either orDetermine either            or           .

Decision step

For given x, determine optimal t.

5/17/2012

22

Minimum Misclassification Rate

Minimum Expected Loss

Example: classify medical images as ‘cancer’ or ‘normal’

Decision

Truth

5/17/2012

23

Minimum Expected Loss

Regions       are chosen to minimize

Reject Option

5/17/2012

24

Why Separate Inference and Decision?

• Minimizing risk (loss matrix may change over time)

• Reject option• Reject option

• Unbalanced class priors

• Combining models

Decision Theory for Regression

Inference step

DetermineDetermine            .

Decision step

For given x, make optimal prediction, y(x), for t.

Loss function:

5/17/2012

25

The Squared Loss Function

Generative vs Discriminative

Generative approach: 

ModelModel

Use Bayes’ theorem

Discriminative approach: 

Model           directly

5/17/2012

26

Entropy

Important quantity in• coding theory• statistical physics

hi l i• machine learning

Entropy

Coding theory: xdiscrete with 8 possible states; how many bits to transmit the state of x?bits to transmit the state of x?

All states equally likely

5/17/2012

27

Entropy

Entropy

In how many ways can N identical objects be allocated Mbins?bins?

Entropy maximized when

5/17/2012

28

Entropy

Differential Entropy

Put bins of width ∆ along the real line

Differential entropy maximized (for fixed     ) when

in which case

5/17/2012

29

Conditional Entropy

The Kullback‐Leibler Divergence

5/17/2012

30

Mutual Information