PRML Chap.1
-
Upload
liang-zeng-han -
Category
Documents
-
view
224 -
download
1
description
Transcript of PRML Chap.1
-
.......
PRML Chapter 1 : Introduction
Liang Zeng Han
2013/3/19(Tue)
2013/3/19(Tue)
-
Introduction
About PRML
Pattern Recognition and Machine Learning, Christopher. M. Bishop,2006, Springer-Verlag New York
Support Page :http://research.microsoft.com/en-us/um/people/cmbishop/PRML/
Wikihttp://ibisforest.org/index.php?PRML
: https://github.com/herumi/prml
2013/3/19(Tue)
-
Introduction
What is expected to Machine Learning
(ex. face detection) (ex. collaborative ltering)
2013/3/19(Tue)
-
Example
Handwritten Digit Recognition
-
Introduction
Preprocessing
(feature extraction) : (,PCA) :
2013/3/19(Tue)
-
Introduction
Classication of Learning
...1 Supervised learning
classication : regression : ()
...2 Unsupervised learning
clustering density estimation visualiation
...3 Reinforcement learning
2013/3/19(Tue)
-
Polynomial Curve Fitting
Polynomial Curve Fitting
2013/3/19(Tue)
-
Polynomial Curve Fitting
-
Sum-of-Squares Error Function
-
0th Order Polynomial
-
1st Order Polynomial
-
3rd Order Polynomial
-
9th Order Polynomial
-
Over-fitting
Root-Mean-Square (RMS) Error:
-
Polynomial Curve Fitting
sin(2x) Taylor :
sin(2x) =1Xn=1
(1)n+1 (2)2n1
(2n 1)!x2n1
= 2x (2)3
3!x3 +
(2)5
5!x5 (2)
7
7!x7
' 6:28x 41:34x3 + 81:61x5 76:71x7
M !1
w2n1 ! (1)n+1 (2)2n1
(2n 1)! ; w2n ! 0
?
2013/3/19(Tue)
-
Polynomial Coefficients
-
Data Set Size:
9th Order Polynomial
-
Data Set Size:
9th Order Polynomial
-
Regularization
Penalize large coefficient values
-
Regularization:
-
Regularization:
-
Regularization: vs.
-
Polynomial Coefficients
-
Probability Theory
Probability Theory
2013/3/19(Tue)
-
Probability Theory
Apples and Oranges
-
Probability Theory
Marginal Probability
Conditional ProbabilityJoint Probability
-
Probability Theory
Sum Rule
Product Rule
-
The Rules of Probability
Sum Rule
Product Rule
Sum Rule
Product Rule
-
Bayes Theorem
posterior v likelihood prior
-
Probability Densities
-
Transformed Densities
-
Expectations
Conditional Expectation(discrete)
Approximate Expectation(discrete and continuous)
-
Variances and Covariances
-
Probability Theory
Rethink : Bayes Theorem
prior probability : p(w)D likelihood function : p(Djw)w D posterior probability : p(wj D)D
p(wj D) = p(Djw)p(w)p(D)
p(D) =Z
p(Djw)p(w)dw
2013/3/19(Tue)
-
The Gaussian Distribution
-
Gaussian Mean and Variance
-
The Multivariate Gaussian
-
Gaussian Parameter Estimation
Likelihood function
-
Maximum (Log) Likelihood
-
Properties of and
-
Curve Fitting Re-visited
-
Maximum Likelihood
Determine by minimizing sum-of-squares error, .
-
Predictive Distribution
-
MAP: A Step towards Bayes
Determine by minimizing regularized sum-of-squares error, .
-
Bayesian Curve Fitting
-
Bayesian Predictive Distribution
-
Model Selection
Model Selection
2013/3/19(Tue)
-
Model Selection
Cross-Validation
-
Curse of Dimensionality
Curse of Dimensionality
2013/3/19(Tue)
-
Curse of Dimensionality
-
Curse of Dimensionality
Polynomial curve fitting, M = 3
Gaussian Densities in higher dimensions
-
Decision Theory
Decision Theory
2013/3/19(Tue)
-
Decision Theory
Inference step
Determine either or .
Decision stepFor given x, determine optimal t.
-
Minimum Misclassification Rate
-
Minimum Expected Loss
Example: classify medical images as cancer or normal
DecisionT
r
u
t
h
-
Minimum Expected Loss
Regions are chosen to minimize
-
Reject Option
-
Why Separate Inference and Decision?
Minimizing risk (loss matrix may change over time) Reject option Unbalanced class priors Combining models
-
Decision Theory for Regression
Inference step
Determine .
Decision stepFor given x, make optimal prediction, y(x), for t.
Loss function:
-
The Squared Loss Function
-
Generative vs Discriminative
Generative approach:
Model
Use Bayes theorem
Discriminative approach:
Model directly
-
Information Theory
Information Theory
2013/3/19(Tue)
-
Entropy
Important quantity in coding theory statistical physicsmachine learning
-
Entropy
Coding theory: x discrete with 8 possible states; how many bits to transmit the state of x?
All states equally likely
-
Entropy
-
Entropy
In how many ways can N identical objects be allocated Mbins?
Entropy maximized when
-
Entropy
-
Differential Entropy
Put bins of width along the real line
Differential entropy maximized (for fixed ) when
in which case
-
Conditional Entropy
-
The Kullback-Leibler Divergence
-
Mutual Information
prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory
prml-slides-1_2prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory
prml-slides-1_3-9prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory
prml-slides-1_10-17prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory
prml-slides-1_18-26prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory
prml-slides-1_27-38prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory
prml-slides-1_39prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory
prml-slides-1_40-41prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory
prml-slides-1_42-50prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory
prml-slides-1_51-59