1 Machine Learning Overview Adapted from Sargur N. Srihari University at Buffalo, State University...
Machine Learning Overview
Adapted from Sargur N. Srihari, University at Buffalo, State University of New York, USA
Outline
1. What is Machine Learning (ML)?
2. Types of Information Processing Problems Solved
   1. Regression
   2. Classification
   3. Clustering
   4. Modeling Uncertainty/Inference
3. New Developments
   1. Fully Bayesian Approach
4. Summary
What is Machine Learning?
• Programming computers to:
  – perform tasks that humans perform well but that are difficult to specify algorithmically
• A principled way of building high-performance information processing systems:
  – search engines, information retrieval
  – adaptive user interfaces, personalized assistants (information systems)
  – scientific applications (computational science)
  – engineering
Example Problem: Handwritten Digit Recognition
• Handcrafted rules will result in a large number of rules and exceptions
• Better to have a machine that learns from a large training set
• Wide variability of the same numeral
ML History
• ML has its origins in Computer Science
• Pattern Recognition (PR) has its origins in Engineering
• They are different facets of the same field
• The methods have been around for over 50 years
• Revival of Bayesian methods
  – due to the availability of computational methods
Some Successful Applications of Machine Learning
• Learning to recognize spoken words
  – Speaker-specific strategies for recognizing primitive sounds (phonemes) and words from the speech signal
  – Neural networks and methods for learning HMMs, for customizing to individual speakers, vocabularies and microphone characteristics (Table 1.1)
Some Successful Applications of Machine Learning
• Learning to drive an autonomous vehicle
  – Train computer-controlled vehicles to steer correctly
  – Drive at 70 mph for 90 miles on public highways
  – Associate steering commands with image sequences
Handwriting Recognition
• Task T
  – recognizing and classifying handwritten words within images
• Performance measure P
  – percent of words correctly classified
• Training experience E
  – a database of handwritten words with given classifications
The ML Approach
• Data Collection: samples
• Model Selection: a probability distribution to model the process
• Parameter Estimation: values/distributions
• Search: find the optimal solution to the problem
• Decision: inference or testing
The first four steps constitute training; the decision step exercises generalization.
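The steps above can be sketched end to end in a few lines; the synthetic data source and the choice of a univariate Gaussian model below are hypothetical, purely for illustration:

```python
import math
import random

random.seed(0)

# 1. Data collection: samples (here, synthetic draws from an unknown process)
samples = [random.gauss(5.0, 2.0) for _ in range(1000)]

# 2. Model selection: assume the process is a univariate Gaussian
# 3. Parameter estimation: maximum-likelihood estimates of mean and variance
mu = sum(samples) / len(samples)
var = sum((x - mu) ** 2 for x in samples) / len(samples)

# 4./5. Search and decision: evaluate the fitted density at a new point (inference)
def density(x):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

print(round(mu, 2), round(var, 2))  # estimates should land near 5.0 and 4.0
```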
Types of Problems Machine Learning is Used For
1. Classification
2. Regression
3. Clustering (Data Mining)
4. Modeling/Inference
Classification
• Three classes (Stratified, Annular, Homogeneous)
• Two variables shown
• 100 points
• Which class should x belong to?
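One minimal way to answer a question like this is nearest-neighbor classification; the labeled two-variable points below are hypothetical stand-ins for the slide's data:

```python
import math

# Hypothetical labeled training points: (feature1, feature2) -> class label
training = [
    ((1.0, 1.0), "stratified"),
    ((1.2, 0.9), "stratified"),
    ((4.0, 4.2), "annular"),
    ((3.8, 4.0), "annular"),
    ((7.0, 1.0), "homogeneous"),
    ((6.8, 1.2), "homogeneous"),
]

def classify(x):
    """Assign x the class of its nearest training point (1-NN)."""
    def dist(entry):
        return math.dist(entry[0], x)
    return min(training, key=dist)[1]

print(classify((1.1, 1.0)))  # -> stratified
```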
Cell-based Classification
• A naïve approach of cell-based voting will fail
  – exponential growth of the number of cells with dimensionality
  – 12 dimensions discretized into 6 gives 3 million cells
• Hardly any points in each cell
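The exponential growth of cell counts with dimensionality is easy to verify directly (the bin count of 6 per dimension follows the slide; the dimensionalities chosen are illustrative):

```python
# Number of cells when each of d dimensions is discretized into b bins:
# the count grows as b**d, i.e. exponentially in d.
def n_cells(b, d):
    return b ** d

for d in (1, 2, 3, 6):
    print(d, n_cells(6, d))
```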
Popular Statistical Models
• Generative
  – Naïve Bayes
  – Mixtures of multinomials
  – Mixtures of Gaussians
  – Hidden Markov Models (HMM)
  – Bayesian networks
  – Markov random fields
• Discriminative
  – Logistic regression
  – SVMs
  – Traditional neural networks
  – Nearest neighbor
  – Conditional Random Fields (CRF)
Regression Problems
• Forward problem data set: the red curve is the result of fitting a two-layer neural network by minimizing sum-of-squared error
• The corresponding inverse problem is obtained by reversing x and t: a very poor fit to the data; GMMs are used here
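The sum-of-squared-error criterion can be illustrated with the simplest model that admits a closed-form fit, a straight line (the slide itself fits a two-layer neural network; the data below are hypothetical):

```python
# Minimizing sum-of-squared error, shown with a straight-line fit.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ts = [1.1, 2.9, 5.2, 7.1, 8.8]  # hypothetical targets, roughly t = 2x + 1

n = len(xs)
x_mean = sum(xs) / n
t_mean = sum(ts) / n

# Closed-form least-squares solution for slope and intercept
slope = sum((x - x_mean) * (t - t_mean) for x, t in zip(xs, ts)) / \
        sum((x - x_mean) ** 2 for x in xs)
intercept = t_mean - slope * x_mean

# Sum-of-squared error of the fitted line
sse = sum((t - (slope * x + intercept)) ** 2 for x, t in zip(xs, ts))
print(round(slope, 2), round(intercept, 2))  # -> 1.96 1.1
```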
Clustering
• A Gaussian has limitations in modeling real data sets
• Gaussian Mixture Models give very complex densities
  – πk are mixing coefficients that sum to one
• One dimension
  – Three Gaussians in blue
  – Sum in red
• Old Faithful (hydrothermal geyser in Yellowstone)
  – 272 observations
  – Duration (mins, horizontal axis) vs time to next eruption (vertical axis)
  – A single Gaussian is unable to capture the structure
  – A linear superposition of two Gaussians is better
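A one-dimensional Gaussian mixture density like the one on this slide can be written in a few lines; the component weights, means and standard deviations below are hypothetical:

```python
import math

# A 1-D mixture of three Gaussians; the mixing coefficients pi_k sum to one.
components = [  # (pi_k, mean, std) -- hypothetical values
    (0.5, 0.0, 1.0),
    (0.3, 4.0, 0.5),
    (0.2, 8.0, 1.5),
]

def gaussian(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def mixture_density(x):
    """Linear superposition of the component Gaussians."""
    return sum(pi * gaussian(x, mu, sigma) for pi, mu, sigma in components)
```

Because the πk sum to one and each component integrates to one, the mixture itself is a valid density.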
Bayesian Representation of Uncertainty
• The use of probability to represent uncertainty is not an ad-hoc choice
  – If numerical values represent degrees of belief, then simple axioms for manipulating degrees of belief lead to the sum and product rules of probability
• Probability is not just the frequency of a random, repeatable event
  – It is a quantification of uncertainty
• Example: whether the Arctic ice cap will disappear by the end of the century
  – We have some idea of how quickly polar ice is melting
  – We revise it on the basis of fresh evidence (satellite observations)
  – The assessment will affect the actions we take (to reduce greenhouse gases)
• Handled by the general Bayesian interpretation
Modeling Uncertainty
• A causal Bayesian network
• Example of inference: Cancer is independent of Age and Gender given exposure to Toxics and Smoking
The Fully Bayesian Approach
• Bayes Rule
• Bayesian Probabilities
• Concept of Conjugacy
• Monte Carlo Sampling
Rules of Probability
• Given random variables X and Y
• Sum Rule gives the marginal probability:
  p(X) = Σ_Y p(X, Y)
• Product Rule gives the joint probability in terms of conditional and marginal:
  p(X, Y) = p(Y | X) p(X)
• Combining them, we get Bayes Rule:
  p(Y | X) = p(X | Y) p(Y) / p(X), where p(X) = Σ_Y p(X | Y) p(Y)
• Viewed as: posterior ∝ likelihood × prior
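Bayes rule can be exercised with a small numeric example; the prior and likelihood values below are hypothetical (a disease indicator Y and a test result X):

```python
# Discrete Bayes-rule example with hypothetical numbers.
p_y = {True: 0.01, False: 0.99}           # prior p(Y): disease prevalence
p_x_given_y = {True: 0.95, False: 0.05}   # likelihood p(X=positive | Y)

# Sum rule / marginal: p(X=positive) = sum_Y p(X=positive | Y) p(Y)
p_x = sum(p_x_given_y[y] * p_y[y] for y in (True, False))

# Bayes rule: p(Y=True | X=positive) = p(X | Y) p(Y) / p(X)
posterior = p_x_given_y[True] * p_y[True] / p_x
print(round(posterior, 3))  # -> 0.161
```

Despite the accurate test, the posterior stays modest because the prior is small: posterior ∝ likelihood × prior.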
Probability Distributions: Relationships
• Discrete, binary
  – Bernoulli: single binary variable
  – Binomial: N samples of a Bernoulli (the Bernoulli is the N=1 case; for large N it approaches a Gaussian)
  – Beta: continuous variable between [0,1]; conjugate prior of the Bernoulli/Binomial
• Discrete, multi-valued
  – Multinomial: one of K values = K-dimensional binary vector (the Binomial is the K=2 case)
  – Dirichlet: K random variables between [0,1]; conjugate prior of the Multinomial
• Continuous
  – Gaussian
  – Gamma: conjugate prior of the univariate Gaussian precision; the Exponential is a special case of the Gamma
  – Wishart: conjugate prior of the multivariate Gaussian precision matrix
  – Student's-t: generalization of the Gaussian robust to outliers; an infinite mixture of Gaussians
  – Gaussian-Gamma: conjugate prior of a univariate Gaussian with unknown mean and precision
  – Gaussian-Wishart: conjugate prior of a multivariate Gaussian with unknown mean and precision matrix
  – Uniform
• Angular
  – Von Mises
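Conjugacy in the Beta/Bernoulli pair can be shown in a few lines: the posterior stays in the Beta family, and the update is just a count of successes and failures. The prior parameters and observations below are hypothetical:

```python
# Conjugate Beta-Bernoulli update: Beta(a, b) prior + Bernoulli data
# -> Beta(a + heads, b + tails) posterior.
def beta_update(a, b, observations):
    heads = sum(observations)
    tails = len(observations) - heads
    return a + heads, b + tails

a, b = 2, 2                        # hypothetical prior: Beta(2, 2)
data = [1, 1, 0, 1, 1, 0, 1, 1]    # hypothetical data: 6 successes, 2 failures
a_post, b_post = beta_update(a, b, data)
posterior_mean = a_post / (a_post + b_post)
print(a_post, b_post, round(posterior_mean, 3))  # -> 8 4 0.667
```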
Fully Bayesian Approach
• Feasible with increased computational power
• An intractable posterior distribution is handled using either
  – variational Bayes, or
  – stochastic sampling (e.g., Markov Chain Monte Carlo, Gibbs sampling)
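A minimal Metropolis sampler illustrates the stochastic-sampling option; the standard Gaussian target below is a toy stand-in for a genuinely intractable posterior, and the step size is an arbitrary choice:

```python
import math
import random

random.seed(1)

def log_target(x):
    """Unnormalized log-density of the target (standard Gaussian toy posterior)."""
    return -0.5 * x * x

def metropolis(n_samples, step=1.0):
    x = 0.0
    samples = []
    for _ in range(n_samples):
        proposal = x + random.uniform(-step, step)
        # Accept with probability min(1, target(proposal) / target(x))
        if math.log(random.random()) < log_target(proposal) - log_target(x):
            x = proposal
        samples.append(x)
    return samples

samples = metropolis(20000)
mean = sum(samples) / len(samples)
print(round(mean, 2))  # should be near 0, the target's mean
```

Note that only density ratios are needed, so the (often intractable) normalizing constant of the posterior never has to be computed.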
Summary
• ML offers a systematic approach, instead of ad-hoc design, for building information processing systems
• Useful for solving problems of classification, regression, clustering and modeling
• The fully Bayesian approach together with Monte Carlo sampling is leading to high-performing systems
• Great practical value in many applications
  – Computer science
    • search engines
    • text processing
    • document recognition
  – Engineering