1 Machine Learning Overview Adapted from Sargur N. Srihari University at Buffalo, State University...
Machine Learning Overview
Adapted from Sargur N. Srihari, University at Buffalo, State University of New York, USA
Outline
1. What is Machine Learning (ML)?
2. Types of Information Processing Problems Solved
   1. Regression
   2. Classification
   3. Clustering
   4. Modeling Uncertainty/Inference
3. New Developments
   1. Fully Bayesian Approach
4. Summary
What is Machine Learning?
• Programming computers to:
  – perform tasks that humans perform well but that are difficult to specify algorithmically
• A principled way of building high-performance information processing systems:
  – search engines, information retrieval
  – adaptive user interfaces, personalized assistants (information systems)
  – scientific applications (computational science)
  – engineering
Example Problem: Handwritten Digit Recognition
• Handcrafted rules will result in a large number of rules and exceptions
• Better to have a machine that learns from a large training set
• Wide variability of the same numeral
ML History
• ML has its origins in Computer Science
• Pattern Recognition (PR) has its origins in Engineering
• They are different facets of the same field
• The methods have been around for over 50 years
• Revival of Bayesian methods
  – due to the availability of computational methods
Some Successful Applications of Machine Learning
• Learning to recognize spoken words
  – Speaker-specific strategies for recognizing primitive sounds (phonemes) and words from the speech signal
  – Neural networks and methods for learning HMMs, for customizing to individual speakers, vocabularies and microphone characteristics (Table 1.1)
Some Successful Applications of Machine Learning
• Learning to drive an autonomous vehicle
  – Train computer-controlled vehicles to steer correctly
  – Drive at 70 mph for 90 miles on public highways
  – Associate steering commands with image sequences
Handwriting Recognition
• Task T
  – recognizing and classifying handwritten words within images
• Performance measure P
  – percent of words correctly classified
• Training experience E
  – a database of handwritten words with given classifications
The ML Approach
• Data Collection: samples
• Model Selection: a probability distribution to model the process
• Parameter Estimation: values/distributions
• Search: find the optimal solution to the problem
• Decision: inference or testing
The first four steps constitute training; the decision step exercises generalization.
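The steps above can be sketched end to end in a few lines; the synthetic data source and the choice of a univariate Gaussian model below are hypothetical, purely for illustration:

```python
import math
import random

random.seed(0)

# 1. Data collection: samples (here, synthetic draws from an unknown process)
samples = [random.gauss(5.0, 2.0) for _ in range(1000)]

# 2. Model selection: assume the process is a univariate Gaussian
# 3. Parameter estimation: maximum-likelihood estimates of mean and variance
mu = sum(samples) / len(samples)
var = sum((x - mu) ** 2 for x in samples) / len(samples)

# 4./5. Search and decision: evaluate the fitted density at a new point (inference)
def density(x):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

print(round(mu, 2), round(var, 2))  # estimates should land near 5.0 and 4.0
```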
Types of Problems Machine Learning is Used For
1. Classification
2. Regression
3. Clustering (Data Mining)
4. Modeling/Inference
Classification
• Three classes (Stratified, Annular, Homogeneous)
• Two variables shown
• 100 points
• Which class should x belong to?
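One minimal way to answer a question like this is nearest-neighbor classification; the labeled two-variable points below are hypothetical stand-ins for the slide's data:

```python
import math

# Hypothetical labeled training points: (feature1, feature2) -> class label
training = [
    ((1.0, 1.0), "stratified"),
    ((1.2, 0.9), "stratified"),
    ((4.0, 4.2), "annular"),
    ((3.8, 4.0), "annular"),
    ((7.0, 1.0), "homogeneous"),
    ((6.8, 1.2), "homogeneous"),
]

def classify(x):
    """Assign x the class of its nearest training point (1-NN)."""
    def dist(entry):
        return math.dist(entry[0], x)
    return min(training, key=dist)[1]

print(classify((1.1, 1.0)))  # -> stratified
```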
Cell-based Classification
• A naïve approach of cell-based voting will fail
  – exponential growth of the number of cells with dimensionality
  – 12 dimensions discretized into 6 gives 3 million cells
• Hardly any points in each cell
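The exponential growth of cell counts with dimensionality is easy to verify directly (the bin count of 6 per dimension follows the slide; the dimensionalities chosen are illustrative):

```python
# Number of cells when each of d dimensions is discretized into b bins:
# the count grows as b**d, i.e. exponentially in d.
def n_cells(b, d):
    return b ** d

for d in (1, 2, 3, 6):
    print(d, n_cells(6, d))
```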
Popular Statistical Models
• Generative
  – Naïve Bayes
  – Mixtures of multinomials
  – Mixtures of Gaussians
  – Hidden Markov Models (HMM)
  – Bayesian networks
  – Markov random fields
• Discriminative
  – Logistic regression
  – SVMs
  – Traditional neural networks
  – Nearest neighbor
  – Conditional Random Fields (CRF)
Regression Problems
• Forward problem data set: the red curve is the result of fitting a two-layer neural network by minimizing sum-of-squared error
• The corresponding inverse problem is obtained by reversing x and t: a very poor fit to the data; GMMs are used here
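The sum-of-squared-error criterion can be illustrated with the simplest model that admits a closed-form fit, a straight line (the slide itself fits a two-layer neural network; the data below are hypothetical):

```python
# Minimizing sum-of-squared error, shown with a straight-line fit.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ts = [1.1, 2.9, 5.2, 7.1, 8.8]  # hypothetical targets, roughly t = 2x + 1

n = len(xs)
x_mean = sum(xs) / n
t_mean = sum(ts) / n

# Closed-form least-squares solution for slope and intercept
slope = sum((x - x_mean) * (t - t_mean) for x, t in zip(xs, ts)) / \
        sum((x - x_mean) ** 2 for x in xs)
intercept = t_mean - slope * x_mean

# Sum-of-squared error of the fitted line
sse = sum((t - (slope * x + intercept)) ** 2 for x, t in zip(xs, ts))
print(round(slope, 2), round(intercept, 2))  # -> 1.96 1.1
```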
Clustering
• A Gaussian has limitations in modeling real data sets
• Gaussian Mixture Models give very complex densities
  – πk are mixing coefficients that sum to one
• One dimension
  – Three Gaussians in blue
  – Sum in red
• Old Faithful (hydrothermal geyser in Yellowstone)
  – 272 observations
  – Duration (mins, horizontal axis) vs time to next eruption (vertical axis)
  – A single Gaussian is unable to capture the structure
  – A linear superposition of two Gaussians is better
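A one-dimensional Gaussian mixture density like the one on this slide can be written in a few lines; the component weights, means and standard deviations below are hypothetical:

```python
import math

# A 1-D mixture of three Gaussians; the mixing coefficients pi_k sum to one.
components = [  # (pi_k, mean, std) -- hypothetical values
    (0.5, 0.0, 1.0),
    (0.3, 4.0, 0.5),
    (0.2, 8.0, 1.5),
]

def gaussian(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def mixture_density(x):
    """Linear superposition of the component Gaussians."""
    return sum(pi * gaussian(x, mu, sigma) for pi, mu, sigma in components)
```

Because the πk sum to one and each component integrates to one, the mixture itself is a valid density.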
Bayesian Representation of Uncertainty
• The use of probability to represent uncertainty is not an ad-hoc choice
  – If numerical values represent degrees of belief, then simple axioms for manipulating degrees of belief lead to the sum and product rules of probability
• Probability is not just the frequency of a random, repeatable event
  – It is a quantification of uncertainty
• Example: whether the Arctic ice cap will disappear by the end of the century
  – We have some idea of how quickly polar ice is melting
  – We revise it on the basis of fresh evidence (satellite observations)
  – The assessment will affect the actions we take (to reduce greenhouse gases)
• Handled by the general Bayesian interpretation
Modeling Uncertainty
• A causal Bayesian network
• Example of inference: Cancer is independent of Age and Gender given exposure to Toxics and Smoking
The Fully Bayesian Approach
• Bayes Rule
• Bayesian Probabilities
• Concept of Conjugacy
• Monte Carlo Sampling
Rules of Probability
• Given random variables X and Y
• Sum Rule gives the marginal probability:
  p(X) = Σ_Y p(X, Y)
• Product Rule gives the joint probability in terms of conditional and marginal:
  p(X, Y) = p(Y | X) p(X)
• Combining them, we get Bayes Rule:
  p(Y | X) = p(X | Y) p(Y) / p(X), where p(X) = Σ_Y p(X | Y) p(Y)
• Viewed as: posterior ∝ likelihood × prior
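Bayes rule can be exercised with a small numeric example; the prior and likelihood values below are hypothetical (a disease indicator Y and a test result X):

```python
# Discrete Bayes-rule example with hypothetical numbers.
p_y = {True: 0.01, False: 0.99}           # prior p(Y): disease prevalence
p_x_given_y = {True: 0.95, False: 0.05}   # likelihood p(X=positive | Y)

# Sum rule / marginal: p(X=positive) = sum_Y p(X=positive | Y) p(Y)
p_x = sum(p_x_given_y[y] * p_y[y] for y in (True, False))

# Bayes rule: p(Y=True | X=positive) = p(X | Y) p(Y) / p(X)
posterior = p_x_given_y[True] * p_y[True] / p_x
print(round(posterior, 3))  # -> 0.161
```

Despite the accurate test, the posterior stays modest because the prior is small: posterior ∝ likelihood × prior.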
Probability Distributions: Relationships
• Discrete, binary
  – Bernoulli: single binary variable
  – Binomial: N samples of a Bernoulli (the Bernoulli is the N=1 case; for large N it approaches a Gaussian)
  – Beta: continuous variable between [0,1]; conjugate prior of the Bernoulli/Binomial
• Discrete, multi-valued
  – Multinomial: one of K values = K-dimensional binary vector (the Binomial is the K=2 case)
  – Dirichlet: K random variables between [0,1]; conjugate prior of the Multinomial
• Continuous
  – Gaussian
  – Gamma: conjugate prior of the univariate Gaussian precision; the Exponential is a special case of the Gamma
  – Wishart: conjugate prior of the multivariate Gaussian precision matrix
  – Student's-t: generalization of the Gaussian robust to outliers; an infinite mixture of Gaussians
  – Gaussian-Gamma: conjugate prior of a univariate Gaussian with unknown mean and precision
  – Gaussian-Wishart: conjugate prior of a multivariate Gaussian with unknown mean and precision matrix
  – Uniform
• Angular
  – Von Mises
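Conjugacy in the Beta/Bernoulli pair can be shown in a few lines: the posterior stays in the Beta family, and the update is just a count of successes and failures. The prior parameters and observations below are hypothetical:

```python
# Conjugate Beta-Bernoulli update: Beta(a, b) prior + Bernoulli data
# -> Beta(a + heads, b + tails) posterior.
def beta_update(a, b, observations):
    heads = sum(observations)
    tails = len(observations) - heads
    return a + heads, b + tails

a, b = 2, 2                        # hypothetical prior: Beta(2, 2)
data = [1, 1, 0, 1, 1, 0, 1, 1]    # hypothetical data: 6 successes, 2 failures
a_post, b_post = beta_update(a, b, data)
posterior_mean = a_post / (a_post + b_post)
print(a_post, b_post, round(posterior_mean, 3))  # -> 8 4 0.667
```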
Fully Bayesian Approach
• Feasible with increased computational power
• An intractable posterior distribution is handled using either
  – variational Bayes, or
  – stochastic sampling (e.g., Markov Chain Monte Carlo, Gibbs sampling)
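A minimal Metropolis sampler illustrates the stochastic-sampling option; the standard Gaussian target below is a toy stand-in for a genuinely intractable posterior, and the step size is an arbitrary choice:

```python
import math
import random

random.seed(1)

def log_target(x):
    """Unnormalized log-density of the target (standard Gaussian toy posterior)."""
    return -0.5 * x * x

def metropolis(n_samples, step=1.0):
    x = 0.0
    samples = []
    for _ in range(n_samples):
        proposal = x + random.uniform(-step, step)
        # Accept with probability min(1, target(proposal) / target(x))
        if math.log(random.random()) < log_target(proposal) - log_target(x):
            x = proposal
        samples.append(x)
    return samples

samples = metropolis(20000)
mean = sum(samples) / len(samples)
print(round(mean, 2))  # should be near 0, the target's mean
```

Note that only density ratios are needed, so the (often intractable) normalizing constant of the posterior never has to be computed.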
Summary
• ML offers a systematic approach, instead of ad-hoc design, for building information processing systems
• Useful for solving problems of classification, regression, clustering and modeling
• The fully Bayesian approach together with Monte Carlo sampling is leading to high-performing systems
• Great practical value in many applications
  – Computer science
    • search engines
    • text processing
    • document recognition
  – Engineering