
Page 1: Machine Learning Overviesrihari/CSE574/Chap1/ML-Overview.pdf · 2015-10-17 · Machine Learning as a Discipline • Focused on two inter-related fundamental scientific/engineering

Machine Learning Overview

Sargur N. Srihari University at Buffalo, State University of New York

USA

Page 2

Outline

1.  What is Machine Learning (ML)?
    1.  As a scientific discipline
    2.  As an area of Computer Science/AI

2.  Core learning methods
    1.  Supervised (Regression/Classification/Deep)
    2.  Unsupervised (PCA, Clustering, Topic Models)
    3.  Reinforcement

3.  Main drivers
    1.  Mobile systems (big data)
    2.  Personalization

Page 3

Machine Learning as a Discipline

•  Focused on two inter-related fundamental scientific/engineering questions:
    1.  How can one construct computer systems that automatically improve through experience?
    2.  What are the statistical-computational-information-theoretic laws that govern all learning systems, including computers, humans and organizations?

•  Machine learning is also important for highly practical computer software fielded across many applications

Page 4

Machine Learning as Software Area

•  Programming computers to:
    –  Perform tasks that humans perform well but that are difficult to specify algorithmically
•  Principled way of building high-performance information processing systems
    –  Probabilistic responses to queries (information retrieval, IR)
    –  Adaptive user interfaces, personalized assistants (information systems)
    –  Scientific/engineering applications

Page 5

ML within AI

•  ML has emerged as the method of choice for practical software for:
    –  Computer vision
    –  Speech recognition
    –  Natural language processing
    –  Robot control
    –  Other applications
•  It is far easier to train a system by showing it examples of desired input-output behavior
    –  Than to manually anticipate the desired response for every possible input

Page 6

Example Problem: Handwritten Digit Recognition

•  Handcrafted rules will result in a large number of rules and exceptions

•  Better to have a machine that learns from a large training set

•  Handwriting recognition cannot be done without machine learning!

(Figure: wide variability of the same numeral)

Page 7

Most Successful Application of ML

•  Learning to recognize spoken words
    –  Speaker-specific strategies for recognizing primitive sounds (phonemes) and words from the speech signal
    –  Neural networks and methods for learning HMMs for customizing to individual speakers, vocabularies and microphone characteristics
    –  Recently, Google increased accuracy for Android by 25% (Table 1.1)

Page 8

ML Example: Self-Driving Vehicle

•  Learning to drive an autonomous vehicle
    –  Train computer-controlled vehicles to steer correctly
    –  Associate steering commands with image sequences

(Figures: Google prototype; deployment: taxi courier service; ALVINN: drove at 70 mph for 90 miles on public highways; Tesla Autopilot)

Page 9

Drivers of ML Progress

•  Mobile systems gather/transport vast amounts of data: “big data”
    –  Turn to ML solutions to obtain insights, predictions, decisions
    –  Fine-grained, personalized data
•  Personalization: relevance of posts shown
    –  Advertising copywriting
•  Historical medical records: determine treatment
•  Historical traffic data: control congestion

Page 10

Learning Problem Definition

•  Improving some measure of performance P when executing some task T through some type of training experience E
•  Example: learning to detect credit card fraud
    •  Task T
        –  Assign a label of fraud or not fraud to a credit card transaction
    •  Performance measure P
        –  Accuracy of the fraud classifier, with a higher penalty when fraud is labeled as not fraud
    •  Training experience E
        –  Historical credit card transactions labeled as fraud or not
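As a sketch of how the performance measure P could penalize missed fraud more than false alarms, the snippet below compares plain accuracy with an average misclassification cost. The cost values (10 for fraud labeled as not fraud, 1 for a false alarm), the function names and the labels are all hypothetical; the slide only says that the penalty is higher.

```python
# Hypothetical costs: the slide only says missed fraud gets a "higher penalty".
COST_FALSE_NEGATIVE = 10.0  # fraud labeled as not fraud
COST_FALSE_POSITIVE = 1.0   # legitimate transaction flagged as fraud


def accuracy(y_true, y_pred):
    """Plain accuracy: fraction of transactions labeled correctly (1 = fraud, 0 = not fraud)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def average_cost(y_true, y_pred):
    """Performance measure P as an average cost per transaction; lower is better."""
    cost = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 0:
            cost += COST_FALSE_NEGATIVE
        elif t == 0 and p == 1:
            cost += COST_FALSE_POSITIVE
    return cost / len(y_true)


# Training experience E would supply labeled transactions; these labels are made up.
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
never_flags = [0] * 10                     # misses both frauds
cautious = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # catches both frauds, raises two false alarms

print(accuracy(y_true, never_flags), average_cost(y_true, never_flags))
print(accuracy(y_true, cautious), average_cost(y_true, cautious))
```

Both classifiers have the same accuracy, but the cost-weighted measure strongly prefers the one that does not miss fraud.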

Page 11

The ML Approach

Generalization (Training)
1.  Data Collection: samples
2.  Model Selection: probability distribution to model the process
3.  Parameter Estimation: values/distributions

Decision (Inference or Testing)
4.  Inference: find responses to queries
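A minimal end-to-end illustration of the four stages, assuming the simplest possible model choice (a univariate Gaussian) and made-up samples. It uses Python's `statistics.NormalDist` and is only a sketch, not the procedure of any application in these slides.

```python
from statistics import NormalDist, fmean

# 1. Data collection: hypothetical measurements
samples = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0]

# 2. Model selection: assume the process is a univariate Gaussian
# 3. Parameter estimation: maximum-likelihood mean and variance (variance divides by N)
mu = fmean(samples)
var = fmean([(x - mu) ** 2 for x in samples])
model = NormalDist(mu, var ** 0.5)

# 4. Inference: answer a query about new data, e.g. p(x > 5.1)
p_above = 1.0 - model.cdf(5.1)
print(f"mu={mu:.3f}, var={var:.4f}, p(x > 5.1)={p_above:.3f}")
```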

Page 12

ML History within AI

•  ML/PR (pattern recognition) methods have been around for over 50 years

Page 13

Core Methods

1.  Supervised Learning
    –  Training data consists of (x, y) pairs
    –  Goal is to predict y* for a new input x*

2.  Unsupervised Learning
    –  Analysis of unlabeled data

3.  Reinforcement Learning
    –  Training data in between supervised and unsupervised
        •  Indication of whether an action is correct or not
        •  Reward signal may refer to an entire input sequence

Page 14

Supervised Learning

•  The most widely used methods of ML, e.g.,
    •  Spam classification of email
    •  Face recognizers over images
    •  Medical diagnosis systems
•  Inputs x are vectors or more complex objects
    –  Documents, DNA sequences or graphs
•  Outputs are:
    –  Binary, multiclass (K classes)
    –  Multi-label (more than one class), ranking
    –  Structured: y is a graph satisfying constraints, e.g., POS tagging
    –  Real-valued, or a mixture of discrete and real-valued

Page 15

Supervised Classification Example

•  Off-shore oil transfer pipelines
    •  Non-invasive measurement of the proportions of oil, water and gas
    •  Called three-phase oil/water/gas flow
•  Input data: dual-energy gamma densitometry
    •  A beam of gamma rays is passed through the pipe
    •  Attenuation in intensity indicates the density of material
    •  A single beam is insufficient
        •  Two degrees of freedom: fraction of oil, fraction of water (the gas fraction follows, since the three sum to one)
        •  One beam of gamma rays of two energies (frequencies)
•  Six beams at two energies each, measured at detectors, give 12 attenuation measurements

Page 16

Prediction Problems

1.  Predict volume fractions of oil/water/gas
2.  Predict configuration (one of three)
    –  Twelve features
    –  Three classes
    –  Plot shows two of the variables for 100 points

Which class should a new point x belong to?

•  Naïve cell-based voting fails
    –  Exponential growth of the number of cells with dimensionality
    –  Discretizing each of the 12 dimensions into 6 intervals gives 6^12 ≈ 2.2 billion cells
    •  Hardly any points in each cell
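The cell count grows as (intervals per dimension)^(number of dimensions). A small sketch of that arithmetic; the training-set size of 1000 is a hypothetical value, not taken from the oil-flow data.

```python
def cells_needed(dims: int, bins_per_dim: int) -> int:
    """Number of grid cells when each of `dims` axes is cut into `bins_per_dim` intervals."""
    return bins_per_dim ** dims


def mean_points_per_cell(n_points: int, dims: int, bins_per_dim: int) -> float:
    """Average occupancy if n_points were spread evenly over the grid (real data is lumpier)."""
    return n_points / cells_needed(dims, bins_per_dim)


for d in (1, 2, 3, 6, 12):
    print(f"{d:2d} dims -> {cells_needed(d, 6):,} cells")

# Hypothetical training-set size, chosen only for illustration.
print(mean_points_per_cell(1000, 12, 6))
```

Even a thousand points leave almost every one of the roughly 2.2 billion cells empty, so a vote inside the cell containing x usually has no data at all.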

Page 17

Probability Theory

•  Sum rule: for marginalization
•  Product rule: for combining
•  Bayes rule
•  Fully Bayesian approach
    •  Conjugate distributions
    •  Feasible with increased computational power
    •  Intractable posteriors handled using either
        –  Variational Bayes, or
        –  Stochastic sampling, e.g., Markov Chain Monte Carlo, Gibbs sampling

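Sum rule (joint probabilities estimated from counts n_ij over N trials):

$$p(X = x_i, Y = y_j) = \frac{n_{ij}}{N}, \qquad p(X = x_i) = \sum_{j} p(X = x_i, Y = y_j)$$

Product rule:

$$p(X, Y) = p(Y \mid X)\, p(X)$$

Bayes rule:

$$p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)}, \qquad \text{where } p(X) = \sum_{Y} p(X \mid Y)\, p(Y)$$

Viewed as: posterior ∝ likelihood × prior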
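The rules can be checked numerically on a small table of counts n_ij. The sketch below uses a made-up 2 × 2 table: it builds the joint from n_ij / N, marginalizes with the sum rule, forms conditionals with the product rule, and confirms that Bayes rule recovers the same conditional. Exact `Fraction` arithmetic is used so the equalities hold exactly rather than up to rounding.

```python
from fractions import Fraction

# Hypothetical count table: counts[i][j] = number of trials with X = x_i and Y = y_j
counts = [
    [3, 1],  # X = x_0
    [2, 4],  # X = x_1
]
N = sum(sum(row) for row in counts)
rows, cols = len(counts), len(counts[0])

# Joint probability from counts: p(X = x_i, Y = y_j) = n_ij / N
joint = [[Fraction(n, N) for n in row] for row in counts]

# Sum rule: marginalize out the other variable
p_x = [sum(joint[i][j] for j in range(cols)) for i in range(rows)]
p_y = [sum(joint[i][j] for i in range(rows)) for j in range(cols)]

# Product rule rearranged: conditional = joint / marginal
p_y_given_x = [[joint[i][j] / p_x[i] for j in range(cols)] for i in range(rows)]
p_x_given_y = [[joint[i][j] / p_y[j] for j in range(cols)] for i in range(rows)]


def bayes(i, j):
    """p(Y = y_j | X = x_i) from likelihood p(X | Y), prior p(Y) and evidence p(X) = sum_Y p(X | Y) p(Y)."""
    evidence = sum(p_x_given_y[i][k] * p_y[k] for k in range(cols))
    return p_x_given_y[i][j] * p_y[j] / evidence


print(bayes(0, 0), p_y_given_x[0][0])  # both compute p(Y = y_0 | X = x_0)
```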

Page 18

Probability Distributions

Discrete, binary
–  Bernoulli: single binary variable
–  Binomial: N samples of Bernoulli (N = 1 gives Bernoulli; approaches a Gaussian for large N)
–  Beta: continuous variable in [0, 1]; conjugate prior of the Bernoulli/binomial

Discrete, multi-valued
–  Multinomial: one of K values, as a K-dimensional binary vector (K = 2 gives the binary case)
–  Dirichlet: K random variables in [0, 1]; conjugate prior of the multinomial

Continuous
–  Gaussian
–  Gamma: conjugate prior of univariate Gaussian precision
–  Exponential: special case of Gamma
–  Wishart: conjugate prior of multivariate Gaussian precision matrix
–  Gaussian-Gamma: conjugate prior of univariate Gaussian, unknown mean and precision
–  Gaussian-Wishart: conjugate prior of multivariate Gaussian, unknown mean and precision matrix
–  Student’s-t: generalization of Gaussian robust to outliers; infinite mixture of Gaussians
–  Uniform

Angular
–  Von Mises
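Conjugacy means the posterior stays in the prior's family, so updating reduces to adding counts. A minimal sketch for a Beta prior on a Bernoulli parameter; the prior pseudo-counts and the observations are hypothetical.

```python
from fractions import Fraction


def beta_bernoulli_update(a, b, observations):
    """Conjugate update: Beta(a, b) prior + Bernoulli 0/1 observations -> Beta posterior."""
    ones = sum(observations)
    zeros = len(observations) - ones
    return a + ones, b + zeros


def beta_mean(a, b):
    """Mean of Beta(a, b), i.e. the posterior estimate of p(x = 1)."""
    return Fraction(a, a + b)


prior = (2, 2)              # hypothetical prior pseudo-counts
data = [1, 1, 0, 1, 1, 1]   # hypothetical coin flips (1 = heads)
posterior = beta_bernoulli_update(*prior, data)
print(posterior, beta_mean(*posterior))
```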

Page 19

Statistical Models

•  Generative
    –  Naïve Bayes
    –  Mixtures of multinomials
    –  Mixtures of Gaussians
    –  Hidden Markov Models (HMM)
    –  Bayesian networks
    –  Markov random fields
•  Discriminative
    –  Logistic regression
    –  SVMs
    –  Traditional neural networks
    –  Nearest neighbor
    –  Conditional Random Fields (CRF)
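To make the generative side concrete: a naïve Bayes classifier models p(x | y) p(y) and classifies with Bayes rule, dropping the evidence term since it is the same for every class. The sketch below assumes binary features with Laplace smoothing (alpha = 1); the spam/ham data and the feature meanings are invented for illustration.

```python
import math
from collections import Counter


def train_naive_bayes(X, y, alpha=1.0):
    """Bernoulli naive Bayes: estimate p(y) and p(x_d = 1 | y) with Laplace smoothing."""
    classes = sorted(set(y))
    class_counts = Counter(y)
    priors = {c: class_counts[c] / len(y) for c in classes}
    likelihoods = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        likelihoods[c] = [
            (sum(r[d] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for d in range(len(X[0]))
        ]
    return priors, likelihoods


def predict(priors, likelihoods, x):
    """argmax_c of log p(c) + sum_d log p(x_d | c), assuming features independent given c."""
    def log_joint(c):
        score = math.log(priors[c])
        for d, v in enumerate(x):
            p1 = likelihoods[c][d]
            score += math.log(p1 if v == 1 else 1.0 - p1)
        return score
    return max(priors, key=log_joint)


# Hypothetical features: [contains "free", contains "meeting"]
X = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 0]]
y = ["spam", "spam", "spam", "ham", "ham", "ham"]
priors, likelihoods = train_naive_bayes(X, y)
print(predict(priors, likelihoods, [1, 0]), predict(priors, likelihoods, [0, 1]))
```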

Page 20

HMMs for Speech Recognition

•  Three distinct layers
    1.  Language model: generates sentences as sequences of words
    2.  Word model: each word is described as a sequence of phonemes, e.g., /p/ /u/ /sh/
    3.  Acoustic model: shows the progression of the acoustic signal through a phoneme
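As a generic illustration of the HMM machinery (not the three-layer speech system itself), the forward algorithm computes the probability of an observation sequence by summing over hidden state paths recursively instead of enumerating them. The 2-state model below is hypothetical.

```python
def forward(init, trans, emit, observations):
    """Forward algorithm for a discrete HMM: returns p(o_1, ..., o_T).

    init[s]     = p(z_1 = s)
    trans[s][t] = p(z_{k+1} = t | z_k = s)
    emit[s][o]  = p(o | z = s)
    """
    n = len(init)
    # alpha[s] = p(o_1..o_k, z_k = s), initialized for k = 1
    alpha = [init[s] * emit[s][observations[0]] for s in range(n)]
    for o in observations[1:]:
        # Sum over the previous hidden state, then emit the next observation
        alpha = [sum(alpha[s] * trans[s][t] for s in range(n)) * emit[t][o] for t in range(n)]
    return sum(alpha)


# Hypothetical 2-state model with 2 observation symbols (0 and 1)
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
print(forward(init, trans, emit, [0, 1]))
```

The cost is O(T · n²) for T observations and n states, versus O(nᵀ) paths for brute-force enumeration.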

Page 21

DBN for monitoring a vehicle

•  Represents system dynamics
•  X5: Observation depends on the car’s location (the map is not modeled) and on the error status of the sensor (Failure, X4)
•  X1: Bad weather makes the sensor likely to fail (X4)
•  X3: Location depends on the previous position and velocity (X2)

(Figure: DBN over Weather, Velocity, Location, Failure and Obs; panels show the two-slice template (time slice t to time slice t + 1), the initial time slice 0, and the network unrolled over time slices 0, 1 and 2)

Page 22

Regression

•  Problem data set: the red curve is the result of fitting a two-layer neural network by minimizing squared error
•  Corresponding inverse problem, obtained by reversing the roles of x and t
    –  Very poor fit to the data: GMMs are used here

Page 23

Regression: Learning to Rank

•  Input x_i: d features of a query-URL pair (d > 200), e.g.:
    –  Log frequency of query in anchor text
    –  Query word in color on page
    –  # of images on page
    –  # of (out) links on page
    –  PageRank of page
    –  URL length
    –  URL contains “~”
    –  Page length
    –  In the LETOR 4.0 dataset: 46 query-document features, maximum of 124 URLs/query
•  Output y: relevance value (target variable)
    –  Point-wise (0, 1, 2, 3)
    –  Regression returns a continuous value
    –  Allows fine-grained ranking of URLs
•  Traditional IR uses TF/IDF
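A pointwise approach treats the graded labels as regression targets and ranks URLs by predicted score. A minimal sketch, assuming a linear model fit by batch gradient descent on squared error and two invented, already-normalized features (real LETOR 4.0 data has 46):

```python
def score(w, b, x):
    """Linear relevance score w·x + b for one query-URL feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fit_pointwise(X, y, lr=0.05, epochs=2000):
    """Pointwise LTR: least-squares regression of graded labels on features, by gradient descent."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, target in zip(X, y):
            err = score(w, b, x) - target  # residual of the squared-error loss
            for k in range(d):
                grad_w[k] += 2.0 * err * x[k] / n
            grad_b += 2.0 * err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def rank(urls, X, w, b):
    """Sort URLs by predicted relevance, highest first."""
    order = sorted(range(len(urls)), key=lambda i: score(w, b, X[i]), reverse=True)
    return [urls[i] for i in order]


# Hypothetical training data: two normalized features per query-URL pair, labels in {0, 1, 2, 3}
X_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y_train = [0, 2, 1, 3]
w, b = fit_pointwise(X_train, y_train)
print(rank(["u1", "u2", "u3"], [[1.0, 1.0], [0.0, 0.0], [1.0, 0.0]], w, b))
```

Because the regression output is continuous, two URLs with the same integer label in training can still be ordered by their predicted scores.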

Page 24

Deep Learning

•  Multilayer stack of simple modules, each subject to:
    –  Learning
    –  Non-linear mapping (ReLU)
•  5 to 20 layers
    –  Sensitive to minute details (distinguishes Samoyeds from white wolves)
    –  Invariant to irrelevant variation (background, pose, lighting, other objects)
•  Convolutional nets: alternate convolutional layers and pooling layers
•  Stunning success: ConvNet + recurrent net
    1.  Representation by a CNN
    2.  RNN trained to translate
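The core ConvNet operations named above can be sketched in one dimension: a convolution (implemented, as in most deep learning libraries, as cross-correlation), a ReLU non-linearity, and non-overlapping max pooling. The signal and the kernel are hand-set for illustration; in a trained network the kernel values would be learned.

```python
def conv1d(signal, kernel):
    """'Valid' 1-D cross-correlation (what CNN libraries call convolution)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) for i in range(len(signal) - k + 1)]


def relu(xs):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, x) for x in xs]


def max_pool(xs, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]


signal = [0.0, 1.0, 3.0, 1.0, 0.0, -2.0, 0.0, 2.0]
edge_kernel = [-1.0, 1.0]  # hypothetical hand-set kernel: responds to rising edges
features = max_pool(relu(conv1d(signal, edge_kernel)), 2)
print(features)
```

Pooling keeps the strongest response in each window, which is one source of the small-shift invariance mentioned above.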

Page 25

Unsupervised Learning

•  Analysis of unlabeled data under assumptions about the underlying structure of the data, e.g.,
    1.  Clustering: find a partition of the data
    2.  Identify a low-dimensional manifold
        •  PCA, manifold learning, factor analysis, random projections, auto-encoders
•  Topic modeling, recommendation systems
•  A criterion function is used, e.g., maximum likelihood
•  Computational complexity is key
    –  To exploit large unlabeled data sets
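As one concrete instance of finding low-dimensional structure, the first principal component is the leading eigenvector of the sample covariance matrix. The sketch below finds it by power iteration on hypothetical 2-D points; library implementations typically use an SVD instead.

```python
import math


def first_principal_component(points, iters=200):
    """Mean and leading eigenvector of the sample covariance matrix, via power iteration."""
    n, d = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(d)]
    centered = [[p[k] - mean[k] for k in range(d)] for p in points]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)] for a in range(d)]
    v = [1.0] * d  # start vector; assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
mean, pc1 = first_principal_component(points)
# 1-D representation of each point: its coordinate along the first principal component
projections = [sum((p[k] - mean[k]) * pc1[k] for k in range(2)) for p in points]
print(pc1, projections)
```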

Page 26

Clustering

•  Finding a partition for observed data
    –  And a rule for predicting future data
    –  Example: Old Faithful geyser in Yellowstone
        •  A single Gaussian is unable to capture the structure; it cannot model such data sets
        •  A linear superposition of two Gaussians is better
•  Gaussian Mixture Models can give very complex densities:

$$p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$$

    –  π_k are mixing coefficients that sum to one
•  The log-likelihood function is

$$\ln p(\mathbf{X} \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) \right\}$$

•  There is no closed-form solution: use either iterative numerical optimization techniques or Expectation Maximization (EM)

(Figure: one dimension, three Gaussians in blue, their sum in red)
(Figure: Old Faithful, 272 observations; duration in minutes on the horizontal axis vs. time to next eruption on the vertical axis)
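A minimal sketch of EM for a one-dimensional GMM, alternating responsibilities (E step) with weighted re-estimation of π_k, μ_k and σ_k² (M step). The data, the initialization and the variance floor of 1e-6 are assumptions for illustration. EM never decreases the log-likelihood above, which the before/after comparison shows.

```python
import math


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def log_likelihood(xs, pis, mus, vars_):
    """ln p(X | pi, mu, sigma^2) = sum_n ln sum_k pi_k N(x_n | mu_k, sigma_k^2)."""
    return sum(
        math.log(sum(p * normal_pdf(x, m, v) for p, m, v in zip(pis, mus, vars_)))
        for x in xs
    )


def em_gmm_1d(xs, pis, mus, vars_, iters=50):
    n, K = len(xs), len(pis)
    for _ in range(iters):
        # E step: responsibilities gamma[n][k] = p(component k | x_n)
        gamma = []
        for x in xs:
            w = [pis[k] * normal_pdf(x, mus[k], vars_[k]) for k in range(K)]
            total = sum(w)
            gamma.append([wk / total for wk in w])
        # M step: re-estimate parameters from responsibility-weighted data
        nk = [sum(g[k] for g in gamma) for k in range(K)]
        mus = [sum(g[k] * x for g, x in zip(gamma, xs)) / nk[k] for k in range(K)]
        vars_ = [
            max(sum(g[k] * (x - mus[k]) ** 2 for g, x in zip(gamma, xs)) / nk[k], 1e-6)
            for k in range(K)
        ]
        pis = [nk[k] / n for k in range(K)]
    return pis, mus, vars_


xs = [1.8, 2.0, 2.1, 1.9, 2.2, 4.3, 4.5, 4.4, 4.6, 4.2]  # hypothetical 1-D durations
init = ([0.5, 0.5], [1.0, 5.0], [1.0, 1.0])             # hypothetical starting point
before = log_likelihood(xs, *init)
pis, mus, vars_ = em_gmm_1d(xs, *init)
after = log_likelihood(xs, pis, mus, vars_)
print(mus, before, after)
```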

Page 27

Topic Models

•  Unsupervised methods to analyze documents
    –  Topics are distributions over words
    –  A document is a distribution across topics
    –  Methods: SVD, Collaborative Filtering

The ability to learn from data with uncertain and missing information is a fundamental requirement for learning systems. In the "real world", features are missing due to unrecorded information or due to occlusion in vision, and measurements are affected by noise. In some cases the experimenter might want to assign varying degrees of reliability to the data. In regression, uncertainty is typically attributed to the dependent variable which is assumed to be disturbed by additive noise. But there is no reason to assume that input features might not be uncertain as well or even missing completely. In some cases, we can ignore the problem: instead of trying to model the relationship between the true input and the output we are satisfied with modeling the relationship between the uncertain input and the output. But there are at least two reasons why we might want to explicitly deal with uncertain inputs. First, we might be interested in the underlying relationship between the true input and the output (e.g. the relationship has some physical meaning). Second, the problem might be non-stationary in the sense that for different samples different inputs are uncertain or missing or the levels of uncertainty vary. The naive strategy of training networks for all possible input combinations explodes in complexity and would require sufficient data for all relevant cases. It makes more sense to define one underlying true model and relate all data to this one model. Ahmad and Tresp (1993) have shown how to include uncertainty during recall under the assumption that the network approximates the "true" underlying function. In this paper, we first show how input uncertainty can be taken into account in the training of a feedforward neural network. Then we show that for networks of Gaussian basis functions it is possible to obtain closed-form solutions. We validate the solutions on two applications.

Topic 1: training 0.08, network 0.05, neural 0.03, …
Topic 2: noise 0.017, uncertain 0.011, reliability 0.010, positive 0.0084, …
Topic 3: data 0.1041, estimate 0.020, estimation 0.019, …

(Figure: An Example of Topic Modeling; topics as word distributions, and the document's topic distribution over Topics 1, 2 and 3)

Page 28

Recommendation Systems

•  Data indicates links between users and items

•  Suggest other items to a user based on data across all users

•  Solution: SVD, Collaborative Filtering
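One simple form of collaborative filtering is user-based: predict a user's rating of an unseen item as a similarity-weighted average of other users' ratings of that item. The ratings and the cosine-similarity variant below are hypothetical choices; SVD-based matrix factorization, the other approach named on the slide, is not shown.

```python
import math

# Hypothetical user -> {item: rating} data
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 3, "C": 5, "D": 2},
    "carol": {"A": 1, "B": 5, "D": 5},
}


def cosine(u, v):
    """Cosine similarity over co-rated items (norms use each user's full rating vector)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(r * r for r in u.values()))
    nv = math.sqrt(sum(r * r for r in v.values()))
    return dot / (nu * nv)


def predict_rating(user, item):
    """Similarity-weighted average of other users' ratings of `item`; None if nobody rated it."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        s = cosine(ratings[user], theirs)
        num += s * theirs[item]
        den += s
    return num / den if den else None


def recommend(user):
    """Items the user has not rated, sorted by predicted rating (highest first)."""
    unseen = {i for r in ratings.values() for i in r} - set(ratings[user])
    scored = {i: predict_rating(user, i) for i in unseen}
    return sorted((i for i in scored if scored[i] is not None), key=scored.get, reverse=True)


print(recommend("alice"), predict_rating("alice", "D"))
```

Alice's tastes resemble Bob's more than Carol's, so her predicted rating for item D leans toward Bob's low rating.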

Page 29

Reinforcement Learning

•  A dog is given a reward/punishment for an action
    –  Policies: what actions to take in a particular situation
    –  Utility estimation: how good a state is (→ used by the policy)
•  No supervised output, but a delayed reward
•  Credit assignment
    –  What was responsible for the outcome
•  Applications:
    –  Game playing
    –  Robot in a maze
    –  Multiple agents, partial observability, …
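Tabular Q-learning is one standard way to learn a policy from delayed reward. In this sketch, a hypothetical 5-cell corridor stands in for a robot in a maze, the only reward arrives at the goal, and the bootstrapped update propagates credit back to the earlier moves. The learning rate, discount and exploration rate are illustrative values.

```python
import random

N_STATES = 5        # hypothetical corridor: cells 0..4, cell 4 is the goal
ACTIONS = (-1, +1)  # move left / move right


def step(state, action):
    """Deterministic move; the only reward comes on reaching the goal (a delayed reward)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:  # greedy action, breaking ties at random
                best = max(Q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if Q[(s, b)] == best])
            s2, r, done = step(s, a)
            # Bootstrapped target passes the delayed reward back to earlier moves (credit assignment)
            target = r if done else r + gamma * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q


Q = q_learning()
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # the learned action in each non-goal cell
```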