12 March 1999 Dip HI KBS Module 1 Machine Learning Lucy Hederman.


12 March 1999 Dip HI KBS Module 1

Machine Learning

Lucy Hederman

KBS Development

[Diagram: Real World Problem → Problem Analysis → Representation → Reasoning System (?) → Solution]

Stage 1: analysing the problem to produce a representation that the reasoning system can manipulate. This representation is often a set of attribute values.

Stage 2: developing the reasoning mechanism that manipulates the problem representation to produce a solution.

Stage 2

• Knowledge engineering - manual
  – rule development for a rule-based expert system (ES)

• Learning - similarity-based
  – generalise from examples (training data)

• Learning - explanation-based
  – build on prior knowledge
  – use a small number of canonical examples
  – incorporate explanations, analogy, ...

Risk Assessment Example

• An expert might develop rules like
  – if collateral is adequate and credit history is good then risk is low.

• Alternatively, build a system which learns from existing data on loan application decisions (see attached).
  – Similarity-based learning
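
A minimal Python sketch of how the expert's rule could be written down by hand. The `LoanApplication` class, its field names and the string values are illustrative assumptions; the slide gives only the rule itself.

```python
from dataclasses import dataclass


@dataclass
class LoanApplication:
    collateral: str      # hypothetical values, e.g. "adequate" or "none"
    credit_history: str  # hypothetical values, e.g. "good", "bad", "unknown"


def assess_risk(application):
    """Return "low" when the slide's rule fires, or None when no rule applies."""
    if application.collateral == "adequate" and application.credit_history == "good":
        return "low"
    return None  # a real rule base would need further rules for the other cases


print(assess_risk(LoanApplication("adequate", "good")))
```

Every further combination of attribute values needs its own hand-written rule, which is the effort that learning from past loan decisions tries to avoid.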

Classifying apples and pears

         Greenness  Height  Width  Taste  Weight  Height/Width  Class
No. 1    210        60      62     Sweet  186     0.97          Apple
No. 2    220        70      51     Sweet  180     1.37          Pear
No. 3    215        55      55     Tart   152     1.00          Apple
No. 4    180        76      40     Sweet  152     1.90          Pear
No. 5    220        68      45     Sweet  153     1.51          Pear
No. 6    160        65      68     Sour   221     0.96          Apple
No. 7    215        63      45     Sweet  140     1.40          Pear
No. 8    180        55      56     Sweet  154     0.98          Apple
No. 9    220        68      65     Tart   221     1.05          Apple
No. 10   190        60      58     Sour   174     1.03          Apple

No. x    222        70      55     Sweet  185     1.27          ?

To what class does this belong?

Supervised Learning

• Supervised learning
  – training data classified already

• Unsupervised learning
  – acquire useful(?) knowledge without correctly classified training data
  – category formation
  – scientific discovery

• We look at supervised learning only.

Learnability

• Induction depends on there being useful generalisations possible in the representation language used.

• Learnability of concepts in a representation language is the ability to express the concept concisely.

• Random classifications are not learnable.

Similarity-based learning

• Decision tree (rule) induction
  – induce a decision tree (set of rules) from the training data.

• k-nearest neighbour classification
  – classify a new problem based on the k most similar cases in the training data.

• Artificial Neural Networks
  – adjust weights in an NN to reduce errors on training data.

Decision Tree Induction

• Aim to induce a tree which
  – correctly classifies all training data
  – will correctly classify unseen cases

• The ID3 algorithm assumes that the simplest tree that covers all the training examples is the best at unseen problems.
  – Leaving out extraneous tests should be good for generalising.

ID3

• Top-down construction
  – add selected tests under nodes
  – each test further partitions the samples
  – continue till each partition is homogeneous

• Information-theoretic test selection
  – maximise information gain

• ID3 works surprisingly well. Variations and alternatives exist.
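
The test-selection step can be sketched in Python as follows. This is an illustration rather than a full ID3 implementation: it only scores candidate tests by information gain, using Shannon entropy in bits. The data are the Taste, Height/Width and Class columns of the apples-and-pears table; the boolean attribute `Elongated` (Height/Width above 1.2) is a hypothetical discretisation introduced here for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target="Class"):
    """Entropy of the whole sample minus the weighted entropy of its partitions."""
    before = entropy([row[target] for row in rows])
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    after = sum(len(part) / len(rows) * entropy(part) for part in partitions.values())
    return before - after


table = [  # (Taste, Height/Width, Class) for fruit No. 1 to No. 10
    ("Sweet", 0.97, "Apple"), ("Sweet", 1.37, "Pear"), ("Tart", 1.00, "Apple"),
    ("Sweet", 1.90, "Pear"), ("Sweet", 1.51, "Pear"), ("Sour", 0.96, "Apple"),
    ("Sweet", 1.40, "Pear"), ("Sweet", 0.98, "Apple"), ("Tart", 1.05, "Apple"),
    ("Sour", 1.03, "Apple"),
]
# "Elongated" is a hypothetical attribute; the 1.2 threshold is an assumption.
fruit = [{"Taste": t, "Elongated": r > 1.2, "Class": c} for t, r, c in table]

for attribute in ("Taste", "Elongated"):
    print(attribute, round(information_gain(fruit, attribute), 3))
```

On this data the test on `Elongated` has the larger gain (roughly 0.97 bits against roughly 0.42 for `Taste`), because it splits the ten fruit into two homogeneous partitions, so a gain-maximising learner would place it at the root.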

k-Nearest Neighbour Classification

• A database of previously classified cases is kept throughout.

• Category of target case decided by category of its k nearest neighbours.

• No inducing or training of a model.

• “Lazy” learning
  – work deferred to runtime
  – compare with neural networks - eager learners

“Nearest” - distance/similarity

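For query q and training set X (described by features F), compute d(x, q) for each x ∈ X, where

$$d(x, q) = \sum_{f \in F} w_f \, \delta(x_f, q_f)$$

and where

$$\delta(x_f, q_f) = \begin{cases} 0 & \text{if } f \text{ is discrete and } x_f = q_f \\ 1 & \text{if } f \text{ is discrete and } x_f \neq q_f \\ |x_f - q_f| & \text{if } f \text{ is continuous} \end{cases}$$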
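
A small Python sketch of this distance measure: a weighted sum over features, where the per-feature difference is 0 or 1 for discrete features and the absolute difference for continuous ones. Treating string values as discrete and numbers as continuous is an assumption made for the example, as are the function names and the choice of three features from the apples-and-pears table.

```python
def feature_difference(x_value, q_value):
    """0 or 1 for discrete values, absolute difference for continuous ones."""
    if isinstance(x_value, str):  # assumption: strings mark discrete features
        return 0.0 if x_value == q_value else 1.0
    return abs(x_value - q_value)


def distance(x, q, weights):
    """Weighted sum of per-feature differences between case x and query q."""
    return sum(w * feature_difference(x[f], q[f]) for f, w in weights.items())


query = {"Height": 70, "Width": 55, "Taste": "Sweet"}   # fruit No. x
case_1 = {"Height": 60, "Width": 62, "Taste": "Sweet"}  # fruit No. 1
case_3 = {"Height": 55, "Width": 55, "Taste": "Tart"}   # fruit No. 3

equal_weights = {"Height": 1.0, "Width": 1.0, "Taste": 1.0}
print(distance(case_1, query, equal_weights))
print(distance(case_3, query, equal_weights))
```

With equal weights the continuous features dominate: a taste mismatch adds only 1, while a height difference can add 10 or more. Feature weights are one way to rebalance this.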

k-NN and Noise

• 1-NN is easy to implement
  – susceptible to noise
    • a misclassification every time a noisy pattern is retrieved

• k-NN with k ≥ 3 helps overcome this

• Either
  – straight voting between the k examples or
  – weighted votes depending on “nearness” of each example.
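
The two voting schemes can be sketched in Python as below. For brevity each case has a single continuous feature (the Height/Width ratio from the apples-and-pears table). The inverse-distance weighting and the extra mislabelled case at ratio 1.26 are assumptions added for illustration.

```python
from collections import defaultdict


def knn_classify(cases, query, k, weighted=False):
    """cases: list of (value, label) pairs; query: one continuous feature value."""
    nearest = sorted(cases, key=lambda case: abs(case[0] - query))[:k]
    votes = defaultdict(float)
    for value, label in nearest:
        d = abs(value - query)
        # Straight voting counts 1 per neighbour; weighted voting counts 1/d.
        votes[label] += 1.0 / (d + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)


clean = [(0.97, "Apple"), (1.37, "Pear"), (1.00, "Apple"), (1.90, "Pear"),
         (1.51, "Pear"), (0.96, "Apple"), (1.40, "Pear"), (0.98, "Apple"),
         (1.05, "Apple"), (1.03, "Apple")]
noisy = clean + [(1.26, "Apple")]  # hypothetical mislabelled case

print(knn_classify(noisy, 1.27, k=1))  # the noisy case is the single nearest neighbour
print(knn_classify(noisy, 1.27, k=3))  # two pears outvote the noisy case
```

Inverse-distance weighting gives a very close noisy case a large vote, so the choice between straight and weighted voting is itself a design decision to test on the data.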

K-NN vs. Decision Trees

• Decision trees test features serially.
  – If two cases don’t match on the first feature tried they don’t match at all.

• k-NN considers all features in parallel.

• For some tasks serial testing is OK, for others it’s not.

Dimension reduction in k-NN

• Not all features are required
  – noisy features are a hindrance

• Some examples are redundant
  – retrieval time depends on the number of examples

[Diagram: a case base of m examples by p features is reduced to n covering examples by q best features]

Condensed NN

[Figure: 100 examples in 2 categories, with different CNN solutions]
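
The slide does not say which condensing procedure is meant. The sketch below assumes the usual one, Hart's condensed nearest neighbour rule: keep an example only if the examples kept so far would misclassify it under 1-NN, and repeat until a full pass adds nothing. The one-dimensional toy data are invented for illustration.

```python
def nearest_label(store, point):
    """Label of the stored example closest to point (1-NN, squared Euclidean)."""
    closest = min(store, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], point)))
    return closest[1]


def condense(examples):
    """Return a subset of examples that still classifies all of them correctly."""
    store = [examples[0]]
    changed = True
    while changed:
        changed = False
        for point, label in examples:
            if (point, label) not in store and nearest_label(store, point) != label:
                store.append((point, label))  # keep examples the store gets wrong
                changed = True
    return store


# Hypothetical one-dimensional examples in two categories.
examples = [((0,), "A"), ((1,), "A"), ((2,), "A"), ((3,), "A"),
            ((7,), "B"), ((8,), "B"), ((9,), "B"), ((10,), "B")]

print(condense(examples))
print(condense(list(reversed(examples))))
```

Presenting the same examples in a different order keeps a different subset, which is one way to arrive at different CNN solutions for the same data.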

Feature weighting

• Feature weights
  – modify the effect of large continuous distance values
  – allow some features to be treated as more important than others
    • pull cases with important features in common closer together.

Feature weighting

• Introspective learning

• Test training data on itself
  – For a correct retrieval
    • increase weight of matching features (pull)
    • decrease weight of un-matching features (pull)
  – For an incorrect retrieval
    • decrease weight of matching features (push)
    • increase weight of un-matching features (push)

[Figure: pull and push illustrations]
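
One retrieval's worth of this pull/push update might look like the Python sketch below. The additive step size, the floor at zero and the use of exact equality to decide whether a feature "matches" are assumptions; the slide specifies only the direction of each change. The `Shape` feature is hypothetical.

```python
def update_weights(weights, case, retrieved, correct, step=0.1):
    """Return new feature weights after comparing a case with its retrieved neighbour."""
    new_weights = {}
    for feature, w in weights.items():
        matching = case[feature] == retrieved[feature]
        # Correct retrieval (pull): raise matching features, lower un-matching ones.
        # Incorrect retrieval (push): lower matching features, raise un-matching ones.
        increase = matching if correct else not matching
        new_weights[feature] = w + step if increase else max(0.0, w - step)
    return new_weights


weights = {"Taste": 1.0, "Shape": 1.0}              # "Shape" is a hypothetical feature
case = {"Taste": "Sweet", "Shape": "Round"}
retrieved = {"Taste": "Sweet", "Shape": "Elongated"}

print(update_weights(weights, case, retrieved, correct=True))
print(update_weights(weights, case, retrieved, correct=False))
```

Running this over every training case, each retrieved from the rest of the training data, gives one pass of the introspective procedure.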

(Artificial) Neural Networks

• Decision tree induction builds a symbolic “causal” model from training data.

• k-NN builds no model.

• A neural network is a sub-symbolic, non-causal, distributed, “black box”, model built from training data.

• ANN output is continuous whereas a k-NN classifies into discrete classes.

NN Prediction of Malignancy

• The paper by A. Tailor et al. describes a neural network which computes a probability of malignancy from age, morphological features, and sonographic data.

• Describes design and testing of the NN.

• Note intro to NNs in the Appendix

ANN Advantages

• Particularly suited to pattern recognition
  – character, speech, image

• Suited to domains where there is no domain theory or model.

• Robust - handles noisy and incomplete data well.

• Potentially fast. Parallel processing.

• Flexible and easy to maintain.

ANN Problems

• Lack explanation

• Currently implemented mostly in software.

• Training times can be tedious.

• Need lots of training and test data.
  – True of similarity-based learning in general.

ANN Processing Element (PE)

[Diagram: inputs X1, X2, ..., Xn, weighted by w1, w2, ..., wn, are summed to give NET; the output is OUT = F(NET)]

Summation - gives the PE’s activation level.

Transfer function - modifies the activation level to produce a reasonable output value (e.g. 0-1).
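
A processing element can be sketched in a few lines of Python. The weighted sum gives NET, and a transfer function F maps it to OUT. The logistic (sigmoid) function is used for F here as an assumption; the slide asks only for an output in a reasonable range such as 0-1.

```python
import math


def processing_element(inputs, weights):
    """Compute OUT = F(NET) for one PE, with a logistic transfer function F."""
    net = sum(x * w for x, w in zip(inputs, weights))  # summation: activation level
    return 1.0 / (1.0 + math.exp(-net))                # transfer: squash into (0, 1)


print(processing_element([0.0, 0.0], [1.0, 1.0]))
print(processing_element([1.0, 1.0], [2.0, 2.0]))
```

A zero activation level maps to the middle of the output range, and large positive or negative activations approach 1 or 0 without reaching them.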

Typical ANN Structure

[Diagram: PEs arranged in an input layer, a hidden layer and an output layer]

• There may be
  – additional hidden layers
  – different topologies
  – different connectivity

• Choosing ANN structure
  – is based on the problem and
  – requires some expertise.

Learning/Training

• Aim to obtain desired outputs for each training example.

• Backpropagation is the most popular learning algorithm.
  – Initialise all weights associated with inputs to each PE.
  – Present sample inputs to ANN.
  – Compare ANN outputs with desired output.
  – Alter weights to reduce the mean square error, and repeat
    • until the error is within some tolerance.
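
The training loop can be sketched in plain Python for a network with one hidden layer and one output PE. Everything beyond the steps listed on the slide is an assumption made for illustration: the logistic transfer function, the random initial weights in [-1, 1], the learning rate, the per-sample weight updates and the toy logical-OR training set.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_w, output_w):
    """Return (hidden activations, output); the last weight in each list is a bias."""
    hidden = [sigmoid(sum(w * v for w, v in zip(ws, x + [1.0]))) for ws in hidden_w]
    out = sigmoid(sum(w * v for w, v in zip(output_w, hidden + [1.0])))
    return hidden, out


def mean_square_error(samples, hidden_w, output_w):
    return sum((t - forward(x, hidden_w, output_w)[1]) ** 2 for x, t in samples) / len(samples)


def train(samples, n_hidden=3, rate=0.5, tolerance=0.01, max_epochs=5000, seed=0):
    rng = random.Random(seed)
    n_inputs = len(samples[0][0])
    # Initialise all weights (small random values are an assumption).
    hidden_w = [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)] for _ in range(n_hidden)]
    output_w = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    for _ in range(max_epochs):
        for x, target in samples:
            # Present the inputs and compare the output with the desired output.
            hidden, out = forward(x, hidden_w, output_w)
            delta_out = (target - out) * out * (1.0 - out)
            # Propagate the error back to the hidden layer, then alter the weights.
            delta_hidden = [delta_out * output_w[j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(n_hidden)]
            for j, v in enumerate(hidden + [1.0]):
                output_w[j] += rate * delta_out * v
            for j in range(n_hidden):
                for i, v in enumerate(x + [1.0]):
                    hidden_w[j][i] += rate * delta_hidden[j] * v
        # Repeat until the mean square error is within the tolerance.
        if mean_square_error(samples, hidden_w, output_w) <= tolerance:
            break
    return hidden_w, output_w


# Toy training set: logical OR of two inputs.
samples = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
hidden_w, output_w = train(samples)
print(mean_square_error(samples, hidden_w, output_w))
```

Calling `train(samples, max_epochs=0)` returns the untrained weights, which gives a baseline error to compare against.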

Overfitting

[Graph: error against training time, showing the in-sample error and the generalisation error]

Too much training will result in a (k-NN or ANN) model that makes minimal errors on the training data (memorises), but no longer generalises well.

Beware.

ANN Development

Collect data

Separate into training and test sets

Define a network structure

Select a learning algorithm

Set parameters, values, weights

Transform data to network inputs

Start training, revise weights

Stop and test

Use the network for new cases.

[Flowchart feedback loops back to earlier steps: get more, better data; reseparate; redefine structure; select another algorithm; reset; reset]
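
The "separate into training and test sets" step can be sketched in Python as a random split. The function name, the 25% test fraction and the fixed seed are assumptions chosen for the example.

```python
import random


def split_cases(cases, test_fraction=0.25, seed=0):
    """Shuffle a copy of the cases and return (training set, test set)."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


training_set, test_set = split_cases(range(20))
print(len(training_set), len(test_set))
```

Holding the test cases out of training is what lets the "stop and test" step estimate how the network will do on new cases. Changing the seed would be one way to reseparate.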