Page 1

Single-Layer Perceptron Classifiers

Berlin Chen, 2002

Page 2

Outline

• Foundations of trainable decision-making networks to be formulated
  – Mapping from the input space to the output (classification) space

• Focus on the classification of linearly separable classes of patterns
  – Linear discriminant functions and the simple correction function
  – Continuous error function minimization

• Explanation and justification of the perceptron and delta training rules

Page 3

Classification Model, Features, and Decision Regions

• A pattern is the quantitative description of an object, event, or phenomenon
  – Spatial patterns: weather maps, fingerprints, …
  – Temporal patterns: speech signals, …

• Pattern classification/recognition
  – Assign the input data (a physical object, event, or phenomenon) to one of the pre-specified classes (categories)
  – Discriminate the input data within the object population via the search for invariant attributes among members of the population

Page 4

Classification Model, Features, and Decision Regions (cont.)

• The block diagram of the recognition and classification system

[Figure: block diagram — the feature extractor performs dimension reduction; a neural network can be used both for classification and for feature extraction]

Page 5

Classification Model, Features, and Decision Regions (cont.)

• More about Feature Extraction
  – Feature extraction compresses the input patterns while preserving the salient information
  – E.g.:
    • Speech vowel sounds analyzed in 16-channel filterbanks can provide 16-dimensional spectral vectors, which can be further transformed into two dimensions
      – Tone height (high-low) and retraction (front-back)
    • Input patterns are thus projected onto, and reduced to, lower dimensions

Page 6

Classification Model, Features, and Decision Regions (cont.)

• More about Feature Extraction

[Figure: a two-dimensional pattern set shown in the original (x, y) coordinates and in the transformed (x', y') coordinates]

Page 7

Classification Model, Features, and Decision Regions (cont.)

• Two simple ways to generate the pattern vectors for cases of spatial and temporal objects to be classified

• A pattern classifier maps input patterns (vectors) in $E^n$ space into numbers ($E^1$) which specify the class membership:

  $i_0 = i(\mathbf{x}) = j, \quad j = 1, 2, \dots, R$

Page 8

Classification Model, Features, and Decision Regions (cont.)

• Classification described in geometric terms
  – Decision regions
  – Decision surfaces: generally, the decision surfaces for n-dimensional patterns may be (n-1)-dimensional hyper-surfaces

  $i_0 = j$ for all $\mathbf{x} \in X_j, \quad j = 1, 2, \dots, R$

  (In the accompanying figure the decision surfaces are curved lines.)

Page 9

Discriminant Functions

• Determine the membership in a category by the classifier based on the comparison of R discriminant functions $g_1(\mathbf{x}), g_2(\mathbf{x}), \dots, g_R(\mathbf{x})$
  – x is within the region $X_k$ if $g_k(\mathbf{x})$ has the largest value:

  $i_0 = k$ if $g_k(\mathbf{x}) > g_j(\mathbf{x})$ for $k, j = 1, 2, \dots, R, \; k \neq j$

[Figure: classifier block diagram — the inputs $x_1, \dots, x_n$ feed R discriminators computing $g_1(\mathbf{x}), g_2(\mathbf{x}), \dots, g_R(\mathbf{x})$, followed by a maximum selector; the diagram assumes the classifier has already been designed from training patterns $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_p, \dots, \mathbf{x}_P$ with $P \gg n$]

Page 10

Discriminant Functions (cont.)

• Example 3.1

  $g(\mathbf{x}) = g_1(\mathbf{x}) - g_2(\mathbf{x}) = -2x_1 + x_2 + 2$

  $g(\mathbf{x}) > 0$ : class 1
  $g(\mathbf{x}) < 0$ : class 2

  Decision surface equation: $-2x_1 + x_2 + 2 = 0$

The decision surface does not uniquely specify the discriminant functions.

The classifier that classifies patterns into two classes or categories is called a "dichotomizer" (from the Greek for "two" and "cut").

Page 11

Discriminant Functions (cont.)

Page 12

Discriminant Functions (cont.)

Construction of two discriminant planes through the decision line, using the points (1, 0, 0), (0, -2, 0), (0, -2, 1) and normal vectors such as [2, -1, 0], [-2, 1, 0], [0, 0, 1], [2, -1, 1] in the (x, y, g) space:

Solution 1:
  $(x - 0, \; y + 2, \; g_1 - 1) \cdot (2, -1, 1) = 0 \;\Rightarrow\; 2x - y - 2 + g_1 - 1 = 0 \;\Rightarrow\; g_1 = -2x + y + 3$
  $(x - 0, \; y + 2, \; g_2 - 1) \cdot (-2, 1, 1) = 0 \;\Rightarrow\; -2x + y + 2 + g_2 - 1 = 0 \;\Rightarrow\; g_2 = 2x - y - 1$
  $g = g_1 - g_2 = 0 \;\Rightarrow\; -4x + 2y + 4 = 0 \;\Rightarrow\; -2x + y + 2 = 0$

  In vector form:
  $g_1(\mathbf{x}) = \begin{bmatrix} -2 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + 3, \qquad g_2(\mathbf{x}) = \begin{bmatrix} 2 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - 1$

Solution 2:
  $(x - 0, \; y + 2, \; g_1 - 1) \cdot (2, -1, 2) = 0 \;\Rightarrow\; 2x - y - 2 + 2g_1 - 2 = 0 \;\Rightarrow\; g_1 = -x + \tfrac{1}{2}y + 2$
  $(x - 0, \; y + 2, \; g_2 - 1) \cdot (-2, 1, 2) = 0 \;\Rightarrow\; -2x + y + 2 + 2g_2 - 2 = 0 \;\Rightarrow\; g_2 = x - \tfrac{1}{2}y$
  $g = g_1 - g_2 = 0 \;\Rightarrow\; -2x + y + 2 = 0$

An infinite number of discriminant functions will yield correct classification.
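A quick numerical check (not part of the original slides; it assumes NumPy is available) confirms that both discriminant pairs induce the same dichotomy even though the functions differ:

```python
import numpy as np

def solution1(x, y):
    g1 = -2 * x + y + 3          # g1 from Solution 1
    g2 = 2 * x - y - 1           # g2 from Solution 1
    return g1 - g2               # g = g1 - g2 = -4x + 2y + 4

def solution2(x, y):
    g1 = -x + 0.5 * y + 2        # g1 from Solution 2
    g2 = x - 0.5 * y             # g2 from Solution 2
    return g1 - g2               # g = -2x + y + 2

rng = np.random.default_rng(0)
pts = rng.uniform(-5, 5, size=(1000, 2))
s1 = np.sign([solution1(x, y) for x, y in pts])
s2 = np.sign([solution2(x, y) for x, y in pts])
assert (s1 == s2).all()          # identical class assignments everywhere
```

Both reduce to the same decision line $-2x + y + 2 = 0$ up to a positive scale factor, which is exactly why the signs agree.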

Page 13

Discriminant Functions (cont.)

[Figure: two classifier architectures — the general multi-class discriminator bank, and the two-class case, which needs only the subtraction $g(\mathbf{x}) = g_1(\mathbf{x}) - g_2(\mathbf{x})$ followed by sign examination]

  $g(\mathbf{x}) > 0$ : class 1
  $g(\mathbf{x}) < 0$ : class 2

Page 14

Discriminant Functions (cont.)

The design of the discriminator for this case is not straightforward: the discriminant functions may turn out to be nonlinear functions of $x_1$ and $x_2$.

Page 15

Bayes’ Decision Theory

• Decision-making based on both the posterior knowledge obtained from specific observation data and prior knowledge of the categories
  – Prior class probabilities: $P(\omega_i)$ for each class $i$
  – Class-conditioned probabilities: $P(x \mid \omega_i)$ for each class $i$

  $k = \arg\max_i P(\omega_i \mid x) = \arg\max_i \frac{P(x \mid \omega_i)\, P(\omega_i)}{P(x)} = \arg\max_i \frac{P(x \mid \omega_i)\, P(\omega_i)}{\sum_j P(x \mid \omega_j)\, P(\omega_j)}$

  Since $P(x)$ is common to all classes:

  $k = \arg\max_i P(\omega_i \mid x) = \arg\max_i P(x \mid \omega_i)\, P(\omega_i)$
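As a concrete illustration of the rule above, the following sketch picks the class maximizing $P(x \mid \omega_i)\, P(\omega_i)$; the priors and likelihood values are made-up illustrative numbers, not data from the slides:

```python
import numpy as np

priors = np.array([0.5, 0.3, 0.2])          # P(w_i), hypothetical
likelihoods = np.array([0.05, 0.20, 0.10])  # P(x | w_i) at the observed x, hypothetical

# P(x) is common to all classes, so the argmax of P(x|w_i) P(w_i) suffices.
posteriors_unnorm = likelihoods * priors
k = int(np.argmax(posteriors_unnorm)) + 1   # classes indexed 1..R
print(f"decide class {k}")                  # -> class 2 here (0.06 beats 0.025 and 0.02)
```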

Page 16

Bayes’ Decision Theory (cont.)

• Bayes' decision rule is designed to minimize the overall risk involved in making a decision
  – The expected loss (conditional risk) when making decision $\delta_i$:

    $R(\delta_i \mid x) = \sum_j l(\delta_i \mid \omega_j)\, P(\omega_j \mid x)$, where $l(\delta_i \mid \omega_j) = \begin{cases} 0, & i = j \\ 1, & i \neq j \end{cases}$

    $\Rightarrow R(\delta_i \mid x) = 1 - P(\omega_i \mid x)$

  • The overall risk (Bayes' risk):

    $R = \int_{-\infty}^{\infty} R\left( \delta(x) \mid x \right) p(x)\, dx$, where $\delta(x)$ is the decision selected for a sample $x$

  – Minimize the overall risk (classification error) by computing the conditional risks $R(\delta_i \mid x)$ and selecting the decision $\delta_i$ for which the conditional risk is minimum, i.e., for which $P(\omega_i \mid x)$ is maximum (minimum-error-rate decision rule)

Page 17

Bayes’ Decision Theory (cont.)

• Two-class pattern classification

  Likelihood ratio or log-likelihood ratio:

  $l(x) = \frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} \;\underset{\omega_2}{\overset{\omega_1}{\gtrless}}\; \frac{P(\omega_2)}{P(\omega_1)}$

  $\log l(x) = \log P(x \mid \omega_1) - \log P(x \mid \omega_2) \;\underset{\omega_2}{\overset{\omega_1}{\gtrless}}\; \log P(\omega_2) - \log P(\omega_1)$

  Bayes' classifier:

  $P(x \mid \omega_1)\, P(\omega_1) \;\underset{\omega_2}{\overset{\omega_1}{\gtrless}}\; P(x \mid \omega_2)\, P(\omega_2)$

  $g_1(x) = P(x \mid \omega_1)\, P(\omega_1) \cong P(\omega_1 \mid x), \qquad g_2(x) = P(x \mid \omega_2)\, P(\omega_2) \cong P(\omega_2 \mid x)$

  Classification error:

  $p(\text{error}) = P(x \in R_2, \omega_1) + P(x \in R_1, \omega_2)$
  $\qquad = P(x \in R_2 \mid \omega_1)\, P(\omega_1) + P(x \in R_1 \mid \omega_2)\, P(\omega_2)$
  $\qquad = \int_{R_2} P(x \mid \omega_1)\, P(\omega_1)\, dx + \int_{R_1} P(x \mid \omega_2)\, P(\omega_2)\, dx$
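The likelihood-ratio test can be illustrated with a small sketch; the one-dimensional Gaussian class models, priors, and observation below are illustrative assumptions, not values from the slides:

```python
import math

def gaussian_pdf(x, mu, sigma):
    # univariate normal density N(mu, sigma^2)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

p_w1, p_w2 = 0.4, 0.6                        # priors, hypothetical
x = 1.3                                      # observed sample, hypothetical

# l(x) = P(x|w1) / P(x|w2), compared against the threshold P(w2)/P(w1)
l = gaussian_pdf(x, mu=2.0, sigma=1.0) / gaussian_pdf(x, mu=0.0, sigma=1.0)
decision = "w1" if l > p_w2 / p_w1 else "w2"
print(f"l(x) = {l:.3f}, threshold = {p_w2 / p_w1:.3f} -> {decision}")
```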

Page 18

Bayes’ Decision Theory (cont.)

• When the environment is multivariate Gaussian, the Bayes' classifier reduces to a linear classifier
  – The same form taken by the perceptron
  – But the linear nature of the perceptron is not contingent on the assumption of Gaussianity

  Class $\omega_1$: $E[\mathbf{X}] = \boldsymbol{\mu}_1, \quad E\left[ (\mathbf{X} - \boldsymbol{\mu}_1)(\mathbf{X} - \boldsymbol{\mu}_1)^t \right] = \boldsymbol{\Sigma}$
  Class $\omega_2$: $E[\mathbf{X}] = \boldsymbol{\mu}_2, \quad E\left[ (\mathbf{X} - \boldsymbol{\mu}_2)(\mathbf{X} - \boldsymbol{\mu}_2)^t \right] = \boldsymbol{\Sigma}$

  $P(\mathbf{x} \mid \omega_i) = \frac{1}{(2\pi)^{n/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^t \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \right]$

  Assumptions: $P(\omega_1) = P(\omega_2) = \frac{1}{2}$, and a common covariance matrix $\boldsymbol{\Sigma}$

Page 19

Bayes’ Decision Theory (cont.)

• When the environment is Gaussian, the Bayes' classifier reduces to a linear classifier (cont.)

  $\log l(\mathbf{x}) = \log P(\mathbf{x} \mid \omega_1) - \log P(\mathbf{x} \mid \omega_2)$
  $\quad = -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_1)^t \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}_1) + \frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_2)^t \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}_2)$
  $\quad = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^t \boldsymbol{\Sigma}^{-1} \mathbf{x} + \frac{1}{2} \left( \boldsymbol{\mu}_2^t \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_2 - \boldsymbol{\mu}_1^t \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_1 \right)$
  $\quad = \mathbf{w}^t \mathbf{x} + b$

  $\therefore\; l(\mathbf{x}) = \mathbf{w}^t \mathbf{x} + b \;\underset{\omega_2}{\overset{\omega_1}{\gtrless}}\; \log 1 = 0$
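A minimal sketch of this reduction, assuming NumPy and made-up means and covariance: the Bayes decision becomes the sign of $\mathbf{w}^t \mathbf{x} + b$:

```python
import numpy as np

mu1 = np.array([2.0, 1.0])                    # class 1 mean, hypothetical
mu2 = np.array([-1.0, 0.0])                   # class 2 mean, hypothetical
sigma = np.array([[1.0, 0.2],
                  [0.2, 2.0]])                # shared covariance, hypothetical
sigma_inv = np.linalg.inv(sigma)

w = sigma_inv @ (mu1 - mu2)                   # w = Sigma^-1 (mu1 - mu2)
b = 0.5 * (mu2 @ sigma_inv @ mu2 - mu1 @ sigma_inv @ mu1)

def decide(x):
    # equal priors: decide class 1 if w^t x + b > 0, else class 2
    return "class 1" if w @ x + b > 0 else "class 2"

print(decide(np.array([1.5, 0.5])))           # falls on the mu1 side -> class 1
print(decide(np.array([-2.0, 0.0])))          # falls on the mu2 side -> class 2
```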

Page 20

Bayes’ Decision Theory (cont.)

• Multi-class pattern classification

Page 21

Linear Machine and Minimum Distance Classification

• Find the linear-form discriminant function for two-class classification when the class prototypes are known

• Example 3.1: Select the decision hyperplane that contains the midpoint of the line segment connecting the center points of the two classes

Page 22

Linear Machine and Minimum Distance Classification (cont.)

The dichotomizer's discriminant function g(x):

  $g(\mathbf{x}) = (\mathbf{x}_1 - \mathbf{x}_2)^t \left( \mathbf{x} - \frac{\mathbf{x}_1 + \mathbf{x}_2}{2} \right) = 0$

  $(\mathbf{x}_1 - \mathbf{x}_2)^t\, \mathbf{x} + \frac{1}{2} \left( \|\mathbf{x}_2\|^2 - \|\mathbf{x}_1\|^2 \right) = 0$

Taken as $\mathbf{w}^t \mathbf{y} = 0$, where

  $\mathbf{w} = \begin{bmatrix} \mathbf{x}_1 - \mathbf{x}_2 \\ \frac{1}{2} \left( \|\mathbf{x}_2\|^2 - \|\mathbf{x}_1\|^2 \right) \end{bmatrix}, \qquad \mathbf{y} = \begin{bmatrix} \mathbf{x} \\ 1 \end{bmatrix}$ (augmented input pattern)

It is a simple minimum-distance classifier.
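A minimal sketch of this dichotomizer, assuming NumPy; the prototype values are borrowed from the later problem (c) on page 29 rather than from this slide:

```python
import numpy as np

x1 = np.array([2.0, 5.0])                    # prototype of class 1
x2 = np.array([-1.0, -3.0])                  # prototype of class 2

# w = [x1 - x2, (||x2||^2 - ||x1||^2)/2], applied to the augmented y = [x, 1]
w = np.append(x1 - x2, 0.5 * (x2 @ x2 - x1 @ x1))

def dichotomize(x):
    y = np.append(x, 1.0)                    # augmented input pattern
    return "class 1" if w @ y > 0 else "class 2"

print(dichotomize(np.array([3.0, 4.0])))     # nearer to x1 -> class 1
print(dichotomize(np.array([-2.0, -2.0])))   # nearer to x2 -> class 2
```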

Page 23

Linear Machine and Minimum Distance Classification (cont.)

• The linear-form discriminant functions for multi-class classification
  – There are up to R(R-1)/2 decision hyperplanes for R pairwise separable classes

[Figure: two scatter plots of three pattern classes (marked x, o, and Δ) with pairwise decision lines; in the second plot some of the decision regions are not contiguous]

Some classes may not be contiguous.

Page 24

Linear Machine and Minimum Distance Classification (cont.)

• Linear machine or minimum-distance classifier
  – Assume the class prototypes $\mathbf{x}_i$ are known for all classes

  • Euclidean distance between input pattern $\mathbf{x}$ and the center of class i, $\mathbf{x}_i$:

    $\|\mathbf{x} - \mathbf{x}_i\| = \sqrt{(\mathbf{x} - \mathbf{x}_i)^t (\mathbf{x} - \mathbf{x}_i)}$, with $\|\mathbf{x} - \mathbf{x}_i\|^2 = \mathbf{x}^t \mathbf{x} - 2\, \mathbf{x}_i^t \mathbf{x} + \mathbf{x}_i^t \mathbf{x}_i$

  • Since $\mathbf{x}^t \mathbf{x}$ is the same for all classes, minimizing $\|\mathbf{x} - \mathbf{x}_i\|$ is equal to maximizing $\mathbf{x}_i^t \mathbf{x} - \frac{1}{2}\, \mathbf{x}_i^t \mathbf{x}_i$

  – Set the discriminant function for each class i to be:

    $g_i(\mathbf{x}) = \mathbf{x}_i^t \mathbf{x} - \frac{1}{2}\, \mathbf{x}_i^t \mathbf{x}_i$

    or, on the augmented pattern $\mathbf{y}$: $g_i(\mathbf{x}) = \mathbf{w}_i^t \mathbf{y}$, where $\mathbf{w}_i = \begin{bmatrix} \mathbf{x}_i \\ w_{i, n+1} \end{bmatrix}$ and $w_{i, n+1} = -\frac{1}{2}\, \mathbf{x}_i^t \mathbf{x}_i$

Page 25

Linear Machine and Minimum Distance Classification (cont.)

This approach is also called correlation classification.

A 1 is appended as the (n+1)-th component of the input pattern:

  $g_i(\mathbf{x}) = \mathbf{x}_i^t \mathbf{x} - \frac{1}{2}\, \mathbf{x}_i^t \mathbf{x}_i \qquad\Longleftrightarrow\qquad g_i(\mathbf{x}) = \mathbf{w}_i^t \mathbf{y}$

Page 26

Linear Machine and Minimum Distance Classification (cont.)

• Example 3.2

  Prototype points: $\mathbf{x}_1 = \begin{bmatrix} 10 \\ 2 \end{bmatrix}, \quad \mathbf{x}_2 = \begin{bmatrix} 2 \\ -5 \end{bmatrix}, \quad \mathbf{x}_3 = \begin{bmatrix} -5 \\ 5 \end{bmatrix}$

  Using $g_i(\mathbf{x}) = \mathbf{x}_i^t \mathbf{x} - \frac{1}{2}\, \mathbf{x}_i^t \mathbf{x}_i$, the weight vectors are

  $\mathbf{w}_1 = \begin{bmatrix} 10 \\ 2 \\ -52 \end{bmatrix}, \quad \mathbf{w}_2 = \begin{bmatrix} 2 \\ -5 \\ -14.5 \end{bmatrix}, \quad \mathbf{w}_3 = \begin{bmatrix} -5 \\ 5 \\ -25 \end{bmatrix}$

  Discriminant functions:
  $g_1(\mathbf{x}) = 10x_1 + 2x_2 - 52$
  $g_2(\mathbf{x}) = 2x_1 - 5x_2 - 14.5$
  $g_3(\mathbf{x}) = -5x_1 + 5x_2 - 25$

  Decision lines:
  $S_{12}: \; 8x_1 + 7x_2 - 37.5 = 0$
  $S_{13}: \; 15x_1 - 3x_2 - 27 = 0$
  $S_{23}: \; 7x_1 - 10x_2 + 10.5 = 0$
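The example can be checked numerically; a sketch assuming NumPy, using the three prototype points above:

```python
import numpy as np

prototypes = np.array([[10.0, 2.0],
                       [2.0, -5.0],
                       [-5.0, 5.0]])

# g_i(x) = x_i^t x - 0.5 x_i^t x_i, evaluated for all classes at once
def discriminants(x):
    return prototypes @ x - 0.5 * np.sum(prototypes ** 2, axis=1)

print(discriminants(np.array([0.0, 0.0])))                   # -> [-52., -14.5, -25.]
print(np.argmax(discriminants(np.array([10.0, 2.0]))) + 1)   # -> 1 (its own prototype)
print(np.argmax(discriminants(np.array([-4.0, 4.0]))) + 1)   # -> 3 (nearest to x3)
```

The constants printed at the origin match the third components of $\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3$ on the slide.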

Page 27

Linear Machine and Minimum Distance Classification (cont.)

• If R linear discriminant functions exist for a set of patterns such that

  $g_i(\mathbf{x}) > g_j(\mathbf{x})$ for all $\mathbf{x} \in$ class $i$, $\quad i, j = 1, 2, \dots, R, \; i \neq j$

  then the classes are linearly separable.

Page 28

Linear Machine and Minimum Distance Classification (cont.)

Page 29

Linear Machine and Minimum Distance Classification (cont.)

(a) $2x_1 - x_2 + 2 = 0$: the decision surface is a line (two-dimensional pattern space)
(b) $2x_1 - x_2 + 2 = 0$: the decision surface is a plane (three-dimensional pattern space)
(c) $\mathbf{x}_1 = [2, 5]^t$, $\mathbf{x}_2 = [-1, -3]^t$ ⇒ the decision surface for the minimum-distance classifier:

  $(\mathbf{x}_1 - \mathbf{x}_2)^t\, \mathbf{x} + \frac{1}{2} \left( \|\mathbf{x}_2\|^2 - \|\mathbf{x}_1\|^2 \right) = 0 \;\Rightarrow\; 3x_1 + 8x_2 - 19/2 = 0$

(d) [Figure: the decision lines of (a) and (c) sketched in the $(x_1, x_2)$ plane, passing through points such as (-1, 0), (0, 2), (19/6, 0), and (19/16, 0)]

Page 30

Linear Machine and Minimum Distance Classification (cont.)

• Examples 3.1 and 3.2 have shown that the coefficients (weights) of the linear discriminant functions can be determined if a priori information about the sets of patterns and their class membership is known

Page 31

Linear Machine and Minimum Distance Classification (cont.)

• The example of linearly non-separable patterns

Page 32

Linear Machine and Minimum Distance Classification (cont.)

[Figure: a layered classifier for linearly non-separable patterns. TLU #1 implements the line $x_1 + x_2 + 1 = 0$ and TLU #2 the line $-x_1 - x_2 + 1 = 0$ in the $(x_1, x_2)$ input space containing the patterns (1, 1), (-1, -1), (-1, 1), (1, -1); the images of the patterns in the $(o_1, o_2)$ output space, (1, 1), (-1, 1), (1, -1), become linearly separable by the line $o_1 + o_2 - 1 = 0$]

Page 33

Discrete Perceptron Training Algorithm - Geometrical Representations

• Examine the neural network classifiers that derive their weights during training based on the error-correction scheme

  $g(\mathbf{y}) = \mathbf{w}^t \mathbf{y}$, with augmented input pattern $\mathbf{y}$

  Class 1: $\mathbf{w}^t \mathbf{y} > 0$
  Class 2: $\mathbf{w}^t \mathbf{y} < 0$

  (vector representations in the weight space)

Page 34

Discrete Perceptron Training Algorithm - Geometrical Representations (cont.)

• Devise an analytic approach based on the geometrical representations
  – E.g., the decision surface for the training pattern $\mathbf{y}_1$, drawn in the weight space

  Gradient: $\nabla_{\mathbf{w}} \left( \mathbf{w}^t \mathbf{y}_1 \right) = \mathbf{y}_1$ (the direction of steepest increase)

  If $\mathbf{y}_1$ is in Class 1 but the response indicates Class 2: $\mathbf{w}' = \mathbf{w} + c\, \mathbf{y}_1$
  If $\mathbf{y}_1$ is in Class 2 but the response indicates Class 1: $\mathbf{w}' = \mathbf{w} - c\, \mathbf{y}_1$

  c (> 0) is the correction increment (two times the learning constant introduced before); c controls the size of the adjustment.

Page 35

Discrete Perceptron Training Algorithm - Geometrical Representations (cont.)

Weight adjustments for three augmented training patterns $\mathbf{y}_1, \mathbf{y}_2, \mathbf{y}_3$, shown in the weight space, with

  $\mathbf{y}_1 \in C_1, \quad \mathbf{y}_2 \in C_1, \quad \mathbf{y}_3 \in C_2$

- Weights in the shaded region are the solutions
- The three lines labeled $\mathbf{y}_1, \mathbf{y}_2, \mathbf{y}_3$ are fixed during training

Page 36

Discrete Perceptron Training Algorithm - Geometrical Representations (cont.)

• More about the correction increment c
  – If it is not merely a constant, but related to the current training pattern:

  The distance from $\mathbf{w}^1$ to the decision plane $\mathbf{w}^t \mathbf{y} = 0$ in the weight space is $\dfrac{|\mathbf{w}^{1t} \mathbf{y}|}{\|\mathbf{y}\|}$. Choosing the correction $c\, \mathbf{y}$ to cover exactly this distance gives

  $c = \dfrac{|\mathbf{w}^{1t} \mathbf{y}|}{\|\mathbf{y}\|^2}$ (well defined because $\mathbf{y}^t \mathbf{y} = \|\mathbf{y}\|^2 > 0$)

  This shows how to select the correction increment based on the displacement between $\mathbf{w}^1$ and the corrected weight vector $\mathbf{w}'$.

Page 37

Discrete Perceptron Training Algorithm - Geometrical Representations (cont.)

• For the fixed correction rule with c = constant, the correction of weights is always the same fixed portion of the current training vector
  – The weight can be initialized at any value

  $\mathbf{w}' = \mathbf{w} \pm c\, \mathbf{y}$, or equivalently $\mathbf{w}' = \mathbf{w} + \Delta\mathbf{w}$ with $\Delta\mathbf{w} = \dfrac{c}{2} \left[ d - \operatorname{sgn}\left( \mathbf{w}^t \mathbf{y} \right) \right] \mathbf{y}$

• For the dynamic correction rule with c dependent on the distance from the weight (i.e., the weight vector) to the decision surface in the weight space:

  $c = \lambda\, \dfrac{|\mathbf{w}^{1t} \mathbf{y}|}{\|\mathbf{y}\|^2}$

  – The initial weight should be different from 0
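A minimal training-loop sketch of the fixed correction rule above, assuming NumPy; the separable one-dimensional training set here is hypothetical, not Example 3.3:

```python
import numpy as np

# augmented patterns [x, 1] and bipolar targets d (illustrative values)
Y = np.array([[2.0, 1.0], [3.0, 1.0], [-1.0, 1.0], [-2.0, 1.0]])
d = np.array([1.0, 1.0, -1.0, -1.0])

w = np.zeros(2)                          # the fixed rule allows any initialization
c = 1.0
for epoch in range(100):
    errors = 0
    for y, t in zip(Y, d):
        o = 1.0 if w @ y > 0 else -1.0   # TLU output; w^t y = 0 falls to -1 here,
        if o != t:                       # forcing a correction for class-1 targets
            w += 0.5 * c * (t - o) * y   # delta_w = (c/2)(d - sgn(w^t y)) y = +/- c y
            errors += 1
    if errors == 0:                      # all patterns classified correctly
        break
print(w)                                 # a separating weight vector, e.g. [2., 1.]
```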

Page 38

Discrete Perceptron Training Algorithm - Geometrical Representations (cont.)

• Dynamic correction rule with c dependent on the distance from the weight to the decision plane:

  $c = \lambda\, \dfrac{|\mathbf{w}^{1t} \mathbf{y}|}{\|\mathbf{y}\|^2}, \qquad c\, \|\mathbf{y}\| = \lambda\, \dfrac{|\mathbf{w}^{1t} \mathbf{y}|}{\|\mathbf{y}\|}$

  i.e., the weight vector moves by λ times its distance to the decision plane (λ = 1 places the corrected weight exactly on the plane).

Page 39

Discrete Perceptron Training Algorithm - Geometrical Representations (cont.)

• Example 3.3

  Augmented training patterns:
  $\mathbf{y}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \mathbf{y}_2 = \begin{bmatrix} -0.5 \\ 1 \end{bmatrix}, \quad \mathbf{y}_3 = \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \quad \mathbf{y}_4 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$

  $\mathbf{y}_1 \in C_1, \quad \mathbf{y}_2 \in C_2, \quad \mathbf{y}_3 \in C_1, \quad \mathbf{y}_4 \in C_2$

  Update rule: $\Delta\mathbf{w}^k = \dfrac{c}{2} \left[ d_j - \operatorname{sgn}\left( \mathbf{w}^{kt} \mathbf{y}_j \right) \right] \mathbf{y}_j$

  What if $\mathbf{w}^{kt} \mathbf{y}_j = 0$? It is interpreted as a mistake and is followed by a correction.

Page 40

Continuous Perceptron Training Algorithm

• Replace the TLU (Threshold Logic Unit) with the sigmoid activation function for two reasons:
  – Gain finer control over the training procedure
  – Facilitate the differential characteristics to enable computation of the error gradient

  $\hat{\mathbf{w}} = \mathbf{w} - \eta\, \nabla E(\mathbf{w})$

  (η: learning constant; $\nabla E(\mathbf{w})$: error gradient)

Page 41

Continuous Perceptron Training Algorithm (cont.)

• The new weight vector is obtained by moving in the direction of the negative gradient along the multidimensional error surface

Page 42

Continuous Perceptron Training Algorithm (cont.)

• Define the error as the squared difference between the desired output and the actual output:

  $E = \frac{1}{2} (d - o)^2$

  $E = \frac{1}{2} \left[ d - f\left( \mathbf{w}^t \mathbf{y} \right) \right]^2 \quad \text{or} \quad E = \frac{1}{2} \left[ d - f(net) \right]^2$

  $\nabla E(\mathbf{w}) = \left[ \frac{\partial E}{\partial w_1}, \; \frac{\partial E}{\partial w_2}, \; \dots, \; \frac{\partial E}{\partial w_{n+1}} \right]^t$

  $\frac{\partial E}{\partial w_i} = -(d - o)\, f'(net)\, \frac{\partial (net)}{\partial w_i}, \qquad \frac{\partial (net)}{\partial w_i} = y_i$

  $\Rightarrow \nabla E(\mathbf{w}) = -(d - o)\, f'(net)\, \mathbf{y}$

Page 43

Continuous Perceptron Training Algorithm (cont.)

• Bipolar continuous activation function:

  $f(net) = \frac{2}{1 + \exp(-\lambda \cdot net)} - 1$

  $f'(net) = \frac{2\lambda \exp(-\lambda \cdot net)}{\left[ 1 + \exp(-\lambda \cdot net) \right]^2} = \frac{\lambda}{2} \left( 1 - o^2 \right)$

  $\hat{\mathbf{w}} = \mathbf{w} + \frac{1}{2}\, \eta \lambda (d - o)\left( 1 - o^2 \right) \mathbf{y}$

• Unipolar continuous activation function:

  $f(net) = \frac{1}{1 + \exp(-\lambda \cdot net)}$

  $f'(net) = \frac{\lambda \exp(-\lambda \cdot net)}{\left[ 1 + \exp(-\lambda \cdot net) \right]^2} = \lambda f(net) \left[ 1 - f(net) \right] = \lambda\, o (1 - o)$

  $\hat{\mathbf{w}} = \mathbf{w} + \eta \lambda (d - o)\, o (1 - o)\, \mathbf{y}$
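A sketch of the bipolar delta rule above, assuming NumPy; the initial weights, pattern, and learning constants are illustrative values:

```python
import numpy as np

def f_bipolar(net, lam=1.0):
    # bipolar continuous activation: 2 / (1 + exp(-lambda * net)) - 1
    return 2.0 / (1.0 + np.exp(-lam * net)) - 1.0

def delta_rule_step(w, y, d, eta=0.1, lam=1.0):
    # w_hat = w + 0.5 * eta * lambda * (d - o) * (1 - o^2) * y
    o = f_bipolar(w @ y, lam)
    return w + 0.5 * eta * lam * (d - o) * (1.0 - o ** 2) * y

w = np.array([0.5, -0.5])            # initial weights, hypothetical
y = np.array([1.0, 1.0])             # augmented training pattern
d = 1.0                              # desired bipolar response
for _ in range(50):                  # repeated presentation of one pattern
    w = delta_rule_step(w, y, d)
print(w, f_bipolar(w @ y))           # the output o moves toward d = 1
```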

Page 44

Continuous Perceptron Training Algorithm (cont.)

• Example 3.3, revisited with the bipolar continuous activation function (λ = 1):

  $f(net) = \frac{2}{1 + \exp(-net)} - 1$

  and the same augmented training patterns:

  $\mathbf{y}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \mathbf{y}_2 = \begin{bmatrix} -0.5 \\ 1 \end{bmatrix}, \quad \mathbf{y}_3 = \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \quad \mathbf{y}_4 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$

Page 45

Continuous Perceptron Training Algorithm (cont.)

• Example 3.3 (cont.)

[Figure: the total error surface over the weight plane, with training trajectories started from four arbitrary initial weights]

Page 46

Continuous Perceptron Training Algorithm (cont.)

• Treat the last, fixed component of the input pattern vector as the neuron activation threshold

Page 47

Continuous Perceptron Training Algorithm (cont.)

• R-category linear classifier using R discrete bipolar perceptrons
  – Goal: the i-th TLU responds with +1, and all other TLUs respond with -1, to indicate class i ("local representation")

  $\hat{\mathbf{w}}_i = \mathbf{w}_i + \frac{c}{2} (d_i - o_i)\, \mathbf{y}$

  $d_i = 1, \quad d_j = -1 \;\text{ for } j = 1, 2, \dots, R, \; j \neq i$
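A sketch of this R-category scheme, assuming NumPy; the three 2-D augmented patterns and their labels are hypothetical:

```python
import numpy as np

R, n_aug = 3, 3                              # 3 classes, 2-D inputs augmented with a 1
W = np.zeros((R, n_aug))                     # one weight row per TLU
Y = np.array([[2.0, 2.0, 1.0], [-2.0, 1.0, 1.0], [0.0, -3.0, 1.0]])
labels = np.array([0, 1, 2])
c = 1.0

for _ in range(100):
    for y, lab in zip(Y, labels):
        d = -np.ones(R); d[lab] = 1.0        # local (one-hot +/-1) target vector
        o = np.where(W @ y > 0, 1.0, -1.0)   # the R TLU outputs
        W += 0.5 * c * np.outer(d - o, y)    # per-TLU discrete perceptron rule
print(np.argmax(W @ Y.T, axis=0))            # -> [0 1 2] once trained
```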

Page 48

Continuous Perceptron Training Algorithm (cont.)

• Example 3.5

Page 49

Continuous Perceptron Training Algorithm (cont.)

• R-category linear classifier using R continuous bipolar perceptrons:

  $\hat{\mathbf{w}}_i = \mathbf{w}_i + \frac{1}{2}\, \eta \lambda (d_i - o_i)\left( 1 - o_i^2 \right) \mathbf{y}, \quad \text{for } i = 1, 2, \dots, R$

  $d_i = 1, \quad d_j = -1 \;\text{ for } j = 1, 2, \dots, R, \; j \neq i$

Page 50

Continuous Perceptron Training Algorithm (cont.)

• Error function dependent on the difference vector d-o

Page 51

Bayes' Classifier vs. Perceptron

• The perceptron operates on the premise that the patterns to be classified are linearly separable (otherwise the training algorithm will oscillate), while the Bayes' classifier assumes that the (Gaussian) distributions of the two classes do overlap each other

• The perceptron is nonparametric while the Bayes' classifier is parametric (its derivation is contingent on the assumption of the underlying distributions)

• The perceptron is simple and adaptive, and needs little storage, while the Bayes' classifier could be made adaptive, but at the expense of increased storage and more complex computations

Page 52

Homework

• P3.5, P3.7, P3.9, P3.22