Neural Networks 2nd Edition
Simon Haykin
柯博昌
Chap 2. Learning Processes
Learning vs. Neural Network
1. The neural network is stimulated by an environment.
2. The neural network undergoes changes in its free parameters as a result of this stimulation.
3. The neural network responds in a new way to the environment because of the changes that have occurred in its internal structure.
Error-Correction Learning
[Figure: error-correction learning. The input vector $x(n)$ drives a multilayer feedforward network (one or more layers of hidden neurons). The output $y_k(n)$ of output neuron $k$ is subtracted from the desired response (target output) $d_k(n)$ to produce the error signal $e_k(n) = d_k(n) - y_k(n)$.]
Objective: Minimizing a cost function or index of performance,
$\mathcal{E}(n) = \frac{1}{2} e_k^2(n)$
The step-by-step adjustments are continued until the system reaches a steady state.
Delta Rule (Widrow-Hoff Rule)
$\Delta w_{kj}(n) = \eta\, e_k(n)\, x_j(n)$
Let wkj(n) denote the value of synaptic weight wkj of neuron k excited by element xj(n) of the signal vector x(n) at time step n.
$\eta$: a positive constant determining the rate of learning as we proceed from one step in the learning process to another (the learning-rate parameter).
$w_{kj}(n+1) = w_{kj}(n) + \Delta w_{kj}(n)$
$w_{kj}(n) = z^{-1}[w_{kj}(n+1)]$, where $z^{-1}$ is the unit-delay operator.
In effect, wkj(n) and wkj(n+1) may be viewed as the old and new values of synaptic weight wkj, respectively.
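A minimal numpy sketch of one delta-rule step (the function name and the single-neuron setting are illustrative, not from the book):

```python
import numpy as np

def delta_rule_step(w, x, d, eta=0.1):
    """One delta-rule (Widrow-Hoff) update for a single linear neuron.

    w   : weight vector of neuron k (one entry per input element x_j)
    x   : input vector x(n)
    d   : desired response d_k(n)
    eta : learning-rate parameter (a small positive constant)
    """
    y = np.dot(w, x)          # neuron output y_k(n)
    e = d - y                 # error signal e_k(n) = d_k(n) - y_k(n)
    w_new = w + eta * e * x   # w_kj(n+1) = w_kj(n) + eta * e_k(n) * x_j(n)
    return w_new, e
```

For a suitably small $\eta$, repeated application of this step drives down the instantaneous cost $\frac{1}{2} e_k^2(n)$.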
Memory-based Learning
Def: All (or most) of the past experiences are explicitly stored in a large memory of correctly classified input-output examples.
$\{(x_i, d_i)\}_{i=1}^{N}$
$x_i$: input vector
$d_i$: corresponding desired response
Without loss of generality, the desired response is restricted to be a scalar.
Ex: A binary pattern classification problem. Assume there are two classes/hypotheses denoted C1 and C2:
$d_i = 1$ for class C1
$d_i = 0$ (or $-1$) for class C2
Classification of a test vector xtest
Retrieving the training data in a “local neighborhood” of xtest
1. The nearest neighbor rule is a simple yet effective type of learning.
2. The k-nearest neighbor classifier identifies the k patterns lying nearest to xtest and uses a majority vote to make the classification.
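A short sketch of the k-nearest neighbor vote, assuming the stored examples are kept in arrays (the function name and array layout are illustrative):

```python
import numpy as np

def knn_classify(x_test, X_train, d_train, k=3):
    """Classify x_test by a majority vote among its k nearest stored patterns.

    X_train : (N, m) array of stored input vectors x_i
    d_train : (N,) array of corresponding class labels d_i
    """
    dists = np.linalg.norm(X_train - x_test, axis=1)  # distance to every pattern
    nearest = np.argsort(dists)[:k]                   # indices of the k closest
    labels, counts = np.unique(d_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                  # majority vote
```

With k = 1 this reduces to the nearest neighbor rule.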
Hebbian Learning
Hebb’s postulate of learning is the oldest and most famous of all learning rules.
– When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic changes take place in one or both cells (A, B).
Hebb’s rule was expanded:
– If two neurons on either side of a synapse are activated simultaneously, then the strength of that synapse is selectively increased.
– Otherwise, if such two neurons are activated asynchronously, then that synapse is selectively weakened or eliminated.
– Such a synapse is called a Hebbian synapse.
Hebbian Synapse Characteristics
– Time-dependent mechanism
– Local mechanism
– Interactive mechanism
– Conjunctional or correlational mechanism
Mathematical Models of Hebbian Modifications
$\Delta w_{kj}(n) = F\big(y_k(n),\, x_j(n)\big)$
Let wkj denote a synaptic weight of neuron k with pre-synaptic and post-synaptic signals denoted by xj and yk, respectively.
$\eta$: the rate of learning (a positive constant)
Hebb’s hypothesis (the simplest form): $\Delta w_{kj}(n) = \eta\, y_k(n)\, x_j(n)$ (activity product rule)
Covariance hypothesis: $\Delta w_{kj} = \eta\,(x_j - \bar{x})(y_k - \bar{y})$
[Figure: $\Delta w_{kj}$ as a function of postsynaptic activity $y_k$. Under Hebb’s hypothesis the line has slope $\eta x_j$ and passes through the origin; under the covariance hypothesis the line has slope $\eta(x_j - \bar{x})$, crosses zero at the balance point $y_k = \bar{y}$, and reaches its maximum depression point $-\eta\,\bar{y}(x_j - \bar{x})$ at $y_k = 0$.]
Limitation of Hebb’s hypothesis: the repeated application of $x_j$ leads to an increase in $y_k$ and hence exponential growth that finally drives the synaptic connection into saturation. At that point no information will be stored in the synapse and selectivity is lost.
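A rough illustration of the two hypotheses in Python (function names are illustrative; x_bar and y_bar stand for the time-averaged pre- and post-synaptic activities):

```python
def hebb_update(w_kj, y_k, x_j, eta=0.01):
    """Activity product rule: delta_w = eta * y_k * x_j (growth only)."""
    return w_kj + eta * y_k * x_j

def covariance_update(w_kj, y_k, x_j, x_bar, y_bar, eta=0.01):
    """Covariance hypothesis: delta_w = eta * (x_j - x_bar) * (y_k - y_bar).

    The weight can now decrease (depression) when the activities fall
    on opposite sides of their balance points, which avoids the runaway
    growth of the plain activity product rule.
    """
    return w_kj + eta * (x_j - x_bar) * (y_k - y_bar)
```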
Competitive Learning
Characteristics:
– The output neurons compete among themselves to become active.
– Only a single output neuron is active at any one time.
– The neuron that wins the competition is called a winner-takes-all neuron.
[Figure: architectural graph of a simple competitive learning network.]
If neuron k wins the competition, its induced local field $v_k$ must be the largest among all the neurons in the network for the specified input pattern x.
$y_k = \begin{cases} 1 & \text{if } v_k > v_j \text{ for all } j,\ j \neq k \\ 0 & \text{otherwise} \end{cases}$
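A tiny sketch of the winner-takes-all output, assuming the induced local fields of all output neurons are collected in a vector v:

```python
import numpy as np

def winner_takes_all(v):
    """Competitive output: 1 for the neuron with the largest induced
    local field v_k, 0 for every other neuron."""
    y = np.zeros_like(v, dtype=float)
    y[np.argmax(v)] = 1.0
    return y
```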
Boltzmann Learning
A stochastic learning algorithm derived from ideas rooted in statistical mechanics.
A neural network designed on the basis of the Boltzmann learning rule is called a Boltzmann machine.
The neurons constitute a recurrent structure, and operate in a binary manner (ex: +1, -1).
Energy Function: $E = -\frac{1}{2} \sum_{j} \sum_{\substack{k \\ j \neq k}} w_{kj}\, x_k\, x_j$
$x_j$ is the state of neuron $j$; the condition $j \neq k$ means that none of the neurons has self-feedback. The machine operates by choosing a neuron at random. (A brief review of statistical mechanics is presented in Chapter 11.)
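A direct numpy transcription of the energy function (the function name is illustrative; states are assumed binary, +1/−1, with no self-feedback):

```python
import numpy as np

def boltzmann_energy(W, x):
    """E = -1/2 * sum over j != k of w_kj * x_k * x_j for state vector x."""
    W = W - np.diag(np.diag(W))   # enforce w_kk = 0 (no self-feedback)
    return -0.5 * x @ W @ x
```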
Credit-Assignment Problem
The problem of assigning credit or blame for overall outcomes to each of the internal decisions.
For example, the problem arises when error-correction learning is applied to a multilayer feedforward neural network, where credit for the output error must be assigned to the internal (hidden) neurons. (Presented in Chapter 4.)
Learning with a Teacher (Supervised Learning)
Learning without a Teacher
Reinforcement Learning
Input-output mapping is performed through continued interaction with the environment so as to minimize a scalar index of performance.
Unsupervised Learning
No teacher or critic to oversee the learning process.
[Figure: block diagram of unsupervised learning — the environment feeds the learning system directly, with no teacher or critic in the loop.]
Ex: Competitive learning
Learning Tasks: Pattern Association
Autoassociation: A neural network stores a set of patterns by repeatedly presenting them to the network. Then the network is presented with a partial description or distorted version of an original pattern, and the task is to retrieve that pattern.
Heteroassociation: An arbitrary set of input patterns is paired with another arbitrary set of output patterns.
$x_k$: key pattern; $y_k$: memorized pattern
Pattern association: $x_k \to y_k$, $k = 1, 2, \ldots, q$, where $q$ is the number of patterns stored in the network.
In Autoassociation, $y_k = x_k$.
In Heteroassociation, $y_k \neq x_k$.
Learning Tasks: Pattern Recognition
Def: A received pattern/signal is assigned to one of a prescribed number of classes.
[Figure: one form of pattern recognition — (A) an unsupervised network extracts a feature vector y from the input pattern x; (B) a supervised network classifies the feature vector into one of r classes.]
Learning Tasks: Function Approximation
Nonlinear input-output mapping
$d = f(x)$, where $x$ is the input vector and $d$ is the output vector; $f(\cdot)$ is assumed to be unknown.
Given a set of labeled examples $\{(x_i, d_i)\}_{i=1}^{N}$. Requirement: design a neural network realizing $F(\cdot)$ to approximate the unknown $f(\cdot)$ such that $\|F(x) - f(x)\| < \varepsilon$ for all $x$, where $\varepsilon$ is a small positive number.
[Figures: system identification; inverse system modeling.]
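One simple way to realize an approximating mapping $F(\cdot)$ and check the error criterion numerically; the random-feature construction below is an illustrative choice, not a procedure from this chapter:

```python
import numpy as np

rng = np.random.default_rng(0)

# Labeled examples {(x_i, d_i)} drawn from an "unknown" scalar function f
f = np.sin
x = rng.uniform(-np.pi, np.pi, size=(200, 1))
d = f(x)

# F(x): one hidden layer with fixed random weights, output layer fitted
# by linear least squares
W, b = rng.normal(size=(1, 50)), rng.normal(size=50)
H = np.tanh(x @ W + b)                    # hidden-layer activations
a = np.linalg.lstsq(H, d, rcond=None)[0]  # output weights

F = lambda z: np.tanh(z @ W + b) @ a
print("max |F(x) - f(x)| over the samples:", np.abs(F(x) - d).max())
```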
Learning Tasks: Control
Goal: Supply appropriate inputs to the plant to make its output y track reference d.
Error-correction algorithm needs the Jacobian matrix: $J = \left\{ \dfrac{\partial y_k}{\partial u_j} \right\}$, the partial derivatives of the plant outputs $y_k$ with respect to the plant inputs $u_j$.
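When the plant is available only as a black box, the Jacobian can be estimated numerically; a finite-difference sketch (the plant callable is hypothetical):

```python
import numpy as np

def plant_jacobian(plant, u, eps=1e-6):
    """Estimate J[k, j] = dy_k / du_j by forward finite differences.

    plant : callable mapping an input vector u to an output vector y
    u     : operating point at which the Jacobian is taken
    """
    y0 = plant(u)
    J = np.zeros((y0.size, u.size))
    for j in range(u.size):
        du = np.zeros_like(u)
        du[j] = eps
        J[:, j] = (plant(u + du) - y0) / eps
    return J
```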
Learning Tasks: Filtering and Beamforming
Filtering
– Extract information from a set of noisy data.
– Ex: the cocktail party problem.
Beamforming
– A spatial form of filtering, used to distinguish between the spatial properties of a target signal and background noise.
– Ex: echo-locating bats.
Memory
[Figure: signal-flow graph model of a linear neuron labeled i — the inputs $x_{k1}, x_{k2}, \ldots, x_{km}$ reach the output $y_{ki}$ through synaptic weights $w_{i1}(k), w_{i2}(k), \ldots, w_{im}(k)$.]
$x_k = [x_{k1}, x_{k2}, \ldots, x_{km}]^T, \quad y_k = [y_{k1}, y_{k2}, \ldots, y_{km}]^T$
$y_k = W(k)\, x_k, \quad k = 1, 2, \ldots, q$
$y_{ki} = \sum_{j=1}^{m} w_{ij}(k)\, x_{kj}, \quad i = 1, 2, \ldots, m$

In row-vector form,

$y_{ki} = [w_{i1}(k),\, w_{i2}(k),\, \ldots,\, w_{im}(k)] \begin{bmatrix} x_{k1} \\ x_{k2} \\ \vdots \\ x_{km} \end{bmatrix}, \quad i = 1, 2, \ldots, m$

Stacking the $m$ rows gives the matrix form

$\begin{bmatrix} y_{k1} \\ y_{k2} \\ \vdots \\ y_{km} \end{bmatrix} = \begin{bmatrix} w_{11}(k) & w_{12}(k) & \cdots & w_{1m}(k) \\ w_{21}(k) & w_{22}(k) & \cdots & w_{2m}(k) \\ \vdots & \vdots & & \vdots \\ w_{m1}(k) & w_{m2}(k) & \cdots & w_{mm}(k) \end{bmatrix} \begin{bmatrix} x_{k1} \\ x_{k2} \\ \vdots \\ x_{km} \end{bmatrix}$

where the $m \times m$ weight matrix is $W(k)$.
W(k) is a weight matrix determined by input-output pair (xk,yk).
$M = \sum_{k=1}^{q} W(k)$, or in recursive form, $M_k = M_{k-1} + W(k), \quad k = 1, 2, \ldots, q$

The memory matrix $M$ defines the overall connectivity between the input and output layers.
Correlation Matrix Memory
$\hat{M} = \sum_{k=1}^{q} y_k\, x_k^T$
$\hat{M}$: correlation matrix
$x_k$: input pattern
$y_k$: output pattern
Equivalently, $\hat{M} = [y_1, y_2, \ldots, y_q]\,[x_1, x_2, \ldots, x_q]^T = Y X^T$, where $X = [x_1, x_2, \ldots, x_q]$ and $Y = [y_1, y_2, \ldots, y_q]$.
$\hat{M}_k = \hat{M}_{k-1} + y_k\, x_k^T, \quad k = 1, 2, \ldots, q$ (recursion form)
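In numpy the batch and recursive constructions coincide; a minimal sketch, assuming the patterns are stored as matrix columns:

```python
import numpy as np

def correlation_memory(X, Y):
    """M_hat = sum_k y_k x_k^T for keys X = [x_1 ... x_q] and memorized
    patterns Y = [y_1 ... y_q] (columns). Y @ X.T equals the sum of the
    outer products y_k x_k^T."""
    return Y @ X.T
```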
Recall
$x_j$: a randomly selected key pattern; $y$: the yielded response

$y = \hat{M} x_j = \sum_{k=1}^{q} y_k\, x_k^T x_j = (x_j^T x_j)\, y_j + \sum_{\substack{k=1 \\ k \neq j}}^{q} (x_k^T x_j)\, y_k$
Let each of the key patterns $x_1, x_2, \ldots, x_q$ be normalized to have unit energy:
$E_k = \sum_{l=1}^{m} x_{kl}^2 = x_k^T x_k = 1, \quad k = 1, 2, \ldots, q$
The response then splits as $y = y_j + v_j$, with $v_j = \sum_{\substack{k=1 \\ k \neq j}}^{q} (x_k^T x_j)\, y_k$
$y_j$: desired response
$v_j$: error (noise) vector
Because $\cos(x_k, x_j) = \dfrac{x_k^T x_j}{\|x_k\|\,\|x_j\|}$ and the keys have unit energy, $x_k^T x_j = \cos(x_k, x_j)$, so
$v_j = \sum_{\substack{k=1 \\ k \neq j}}^{q} \cos(x_k, x_j)\, y_k$
If $\cos(x_k, x_j) = 0$ for all $k \neq j$, the key vectors are orthogonal, and it follows that $v_j = 0$: the recall is perfect.
But, the key patterns presented to an associative memory are neither orthogonal nor highly separated from each other.
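A small numpy demonstration of recall; the sizes and random patterns are arbitrary, and the keys are made orthonormal so that the noise vector vanishes:

```python
import numpy as np

rng = np.random.default_rng(0)
m, q = 8, 3

# Orthonormal keys (unit energy, mutually orthogonal) via QR decomposition
X = np.linalg.qr(rng.normal(size=(m, q)))[0]
Y = rng.normal(size=(m, q))          # arbitrary memorized patterns

M_hat = Y @ X.T                      # correlation matrix memory

j = 1                                # probe with key x_j
y = M_hat @ X[:, j]                  # v_j = 0 for orthogonal keys
print(np.allclose(y, Y[:, j]))       # True: perfect recall of y_j
```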
Adaptation
If the operating environment is stationary (its statistical characteristics do not change with time), the essential statistics of the environment can, in theory, be learned under the supervision of a teacher.
In general, however, the environment of interest is non-stationary, so the neural network must continuously adapt its free parameters to variations in real time (continuous learning, or learning-on-the-fly).
Pseudo-stationary: The statistical characteristics of a non-stationary process change slowly enough over a window of short enough duration.
Dynamic Approach to Learning (see the sketch after this list):
– Select a window short enough for the process to be pseudo-stationary over it.
– When a new data sample is received, update the window by dropping the oldest data and shifting the remaining data by one time unit.
– Use the updated data window to retrain the network.
– Repeat the procedure on a continuing basis.
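A minimal sketch of this sliding-window loop (the retrain routine is a hypothetical stand-in for whatever training procedure the network uses):

```python
from collections import deque

def continuous_learning(stream, retrain, window_size=100):
    """Retrain on a short sliding window as new samples arrive.

    stream  : iterable of (x, d) samples from a non-stationary environment
    retrain : routine that refits the network on the current window
    """
    window = deque(maxlen=window_size)   # oldest sample is dropped automatically
    for sample in stream:
        window.append(sample)            # shift the window by one time unit
        retrain(list(window))            # retrain on the updated window
```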