Neural Networks 2nd Edition
Simon Haykin
柯博昌
Chap 2. Learning Processes
Learning vs. Neural Network
1. The neural network is stimulated by an environment.
2. The neural network undergoes changes in its free parameters as a result of this stimulation.
3. The neural network responds in a new way to the environment because of the changes that have occurred in its internal structure.
Error-Correction Learning
[Figure: error-correction learning. The input vector $x(n)$ drives a multilayer feedforward network (one or more layers of hidden neurons). The output $y_k(n)$ of output neuron $k$ is subtracted from the desired response (target output) $d_k(n)$ to produce the error signal $e_k(n) = d_k(n) - y_k(n)$.]
Objective: Minimizing a cost function or index of performance,
$\mathcal{E}(n) = \frac{1}{2} e_k^2(n)$
The step-by-step adjustments are continued until the system reaches a steady state.
Delta Rule (Widrow-Hoff Rule)
$\Delta w_{kj}(n) = \eta\, e_k(n)\, x_j(n)$
Let wkj(n) denote the value of synaptic weight wkj of neuron k excited by element xj(n) of the signal vector x(n) at time step n.
$\eta$: a positive constant determining the rate of learning as we proceed from one step in the learning process to another (the learning-rate parameter).
$w_{kj}(n+1) = w_{kj}(n) + \Delta w_{kj}(n)$
$w_{kj}(n) = z^{-1}[w_{kj}(n+1)]$, where $z^{-1}$ is the unit-delay operator.
In effect, wkj(n) and wkj(n+1) may be viewed as the old and new values of synaptic weight wkj, respectively.
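A minimal numpy sketch of one delta-rule step (the function name and the single-neuron setting are illustrative, not from the book):

```python
import numpy as np

def delta_rule_step(w, x, d, eta=0.1):
    """One delta-rule (Widrow-Hoff) update for a single linear neuron.

    w   : weight vector of neuron k (one entry per input element x_j)
    x   : input vector x(n)
    d   : desired response d_k(n)
    eta : learning-rate parameter (a small positive constant)
    """
    y = np.dot(w, x)          # neuron output y_k(n)
    e = d - y                 # error signal e_k(n) = d_k(n) - y_k(n)
    w_new = w + eta * e * x   # w_kj(n+1) = w_kj(n) + eta * e_k(n) * x_j(n)
    return w_new, e
```

For a suitably small $\eta$, repeated application of this step drives down the instantaneous cost $\frac{1}{2} e_k^2(n)$.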
Memory-based Learning
Def: All (or most) of the past experiences are explicitly stored in a large memory of correctly classified input-output examples.
$\{(x_i, d_i)\}_{i=1}^{N}$
$x_i$: input vector
$d_i$: corresponding desired response
Without loss of generality, the desired response is restricted to be a scalar.
Ex: A binary pattern classification problem. Assume there are two classes/hypotheses denoted C1 and C2:
$d_i = 1$ for class C1
$d_i = 0$ (or $-1$) for class C2
Classification of a test vector xtest
Retrieving the training data in a “local neighborhood” of xtest
1. The nearest neighbor rule is a simple yet effective type of learning.
2. The k-nearest neighbor classifier identifies the k patterns lying nearest to xtest and uses a majority vote to make the classification.
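A short sketch of the k-nearest neighbor vote, assuming the stored examples are kept in arrays (the function name and array layout are illustrative):

```python
import numpy as np

def knn_classify(x_test, X_train, d_train, k=3):
    """Classify x_test by a majority vote among its k nearest stored patterns.

    X_train : (N, m) array of stored input vectors x_i
    d_train : (N,) array of corresponding class labels d_i
    """
    dists = np.linalg.norm(X_train - x_test, axis=1)  # distance to every pattern
    nearest = np.argsort(dists)[:k]                   # indices of the k closest
    labels, counts = np.unique(d_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                  # majority vote
```

With k = 1 this reduces to the nearest neighbor rule.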
Hebbian Learning
Hebb’s postulate of learning is the oldest and most famous of all learning rules.
– When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic changes take place in one or both cells (A, B).
Hebb’s rule was expanded:
– If two neurons on either side of a synapse are activated simultaneously, then the strength of that synapse is selectively increased.
– Otherwise, if such two neurons are activated asynchronously, then that synapse is selectively weakened or eliminated.
– Such a synapse is called a Hebbian synapse.
Hebbian Synapse Characteristics
– Time-dependent mechanism
– Local mechanism
– Interactive mechanism
– Conjunctional or correlational mechanism
Mathematical Models of Hebbian Modifications
$\Delta w_{kj}(n) = F\big(y_k(n),\, x_j(n)\big)$
Let wkj denote a synaptic weight of neuron k with pre-synaptic and post-synaptic signals denoted by xj and yk, respectively.
$\eta$: the rate of learning (a positive constant)
Hebb’s hypothesis (the simplest form): $\Delta w_{kj}(n) = \eta\, y_k(n)\, x_j(n)$ (activity product rule)
Covariance hypothesis: $\Delta w_{kj} = \eta\,(x_j - \bar{x})(y_k - \bar{y})$
[Figure: $\Delta w_{kj}$ as a function of postsynaptic activity $y_k$. Under Hebb’s hypothesis the line has slope $\eta x_j$ and passes through the origin; under the covariance hypothesis the line has slope $\eta(x_j - \bar{x})$, crosses zero at the balance point $y_k = \bar{y}$, and reaches its maximum depression point $-\eta\,\bar{y}(x_j - \bar{x})$ at $y_k = 0$.]
Limitation of Hebb’s hypothesis: the repeated application of $x_j$ leads to an increase in $y_k$ and hence exponential growth that finally drives the synaptic connection into saturation. At that point no information will be stored in the synapse and selectivity is lost.
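A rough illustration of the two hypotheses in Python (function names are illustrative; x_bar and y_bar stand for the time-averaged pre- and post-synaptic activities):

```python
def hebb_update(w_kj, y_k, x_j, eta=0.01):
    """Activity product rule: delta_w = eta * y_k * x_j (growth only)."""
    return w_kj + eta * y_k * x_j

def covariance_update(w_kj, y_k, x_j, x_bar, y_bar, eta=0.01):
    """Covariance hypothesis: delta_w = eta * (x_j - x_bar) * (y_k - y_bar).

    The weight can now decrease (depression) when the activities fall
    on opposite sides of their balance points, which avoids the runaway
    growth of the plain activity product rule.
    """
    return w_kj + eta * (x_j - x_bar) * (y_k - y_bar)
```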
Competitive Learning
Characteristics:
– The output neurons compete among themselves to become active.
– Only a single output neuron is active at any one time.
– The neuron that wins the competition is called a winner-takes-all neuron.
[Figure: architectural graph of a simple competitive learning network.]
If neuron k wins the competition, its induced local field $v_k$ must be the largest among all the neurons in the network for the specified input pattern x.
$y_k = \begin{cases} 1 & \text{if } v_k > v_j \text{ for all } j,\ j \neq k \\ 0 & \text{otherwise} \end{cases}$
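A tiny sketch of the winner-takes-all output, assuming the induced local fields of all output neurons are collected in a vector v:

```python
import numpy as np

def winner_takes_all(v):
    """Competitive output: 1 for the neuron with the largest induced
    local field v_k, 0 for every other neuron."""
    y = np.zeros_like(v, dtype=float)
    y[np.argmax(v)] = 1.0
    return y
```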
Boltzmann Learning
A stochastic learning algorithm derived from ideas rooted in statistical mechanics.
A neural network designed on the basis of the Boltzmann learning rule is called a Boltzmann machine.
The neurons constitute a recurrent structure, and operate in a binary manner (ex: +1, -1).
Energy Function: $E = -\frac{1}{2} \sum_{j} \sum_{\substack{k \\ j \neq k}} w_{kj}\, x_k\, x_j$
$x_j$ is the state of neuron $j$; the condition $j \neq k$ means that none of the neurons has self-feedback. The machine operates by choosing a neuron at random. (A brief review of statistical mechanics is presented in Chapter 11.)
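A direct numpy transcription of the energy function (the function name is illustrative; states are assumed binary, +1/−1, with no self-feedback):

```python
import numpy as np

def boltzmann_energy(W, x):
    """E = -1/2 * sum over j != k of w_kj * x_k * x_j for state vector x."""
    W = W - np.diag(np.diag(W))   # enforce w_kk = 0 (no self-feedback)
    return -0.5 * x @ W @ x
```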
Credit-Assignment Problem
The problem of assigning credit or blame for overall outcomes to each of the internal decisions.
For example, the problem arises when error-correction learning is applied to a multilayer feedforward neural network, where credit for the output error must be assigned to the internal (hidden) neurons. (Presented in Chapter 4.)
Learning with a Teacher (Supervised Learning)
Learning without a Teacher
Reinforcement Learning
Input-output mapping is performed through continued interaction with the environment so as to minimize a scalar index of performance.
Unsupervised Learning
No teacher or critic to oversee the learning process.
[Figure: block diagram of unsupervised learning — the environment feeds the learning system directly, with no teacher or critic in the loop.]
Ex: Competitive learning
Learning Tasks: Pattern Association
Autoassociation: A neural network stores a set of patterns by repeatedly presenting them to the network. Then the network is presented with a partial description or distorted version of an original pattern, and the task is to retrieve that pattern.
Heteroassociation: An arbitrary set of input patterns is paired with another arbitrary set of output patterns.
$x_k$: key pattern; $y_k$: memorized pattern
Pattern association: $x_k \to y_k$, $k = 1, 2, \ldots, q$, where $q$ is the number of patterns stored in the network.
In Autoassociation, $y_k = x_k$.
In Heteroassociation, $y_k \neq x_k$.
Learning Tasks: Pattern Recognition
Def: A received pattern/signal is assigned to one of a prescribed number of classes.
[Figure: one form of pattern recognition — (A) an unsupervised network extracts a feature vector y from the input pattern x; (B) a supervised network classifies the feature vector into one of r classes.]
Learning Tasks: Function Approximation
Nonlinear input-output mapping
$d = f(x)$, where $x$ is the input vector and $d$ is the output vector; $f(\cdot)$ is assumed to be unknown.
Given a set of labeled examples $\{(x_i, d_i)\}_{i=1}^{N}$. Requirement: design a neural network realizing $F(\cdot)$ to approximate the unknown $f(\cdot)$ such that $\|F(x) - f(x)\| < \varepsilon$ for all $x$, where $\varepsilon$ is a small positive number.
[Figures: system identification; inverse system modeling.]
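One simple way to realize an approximating mapping $F(\cdot)$ and check the error criterion numerically; the random-feature construction below is an illustrative choice, not a procedure from this chapter:

```python
import numpy as np

rng = np.random.default_rng(0)

# Labeled examples {(x_i, d_i)} drawn from an "unknown" scalar function f
f = np.sin
x = rng.uniform(-np.pi, np.pi, size=(200, 1))
d = f(x)

# F(x): one hidden layer with fixed random weights, output layer fitted
# by linear least squares
W, b = rng.normal(size=(1, 50)), rng.normal(size=50)
H = np.tanh(x @ W + b)                    # hidden-layer activations
a = np.linalg.lstsq(H, d, rcond=None)[0]  # output weights

F = lambda z: np.tanh(z @ W + b) @ a
print("max |F(x) - f(x)| over the samples:", np.abs(F(x) - d).max())
```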
Learning Tasks: Control
Goal: Supply appropriate inputs to the plant to make its output y track reference d.
Error-correction algorithm needs the Jacobian matrix: $J = \left\{ \dfrac{\partial y_k}{\partial u_j} \right\}$, the partial derivatives of the plant outputs $y_k$ with respect to the plant inputs $u_j$.
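When the plant is available only as a black box, the Jacobian can be estimated numerically; a finite-difference sketch (the plant callable is hypothetical):

```python
import numpy as np

def plant_jacobian(plant, u, eps=1e-6):
    """Estimate J[k, j] = dy_k / du_j by forward finite differences.

    plant : callable mapping an input vector u to an output vector y
    u     : operating point at which the Jacobian is taken
    """
    y0 = plant(u)
    J = np.zeros((y0.size, u.size))
    for j in range(u.size):
        du = np.zeros_like(u)
        du[j] = eps
        J[:, j] = (plant(u + du) - y0) / eps
    return J
```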
Learning Tasks: Filtering and Beamforming
Filtering
– Extract information from a set of noisy data.
– Ex: the cocktail party problem.
Beamforming
– A spatial form of filtering, used to distinguish between the spatial properties of a target signal and background noise.
– Ex: echo-locating bats.
Memory
[Figure: signal-flow graph model of a linear neuron labeled i — the inputs $x_{k1}, x_{k2}, \ldots, x_{km}$ reach the output $y_{ki}$ through synaptic weights $w_{i1}(k), w_{i2}(k), \ldots, w_{im}(k)$.]
$x_k = [x_{k1}, x_{k2}, \ldots, x_{km}]^T, \quad y_k = [y_{k1}, y_{k2}, \ldots, y_{km}]^T$
$y_k = W(k)\, x_k, \quad k = 1, 2, \ldots, q$
$y_{ki} = \sum_{j=1}^{m} w_{ij}(k)\, x_{kj}, \quad i = 1, 2, \ldots, m$

In row-vector form,

$y_{ki} = [w_{i1}(k),\, w_{i2}(k),\, \ldots,\, w_{im}(k)] \begin{bmatrix} x_{k1} \\ x_{k2} \\ \vdots \\ x_{km} \end{bmatrix}, \quad i = 1, 2, \ldots, m$

Stacking the $m$ rows gives the matrix form

$\begin{bmatrix} y_{k1} \\ y_{k2} \\ \vdots \\ y_{km} \end{bmatrix} = \begin{bmatrix} w_{11}(k) & w_{12}(k) & \cdots & w_{1m}(k) \\ w_{21}(k) & w_{22}(k) & \cdots & w_{2m}(k) \\ \vdots & \vdots & & \vdots \\ w_{m1}(k) & w_{m2}(k) & \cdots & w_{mm}(k) \end{bmatrix} \begin{bmatrix} x_{k1} \\ x_{k2} \\ \vdots \\ x_{km} \end{bmatrix}$

where the $m \times m$ weight matrix is $W(k)$.
W(k) is a weight matrix determined by input-output pair (xk,yk).
$M = \sum_{k=1}^{q} W(k)$, or in recursive form, $M_k = M_{k-1} + W(k), \quad k = 1, 2, \ldots, q$

The memory matrix $M$ defines the overall connectivity between the input and output layers.
Correlation Matrix Memory
$\hat{M} = \sum_{k=1}^{q} y_k\, x_k^T$
$\hat{M}$: correlation matrix
$x_k$: input pattern
$y_k$: output pattern
Equivalently, $\hat{M} = [y_1, y_2, \ldots, y_q]\,[x_1, x_2, \ldots, x_q]^T = Y X^T$, where $X = [x_1, x_2, \ldots, x_q]$ and $Y = [y_1, y_2, \ldots, y_q]$.
$\hat{M}_k = \hat{M}_{k-1} + y_k\, x_k^T, \quad k = 1, 2, \ldots, q$ (recursion form)
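In numpy the batch and recursive constructions coincide; a minimal sketch, assuming the patterns are stored as matrix columns:

```python
import numpy as np

def correlation_memory(X, Y):
    """M_hat = sum_k y_k x_k^T for keys X = [x_1 ... x_q] and memorized
    patterns Y = [y_1 ... y_q] (columns). Y @ X.T equals the sum of the
    outer products y_k x_k^T."""
    return Y @ X.T
```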
Recall
$x_j$: a randomly selected key pattern; $y$: the yielded response

$y = \hat{M} x_j = \sum_{k=1}^{q} y_k\, x_k^T x_j = (x_j^T x_j)\, y_j + \sum_{\substack{k=1 \\ k \neq j}}^{q} (x_k^T x_j)\, y_k$
Let each of the key patterns $x_1, x_2, \ldots, x_q$ be normalized to have unit energy:
$E_k = \sum_{l=1}^{m} x_{kl}^2 = x_k^T x_k = 1, \quad k = 1, 2, \ldots, q$
The response then splits as $y = y_j + v_j$, with $v_j = \sum_{\substack{k=1 \\ k \neq j}}^{q} (x_k^T x_j)\, y_k$
$y_j$: desired response
$v_j$: error (noise) vector
Because $\cos(x_k, x_j) = \dfrac{x_k^T x_j}{\|x_k\|\,\|x_j\|}$ and the keys have unit energy, $x_k^T x_j = \cos(x_k, x_j)$, so
$v_j = \sum_{\substack{k=1 \\ k \neq j}}^{q} \cos(x_k, x_j)\, y_k$
If $\cos(x_k, x_j) = 0$ for all $k \neq j$, the key vectors are orthogonal, and it follows that $v_j = 0$: the recall is perfect.
But, the key patterns presented to an associative memory are neither orthogonal nor highly separated from each other.
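A small numpy demonstration of recall; the sizes and random patterns are arbitrary, and the keys are made orthonormal so that the noise vector vanishes:

```python
import numpy as np

rng = np.random.default_rng(0)
m, q = 8, 3

# Orthonormal keys (unit energy, mutually orthogonal) via QR decomposition
X = np.linalg.qr(rng.normal(size=(m, q)))[0]
Y = rng.normal(size=(m, q))          # arbitrary memorized patterns

M_hat = Y @ X.T                      # correlation matrix memory

j = 1                                # probe with key x_j
y = M_hat @ X[:, j]                  # v_j = 0 for orthogonal keys
print(np.allclose(y, Y[:, j]))       # True: perfect recall of y_j
```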
Adaptation
If the operating environment is stationary (its statistical characteristics do not change with time), the essential statistics of the environment can, in theory, be learned under the supervision of a teacher.
In general, however, the environment of interest is non-stationary, so the neural network must continuously adapt its free parameters to variations in real time (continuous learning, or learning-on-the-fly).
Pseudo-stationary: The statistical characteristics of a non-stationary process change slowly enough over a window of short enough duration.
Dynamic Approach to Learning (see the sketch after this list):
– Select a window short enough for the process to be pseudo-stationary over it.
– When a new data sample is received, update the window by dropping the oldest data and shifting the remaining data by one time unit.
– Use the updated data window to retrain the network.
– Repeat the procedure on a continuing basis.
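A minimal sketch of this sliding-window loop (the retrain routine is a hypothetical stand-in for whatever training procedure the network uses):

```python
from collections import deque

def continuous_learning(stream, retrain, window_size=100):
    """Retrain on a short sliding window as new samples arrive.

    stream  : iterable of (x, d) samples from a non-stationary environment
    retrain : routine that refits the network on the current window
    """
    window = deque(maxlen=window_size)   # oldest sample is dropped automatically
    for sample in stream:
        window.append(sample)            # shift the window by one time unit
        retrain(list(window))            # retrain on the updated window
```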