黃文中 2009-05-04

1

Preview

2

3

The saliency map is a topographically arranged map that represents the visual saliency of a corresponding visual scene.

4

Two types of stimuli:

Bottom-up
  Depends only on the instantaneous sensory input
  Does not take into account the internal state of the organism

Top-down
  Takes into account the internal state, such as the goals the organism has at the time, personal history and experiences, etc.

5

6

Nine spatial scales are created using dyadic Gaussian pyramids.

Each feature is computed by a set of linear “center-surround” operations akin to visual receptive fields.

Normalization

Across-scale combination into three “conspicuity maps”

Linear combination to create the saliency map

Winner-take-all
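The first two steps (the dyadic pyramid and a center-surround difference) can be sketched in Python as follows. This is a minimal illustration rather than the original model's code: the 5-tap binomial kernel, the edge padding, the pixel-repetition upsampling, and the example scale pair (c = 2, s = 5) are assumptions, and the function names are hypothetical.

```python
import numpy as np

# 5-tap binomial kernel, an assumed approximation of a Gaussian low-pass filter
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def blur(img):
    """Separable low-pass filter with edge padding."""
    height, width = img.shape
    padded = np.pad(img, 2, mode="edge")
    # filter along the horizontal axis, then along the vertical axis
    rows = sum(KERNEL[k] * padded[:, k:k + width] for k in range(5))
    return sum(KERNEL[k] * rows[k:k + height, :] for k in range(5))

def gaussian_pyramid(img, levels=9):
    """Dyadic pyramid: each level is the blurred previous level, subsampled by 2."""
    pyramid = [img.astype(float)]
    for _ in range(levels - 1):
        pyramid.append(blur(pyramid[-1])[::2, ::2])
    return pyramid

def center_surround(pyramid, c, s):
    """|center - surround|, with coarse level s upsampled to fine level c."""
    factor = 2 ** (s - c)
    surround = np.kron(pyramid[s], np.ones((factor, factor)))  # pixel repetition
    center = pyramid[c]
    return np.abs(center - surround[:center.shape[0], :center.shape[1]])

image = np.random.default_rng(0).random((256, 256))  # stand-in intensity image
pyramid = gaussian_pyramid(image)
feature_map = center_surround(pyramid, c=2, s=5)
print(len(pyramid), feature_map.shape)
```

A 256 × 256 input is used so that nine dyadic levels exist; a uniform image yields an all-zero feature map, since center and surround then agree everywhere.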

7

8

9

A Method of Calculating Image Saliency and of Optimizing Efficient Distribution of Image Windows

10

In the learning step, feature vectors are extracted at all positions of numerous learning images. PCA is then applied to them to produce the principal basis vectors, which are memorized in the dictionary.

In the searching step, a feature vector is extracted at each position of the input image and expanded on the basis memorized in the dictionary. The expansion coefficients are then analyzed by a second PCA, and the residual left when the feature vector is expanded on the principal basis of the second PCA is output as the saliency.

11

Let u be a d-dimensional vector, d = 3 × m × n, consisting of the R, G and B values of the m × n pixels in a window of size m × n.

Principal Component Analysis (PCA) is applied to a set of N feature vectors u1, u2, ..., uN.

When the image size is W × H, the number of feature vectors is N = K × (W − m + 1) × (H − n + 1), where K is taken to be the number of learning images.
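The window scan and the resulting vector count can be illustrated with a short Python sketch. The function name, the random stand-in images, and the reading of "m × n" as m pixels wide by n pixels high are assumptions made for the example.

```python
import numpy as np

def extract_window_vectors(image, m, n):
    """Return one d = 3*m*n vector per window position of an H x W x 3 image.

    The window is taken to span m pixels horizontally and n vertically,
    which gives (W - m + 1) * (H - n + 1) positions.
    """
    H, W, _ = image.shape
    vectors = [
        image[y:y + n, x:x + m, :].reshape(-1)
        for y in range(H - n + 1)
        for x in range(W - m + 1)
    ]
    return np.array(vectors, dtype=float)

# K learning images of size W x H (random stand-ins for real data)
rng = np.random.default_rng(0)
K, W, H, m, n = 2, 8, 6, 3, 2
images = [rng.integers(0, 256, size=(H, W, 3)) for _ in range(K)]
U = np.vstack([extract_window_vectors(img, m, n) for img in images])
print(U.shape)  # (N, d) with N = K * (W - m + 1) * (H - n + 1) and d = 3 * m * n
```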

Let 〈u〉 be the average of the feature vectors and let C be the covariance matrix defined as
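The slide's equation is not preserved in this transcript. A standard definition consistent with the symbols above, assuming 1/N normalization, would be:

```latex
\langle u \rangle = \frac{1}{N}\sum_{i=1}^{N} u_i ,
\qquad
C = \frac{1}{N}\sum_{i=1}^{N}
    \left(u_i - \langle u \rangle\right)
    \left(u_i - \langle u \rangle\right)^{\mathsf{T}}
```

With this definition C is a d × d symmetric matrix.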

12

Let λ1 ≥ λ2 ≥ ... ≥ λd be the eigenvalues of the covariance matrix C, and let ξ1, ξ2, ..., ξd be the corresponding eigenvectors.

These eigenvectors are assumed to form an orthonormal basis, and the first S of them, ξ1, ξ2, ..., ξS, are memorized in the dictionary as the basis of principal components.
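A minimal NumPy sketch of the learning step follows. The function name `learn_dictionary`, the 1/N covariance normalization, and the random stand-in data are assumptions for illustration.

```python
import numpy as np

def learn_dictionary(U, S):
    """Return the mean vector, the S leading eigenvectors (columns), and all eigenvalues."""
    mean = U.mean(axis=0)
    centered = U - mean
    C = centered.T @ centered / U.shape[0]   # d x d covariance (1/N assumed)
    eigvals, eigvecs = np.linalg.eigh(C)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder so the largest eigenvalue comes first
    return mean, eigvecs[:, order[:S]], eigvals[order]

rng = np.random.default_rng(0)
U = rng.normal(size=(200, 12))               # stand-in for N = 200 vectors with d = 12
mean, basis, eigvals = learn_dictionary(U, S=4)
print(basis.shape)
```

`np.linalg.eigh` is used because C is symmetric; it returns orthonormal eigenvectors, matching the orthonormality assumed in the text.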

13

A window of size m × n is scanned in the same way as in the learning step, and a feature vector vi (i = 1, 2, ..., M) is extracted at each point, where M = (W − m + 1) × (H − n + 1) is the number of points.

Then the vector Vi = vi − 〈u〉 is expanded on the “memorized” basis as
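The expansion itself is missing from the transcript. With an orthonormal basis, it would take the standard form below; this is a reconstruction from the surrounding definitions rather than the slide's own equation, and the approximation sign reflects that only S of the d basis vectors are kept.

```latex
V_i \approx \sum_{j=1}^{S} \alpha_{ij}\, \xi_j ,
\qquad
\alpha_{ij} = \xi_j^{\mathsf{T}} V_i
```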

14

Then the set of coefficients αi1, αi2, ..., αiS is regarded as an S-dimensional vector αi, and PCA is applied again to these vectors.

The average of α1,α2, · · · ,αM is calculated as
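The average is presumably the ordinary sample mean over the M coefficient vectors:

```latex
\langle \alpha \rangle = \frac{1}{M}\sum_{i=1}^{M} \alpha_i
```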

Then the eigenvalues μ1 ≥ μ2 ≥ ... ≥ μS and the orthonormal eigenvectors φ1, φ2, ..., φS are obtained just as in the learning step.

15

Now we define another orthonormal basis vector

The residue, which represents the saliency, is obtained by
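The slide's definitions of the second basis and of the residue are not preserved here. The sketch below shows one plausible reading, assuming the saliency is the squared norm of the part of each centered coefficient vector that the T leading second-stage eigenvectors fail to capture. The parameter T and the function name are hypothetical.

```python
import numpy as np

def residual_saliency(V, basis, T):
    """Saliency of each row of V as the residual of a second PCA.

    V     : M x d matrix of mean-subtracted feature vectors V_i
    basis : d x S dictionary basis with orthonormal columns xi_1 .. xi_S
    T     : number of second-stage principal directions kept (hypothetical)
    """
    alpha = V @ basis                            # M x S coefficient vectors alpha_i
    centered = alpha - alpha.mean(axis=0)        # subtract the average of the alpha_i
    cov = centered.T @ centered / alpha.shape[0]
    mu, phi = np.linalg.eigh(cov)                # ascending eigenvalues
    phi = phi[:, np.argsort(mu)[::-1][:T]]       # keep the T leading eigenvectors
    projected = centered @ phi @ phi.T           # part explained by the kept directions
    return np.sum((centered - projected) ** 2, axis=1)

# Toy data: 50 points spread along one axis plus a single off-axis outlier
rng = np.random.default_rng(0)
V = np.zeros((51, 3))
V[:50, 0] = rng.normal(scale=3.0, size=50)
V[:50, 1:] = rng.normal(scale=0.01, size=(50, 2))
V[50] = [0.0, 5.0, 0.0]
basis = np.eye(3)                                # stand-in dictionary with d = S = 3
saliency = residual_saliency(V, basis, T=1)
print(int(np.argmax(saliency)))
```

In this toy case the outlier is the point least well explained by the dominant direction, so it receives the largest residual.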

16

17

First, an initial distribution of L points is determined randomly with uniform probability.

Let (xm1, ym1), (xm2, ym2), ..., (xmL, ymL) be the distribution in the mth iteration.

Feature vectors are extracted at these L points, and the saliency values sm1, sm2, ..., smL are calculated in the same way as described previously.

Then the points whose saliency is greater than a predetermined threshold th are selected.

Let H be the number of selected points, and let (xm1, ym1), (xm2, ym2), ..., (xmH, ymH) be the selected points.

18

Next the potential energy E(x, y) is calculated as

Then the (m+1)th distribution is determined stochastically according to the Gibbs distribution

Z is the partition function defined as
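The energy and probability formulas are missing from the transcript. One iteration can still be sketched under stated assumptions: the usual Gibbs form, in which the probability of a position is exp(−E(x, y)/T) divided by Z, with Z the sum of exp(−E/T) over all positions. The energy used below (a sum of negative Gaussian wells centered on the selected points) is a hypothetical stand-in for the slide's E(x, y), and `sigma` and `temperature` are assumed parameters.

```python
import numpy as np

def next_distribution(points, saliency, th, shape, L, sigma=3.0, temperature=1.0, rng=None):
    """One iteration: keep salient points, build an energy map, resample L points.

    points   : L x 2 integer array of (x, y) positions
    saliency : length-L array of saliency values at those positions
    """
    rng = np.random.default_rng() if rng is None else rng
    height, width = shape
    selected = points[saliency > th]                # the H points above the threshold
    ys, xs = np.mgrid[0:height, 0:width]
    energy = np.zeros(shape)
    for x0, y0 in selected:                         # hypothetical energy: Gaussian wells
        energy -= np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2 * sigma ** 2))
    weights = np.exp(-(energy - energy.min()) / temperature)  # shifted for stability
    prob = weights / weights.sum()                  # division by the partition function Z
    flat = rng.choice(height * width, size=L, p=prob.ravel())
    new_points = np.column_stack((flat % width, flat // width))  # back to (x, y)
    return new_points, prob

rng = np.random.default_rng(0)
shape, L = (32, 32), 200
points = np.column_stack((rng.integers(0, 32, L), rng.integers(0, 32, L)))
# Stand-in saliency: high near (x, y) = (20, 10)
dist2 = (points[:, 0] - 20) ** 2 + (points[:, 1] - 10) ** 2
saliency = np.exp(-dist2 / (2 * 4.0 ** 2))
new_points, prob = next_distribution(points, saliency, th=0.5, shape=shape, L=L, rng=rng)
print(new_points.shape)
```

Low energy near the selected points gives those regions a higher sampling probability, so the next distribution concentrates windows around salient areas while still visiting the rest of the image.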

19

20

The model integrates:

  A bottom-up mechanism for extracting features to obtain salient information

  A top-down perceptual mechanism for perceiving face features such as face form and face color

21

22

The secondary visual areas deal with form and color of an object, 3-D position and motion information.

The face-selective cells in the infero-temporal (IT) area contain complex shape coding information.

The neurons in area V4 respond best to specific colors of objects, irrespective of lighting conditions.

23

24

25

Suppose that the saliency map is one of the results of redundancy reduction in our brain.

Independent component analysis (ICA) is used to model the role of the visual cortex because ICA is considered the best way to reduce redundancy.

26

27

Eri is obtained by convolving the r-th channel of the input image (Ir) with the i-th filter (ICsri) obtained by ICA learning, as shown below:

The feature map Eri represents the influence of the three channel images on each independent component.

A saliency map is obtained by

A salient location P is the location with the maximum summation value in a specific window of the saliency map, as shown below:
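The equations for this slide are not preserved. The sketch below assumes that the saliency map is the plain sum of the feature maps Eri over channels and filters, and it uses constant stand-in filters rather than filters learned by ICA. All function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def convolve_valid(channel, kernel):
    """2-D 'valid' convolution of one image channel with one filter."""
    windows = sliding_window_view(channel, kernel.shape)  # (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel[::-1, ::-1])

def saliency_map(image, filters):
    """Sum of the feature maps E_ri over channels r and filters i (assumed form).

    image   : H x W x 3 array whose channels play the role of I_r
    filters : array of shape (n_filters, 3, kh, kw); filters[i, r] stands in for ICs_ri
    """
    return sum(
        convolve_valid(image[:, :, r], filters[i, r])
        for i in range(filters.shape[0])
        for r in range(image.shape[2])
    )

def most_salient_location(sm, window):
    """Top-left (row, col) of the window with the largest summed saliency."""
    sums = sliding_window_view(sm, (window, window)).sum(axis=(2, 3))
    row, col = np.unravel_index(np.argmax(sums), sums.shape)
    return int(row), int(col)

image = np.zeros((20, 20, 3))
image[5:8, 12:15, :] = 1.0                   # one bright 3 x 3 patch
filters = np.ones((2, 3, 3, 3)) / 27.0       # stand-in filters, not ICA-learned
sm = saliency_map(image, filters)
location = most_salient_location(sm, window=3)
print(sm.shape, location)
```

Because 'valid' convolution is used, the saliency map is smaller than the input by the filter size minus one in each dimension, and window positions refer to saliency-map coordinates.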

28

29

An auto-associative multilayer perceptron (AAMLP) with four layers is used: a mapping layer, a bottleneck layer, a de-mapping layer, and an output layer.

An auto-associative neural network is basically a neural network whose input and target vectors are the same.
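The layer structure can be sketched as a forward pass in NumPy. This is an untrained illustration only: the layer sizes, the tanh hidden activations, the linear output layer, and the random initial weights are assumptions, not details from the slides.

```python
import numpy as np

class AAMLP:
    """Auto-associative MLP: input -> mapping -> bottleneck -> de-mapping -> output."""

    def __init__(self, n_in, n_map, n_bottleneck, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        sizes = [n_in, n_map, n_bottleneck, n_map, n_in]  # output size equals input size
        self.weights = [
            rng.normal(scale=0.1, size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])
        ]
        self.biases = [np.zeros(b) for b in sizes[1:]]

    def forward(self, x):
        """Return the reconstruction y = F(x)."""
        h = x
        last = len(self.weights) - 1
        for k, (W, b) in enumerate(zip(self.weights, self.biases)):
            h = h @ W + b
            if k < last:               # tanh on the three hidden layers (assumed)
                h = np.tanh(h)
        return h

net = AAMLP(n_in=16, n_map=8, n_bottleneck=3)
x = np.random.default_rng(1).normal(size=(5, 16))  # five stand-in input vectors
y = net.forward(x)
print(y.shape)
```

During training the target for each input x would be x itself, so the narrow bottleneck forces the network to keep only a compressed description of its inputs.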

30

The top-down perception mechanism in the IT and V4 areas is modeled using the AAMLP, in which characteristic information such as face form and face color is trained and memorized in the connections between the artificial neurons.

Also, a human being perceives the important characteristic information of a specific object rather than very detailed information.

To mimic this role and to gain computational efficiency, we use principal component analysis (PCA) to extract the eigenvectors with large eigenvalues as the important features of a specific object.

To perceive face-related information, we mimic its retrieval from the AAMLP by computing the correlation between the input and the output of the AAMLP.

31

Let F denote an auto-associative mapping function, and let xj and yj denote an input and output vector, respectively. The function F is usually trained to minimize the mean square error given by
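The equation is missing from the transcript. A common form of this error, assuming N training vectors and yj = F(xj), would be:

```latex
E = \frac{1}{N}\sum_{j=1}^{N} \left\lVert x_j - y_j \right\rVert^{2},
\qquad
y_j = F(x_j)
```

The symbol N and the normalization are assumptions; the slide may have used a different constant factor.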

32

From the top-down processing, we get the face form feature map and the face color feature map.

The synchronization underlying the biological binding of different features is modeled by summing the pixel values of the face form feature map, the face color feature map, and the bottom-up saliency map (SM).
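The pixel-wise summation can be written in a few lines of Python. The function name and the toy maps are hypothetical, and no weighting or normalization of the three maps is assumed.

```python
import numpy as np

def bind_feature_maps(face_form_map, face_color_map, bottom_up_sm):
    """Pixel-wise sum of the two top-down feature maps and the bottom-up SM."""
    return face_form_map + face_color_map + bottom_up_sm

# Toy 4 x 4 maps: only position (1, 2) is supported by all three cues
form = np.zeros((4, 4))
form[1, 2] = 1.0
color = np.zeros((4, 4))
color[1, 2] = 0.8
color[3, 0] = 0.9
sm = np.full((4, 4), 0.1)
sm[0, 0] = 0.7

combined = bind_feature_maps(form, color, sm)
row, col = np.unravel_index(np.argmax(combined), combined.shape)
peak = (int(row), int(col))
print(peak)
```

A location that is strong in only one map (here (3, 0) or (0, 0)) is outweighed by a location where the cues agree.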

33

34

[1] T. Toriu and S. Nakajima, "A Method of Calculating Image Saliency and of Optimizing Efficient Distribution of Image Windows," First International Conference on Innovative Computing, Information and Control (ICICIC '06), vol. 1, pp. 290-293, 2006.

[2] S. Ban, M. Lee and H. Yang, "A face detection using biologically motivated bottom-up saliency map model and top-down perception model," Neurocomputing, vol. 56, pp. 475-480, Jan. 2004.

[3] A. J. Bell and T. J. Sejnowski, "The 'independent components' of natural scenes are edge filters," Vision Res., vol. 37, pp. 3327-3338, Dec. 1997.

[4] S. Park, K. An and M. Lee, "Saliency map model with adaptive masking based on independent component analysis," Neurocomputing, vol. 49, pp. 417-422, Dec. 2002.

35