
National Conference on Control and Automation - VCCA-2011


Generic object recognition in static images based on unsupervised learning of a color-based active basis model (CABM)

T. T. Quyen Bui (1, 2) and Keum-Shik Hong (2)

(1) Institute of Information Technology, Hanoi, Vietnam
(2) Pusan National University, Busan, 609-735, Korea

E-mails: {quyenbt, kshong}@pusan.ac.kr

Abstract: This paper addresses the detection of generic object classes in static images using unsupervised learning of a color-based active basis model (CABM). The CIELAB color space is used. Gabor wavelets of the LAB color images and color features, namely the local power spectrum of the brightness and of each color component, together with color gradient features, are used in both the unsupervised learning and the inference algorithms. The effectiveness of unsupervised learning of the CABM for practical object recognition in vision applications, and the improvement it brings to recognizing generic objects, are demonstrated.

Notation

| Symbol | Meaning |
|---|---|
| $G(x, y, \sigma, \theta)$ | a general form of Gabor wavelets |
| $Dic_I$ | a dictionary of Gabor wavelet elements of an image $I$ |
| $(L^*, a^*, b^*)$ | the brightness, red-green channel, and yellow-blue channel, respectively |
| $D$ | the domain of an image lattice |
| $N_\theta$ | the number of different orientations |
| $N_\sigma$ | the total number of scales |
| $\{I_m\}$ | a set of training color images |
| $I_m^{Lab}$ | a representation of $I_m$ in the CIELAB color space |
| $Dic_{I_m}$ | a dictionary of Gabor wavelet elements of $I_m^{Lab}$ |
| $N$ | the number of sketches |
| $NC$ | the number of clusters |
| $M$ | the number of training images |
| $\{T_k^C\}$ | a common template |
| $\{T_{m,k}\}$ | a deformable template of image $I_m^{Lab}$ |
| $\{T_k^{C_j}\}$ | a common template of the $j$-th cluster |
| $(\Delta x_{m,k}, \Delta y_{m,k}, \Delta\theta_{m,k})$ | perturbations of the basis element of image $I_m^{Lab}$ |
| $b_d$ and $b_\theta$ | the bounds |
| $c_{m,k}$ | coefficients |
| $U_m$ | an unexplained residual image |
| $CG_{ab}$ | a full color gradient |
| $Z(\lambda)$ | the normalizing constant of an exponential family model |
| $u(\cdot)$ | a candidate function of image $I_u^{Lab}$ |
| $Q(x, y)$ | template matching score |
| $\{(\Delta x, \Delta y, \Delta\theta)\}$ | a set of activities for a basis element |
| $f_s(\cdot)$ | a sigmoid function |

Abbreviations

| Abbreviation | Meaning |
|---|---|
| ABM | Active basis model |
| CABM | Color-based active basis model |
| EM | Expectation maximization |
| LPS | Local power spectrum |

1. Introduction

Object recognition is one of the challenges in computer vision applications. Many approaches to this problem exist, based for example on interest points, the color of objects, the shape of objects, or input templates of objects. In this paper, we present the detection of generic objects based on unsupervised learning of a color-based active basis model.

Among the existing approaches to object recognition, methods using a deformable template have proven remarkably successful; the research results are presented in [1-9]. Recently, Wu et al. [10] proposed an active basis model (ABM) for the detection of objects. An ABM, a shared sketch algorithm, and a computational architecture of sum-max maps are used for representing, learning, and recognizing a deformable template, respectively. However, the ABM ignores all color features in its detection of objects, because the shared sketch algorithm applies a grey-value local energy to find a common template, together with its deformed versions, from the training images.

A color-based active basis model (CABM), together with supervised learning of the CABM for object recognition, was proposed in [11] as an extension of the ABM. Both the supervised learning and the inference algorithms use color-based image features, such as the color-based local power spectrum (LPS) and color gradients. The experimental results in [11] showed the usefulness of the CABM as well as its significant improvement in practical object recognition.

However, supervised learning is applicable only when the objects in the training images have the same pose and appear at the same location and at the same scale. When objects may appear at different, unknown locations, orientations, and scales in the training images, unsupervised learning based on an expectation-maximization (EM) type algorithm is used. The CIELAB color space is used, so that three opponent color components, $(L^*, a^*, b^*)$, which are the brightness, the red-green channel, and the yellow-blue channel, respectively, are obtained per pixel. The color-based LPS (the LPSs of the components $L^*$, $a^*$, and $b^*$, and combinations of these LPSs) and color gradients are used in both the learning and inference algorithms. The EM learning process iterates between recognition and supervised learning. First, the E-step uses the current template to recognize objects in the training images. Second, the M-step re-learns the template with the shared sketch algorithm from the images aligned according to the imputed locations, orientations, and scales.
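The conversion to the CIELAB components described above can be sketched as follows. This is a minimal sketch, not the authors' code: it assumes 8-bit sRGB input and a D65 reference white, neither of which is stated in the paper, and the function name `rgb_to_lab` is hypothetical.

```python
import numpy as np

# Assumption: 8-bit sRGB input and a D65 reference white; the paper does not
# state which RGB working space or white point was used.
_M = np.array([[0.4124564, 0.3575761, 0.1804375],
               [0.2126729, 0.7151522, 0.0721750],
               [0.0193339, 0.1191920, 0.9503041]])
_WHITE = np.array([0.95047, 1.0, 1.08883])

def rgb_to_lab(rgb):
    """Convert an (..., 3) uint8 sRGB array to CIELAB (L*, a*, b*)."""
    c = np.asarray(rgb, dtype=np.float64) / 255.0
    # Undo the sRGB gamma to obtain linear RGB.
    lin = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> XYZ, normalised by the reference white.
    xyz = lin @ _M.T / _WHITE
    d = 6.0 / 29.0
    f = np.where(xyz > d ** 3, np.cbrt(xyz), xyz / (3 * d ** 2) + 4.0 / 29.0)
    L = 116.0 * f[..., 1] - 16.0           # brightness
    a = 500.0 * (f[..., 0] - f[..., 1])    # red-green channel
    b = 200.0 * (f[..., 1] - f[..., 2])    # yellow-blue channel
    return np.stack([L, a, b], axis=-1)
```

Each of the three output planes can then be filtered separately, which is what makes per-channel LPSs and their combinations possible.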

The contribution of this paper is the study of unsupervised learning of the CABM for generic object recognition. Moreover, the experimental results show how much each of the color cues, the gradient features, and the EM clustering helps object recognition. In this paper, we use images from two widely used benchmark image sets: the PASCAL VOC 2009 database [12] and the Weizmann horse dataset [13].

The paper is organized as follows. Section 2 gives a brief overview of the CABM and discusses the unsupervised learning and inference algorithms in detail. Experimental results are presented in Section 3. Finally, conclusions are drawn in Section 4.

2. A color-based active basis model

We refer the reader to [11] for more information on the Gabor filters, color image features, supervised learning of the CABM, etc., in the detection of generic objects.

2.1 The CABM model

Let $G(x, y, \sigma, \theta)$ be a general form of Gabor wavelets and let $Dic_I$ be a dictionary of Gabor wavelet elements of an image $I$. The dictionary $Dic_I$ is defined as follows:

$$Dic_I = \{ G(x, y, \sigma, \theta) : g_{\sigma,\theta,0}(x, y),\ g_{\sigma,\theta,\pi/2}(x, y),\ (x, y) \in D,\ \theta = i\pi/N_\theta,\ i \in \{0, \ldots, N_\theta - 1\} \}, \quad (1)$$

where $g_{\sigma,\theta,0}(x, y)$ and $g_{\sigma,\theta,\pi/2}(x, y)$ are the symmetric and anti-symmetric Gabor components, respectively, $D$ is the domain of an image lattice, $N_\theta$ is the number of different orientations, and $N_\sigma$ is the total number of scales. In this paper, $N_\theta = 15$ and $\sigma = 0.7$.
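To make the dictionary concrete, the sketch below builds one symmetric/anti-symmetric kernel pair per orientation, with 15 orientations and a scale of 0.7. It is only an illustration: the envelope shape, the `base_half` window size, and the function names `gabor_pair` and `gabor_dictionary` are assumptions rather than details taken from the paper, and the translation of each kernel to every position in the image lattice is left out.

```python
import numpy as np

def gabor_pair(theta, scale=0.7, base_half=12):
    """Return (even, odd) Gabor kernels at orientation theta.

    Assumption: the kernel shape (elongated Gaussian envelope, carrier
    stretched by `scale`) and `base_half` are illustrative choices only.
    """
    half = int(round(base_half * scale))
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so that u runs across the edge and v along it.
    u = (xs * np.cos(theta) + ys * np.sin(theta)) / scale
    v = (-xs * np.sin(theta) + ys * np.cos(theta)) / scale
    envelope = np.exp(-(4.0 * u ** 2 + v ** 2) / 100.0)
    even = envelope * np.cos(u)   # symmetric component
    odd = envelope * np.sin(u)    # anti-symmetric component
    even -= even.mean()           # remove the DC response
    return even / np.linalg.norm(even), odd / np.linalg.norm(odd)

def gabor_dictionary(n_orient=15, scale=0.7):
    """Map each orientation index i (theta = i*pi/n_orient) to its kernel pair."""
    return {i: gabor_pair(i * np.pi / n_orient, scale) for i in range(n_orient)}
```

Convolving an image channel with both kernels of a pair and summing the squared responses gives a local energy of the kind the LPS features build on.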

The proposed scheme of the CABM is shown in Fig. 1, in which unsupervised learning based on the EM-type algorithm is used. As presented in [11], instead of using the RGB color components of natural images, we consider the color components $L^*$, $a^*$, and $b^*$ in the CIELAB color space. Given a set of training color images, $\{I_m, m = 1, \ldots, M\}$, let $I_m^{Lab}$ be the representation of $I_m$ in the CIELAB color space. Let $Dic_{I_m}$ be the dictionary of Gabor wavelet elements of $I_m^{Lab}$ and $N$ be the number of sketches that are picked. Let $\{T_k^C = G(x_k, y_k, \sigma, \theta_k), k = 1, \ldots, N\}$ be a common template that is shared by the training images. The location of each element, $T_k^C$, can be shifted in its normal direction by the permitted displacement of the location, $(\Delta x, \Delta y)$, and the orientation of each element can be shifted by the permitted value, $\Delta\theta$. The set of activities for a basis element is defined as follows:

$$\{(\Delta x, \Delta y, \Delta\theta) : \Delta x = d \sin\theta,\ \Delta y = d \cos\theta,\ d \in [-b_d, b_d],\ \Delta\theta \in [-b_\theta, b_\theta]\}, \quad (2)$$

where $b_d$ and $b_\theta$ are the bounds for the permitted displacements of the location and orientation, respectively. Let $\{T_{m,k} = G(x_k + \Delta x_{m,k}, y_k + \Delta y_{m,k}, \sigma, \theta_k + \Delta\theta_{m,k}), k = 1, \ldots, N\}$ be the specific template that describes only the image $I_m^{Lab}$, where $T_{m,k}$ is chosen from $Dic_{I_m}$ and $(\Delta x_{m,k}, \Delta y_{m,k}, \Delta\theta_{m,k})$, $k = 1, \ldots, N$, are the perturbations of the basis elements of image $I_m^{Lab}$. Each image, $I_m^{Lab}$, can be represented as follows:

$$I_m^{Lab} = \sum_{k=1}^{N} c_{m,k} T_{m,k} + U_m, \quad m = 1, \ldots, M, \quad (3)$$

where $c_{m,k}$, $k = 1, \ldots, N$, are coefficients and $U_m$ is an unexplained residual image.
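The set of activities in equation (2) can be enumerated on a discrete grid, as in the sketch below. The discretisation is an assumption made for illustration: displacements are whole pixels, orientation shifts are whole steps of $\pi/15$, and the default bounds `b_d=3` and `b_theta=1` are not values from the paper. The function name `activity_set` is hypothetical.

```python
import math

def activity_set(theta_index, n_orient=15, b_d=3, b_theta=1):
    """Enumerate discretised (dx, dy, dtheta_index) perturbations of one element.

    Assumptions: displacements d are integer pixels in [-b_d, b_d], orientation
    shifts are whole steps of pi/n_orient, and b_d=3, b_theta=1 are
    illustrative values not taken from the paper.
    """
    theta = theta_index * math.pi / n_orient
    moves = []
    for d in range(-b_d, b_d + 1):
        # Shift along the element's normal direction: dx = d sin(theta), dy = d cos(theta).
        dx = round(d * math.sin(theta))
        dy = round(d * math.cos(theta))
        for dt in range(-b_theta, b_theta + 1):
            moves.append((dx, dy, dt))
    return moves
```

Each deformed element $T_{m,k}$ is then the common element $T_k^C$ moved by one triple from this set.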


Fig. 1. Scheme of the color-based active basis model, in which unsupervised learning based on the EM-type algorithm is used for object recognition.

2.2 Inference algorithm

Given a common template of an object, to detect the object in images we adopt the template matching of the ABM, which determines a deformed template of an object by using the log-likelihood score. For instance, suppose that an object has to be found in a new color image, $I_u$. Let $I_u^{Lab}$ be the representation of $I_u$ in the CIELAB color space. The inference algorithm is carried out in three steps (computing the template matching score at every pixel, selecting the location with the maximum score, and returning the deformed template), where the parameters $\{\lambda_k, k = 1, \ldots, N\}$ are the weights; the function $Z(\lambda)$ is the normalizing constant of an exponential family model [10]; $f_s(\cdot)$ is a sigmoid function whose value increases from 0 to the saturation constant, $\xi$; and $u(\cdot)$ is a candidate function of the image $I_u^{Lab}$. If only the color-based LPS is taken into account, the value of $u(\cdot)$ equals the normalized value of the color-based LPS of the image. If the LPS and the gradient-based features are combined in a single detector, $u(\cdot)$ is determined by using the logistic model [14].
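A compact way to see how the matching score is formed is the sketch below, which evaluates the score at every pixel for a precomputed feature volume. It is a simplified illustration under several assumptions: the feature volume `feat` stands in for the candidate function, the sigmoid form and the value `xi=6.0` are illustrative, perturbations are discretised to whole pixels and orientation steps, and borders wrap around. The names `sigmoid` and `match_score` are hypothetical.

```python
import math
import numpy as np

def sigmoid(r, xi=6.0):
    """Saturating transform rising from 0 to xi (form and xi=6 are assumptions)."""
    return xi * (2.0 / (1.0 + np.exp(-2.0 * r / xi)) - 1.0)

def match_score(feat, template, lambdas, log_z, b_d=2, b_theta=1):
    """Return the score map Q of shape (H, W) for a template placed at every pixel.

    feat[y, x, o] : non-negative feature (e.g. a normalised LPS) at pixel
                    (x, y) and orientation index o.
    template      : list of (xk, yk, ok) offsets and orientation indices.
    """
    h, w, n_orient = feat.shape
    s = sigmoid(feat)
    q = np.zeros((h, w))
    for (xk, yk, ok), lam, lz in zip(template, lambdas, log_z):
        theta = ok * math.pi / n_orient
        best = np.zeros((h, w))
        for d in range(-b_d, b_d + 1):
            dx = xk + round(d * math.sin(theta))
            dy = yk + round(d * math.cos(theta))
            for dt in range(-b_theta, b_theta + 1):
                plane = s[:, :, (ok + dt) % n_orient]
                # Shift so that shifted[y, x] = plane[y + dy, x + dx];
                # np.roll wraps around at the borders (a simplification).
                shifted = np.roll(plane, shift=(-dy, -dx), axis=(0, 1))
                best = np.maximum(best, shifted)   # local max over perturbations
        q += lam * best - lz                       # weighted sum minus log-normaliser
    return q
```

The detected location is the argmax of the returned map; recording which perturbation achieved the maximum for each element would give the deformed template.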

2.3 Unsupervised learning

Given a sample of response vectors (a set of training images), we can find a dictionary of predictors, or regressors ($\{T_{m,k}\}$), by using linear regression, so that each response vector (image $I_m^{Lab}$) can be represented as a linear combination of a small number of regressors picked from the dictionary, as in equation (3). For this reason, the output of unsupervised learning consists of the deformed templates, $\{T_{m,k}, k = 1, \ldots, N\}$, each of which describes the image $I_m^{Lab}$ by a small number of outline edges in the texture of objects, and the common templates of the clusters, $\{T_k^{C_j}, k = 1, \ldots, N, j = 1, \ldots, NC\}$, where $NC$ is the number of clusters, which is set before the learning is performed. The templates are learned by using the maximum likelihood method. Unlike supervised learning, however, unsupervised learning gives us more than one common template.

If the objects appear at unknown orientations, the E-step recognizes the orientation of the object in the image $I_m^{Lab}$; then, for the recognized orientation, $I_m^{Lab}$ is transformed into a new image, $\tilde{I}_m^{Lab}$, so that we get a new set of images, $\{\tilde{I}_m^{Lab}\}$, that are aligned with each other. The M-step then re-learns the templates from the aligned images through supervised learning.

[Fig. 1 block labels: Training images $\{I_m, m = 1, \ldots, M\}$; Color components $(L^*, a^*, b^*)$; Gabor filter; Dictionaries $\{Dic_{I_m}, m = 1, \ldots, M\}$; Image features (color-based LPS, gradient-based features); Unsupervised learning (EM algorithm); Deformed templates; Common templates; Testing images; Image features; Inference algorithm.]

Inference algorithm:

1. For every pixel $(x, y)$ of the image $I_u^{Lab}$, compute the template matching score,

$$Q(x, y) = \sum_{k=1}^{N} \left[ \lambda_k \max_{(\Delta x, \Delta y, \Delta\theta)} f_s\big(u(x + x_k + \Delta x,\ y + y_k + \Delta y,\ \sigma,\ \theta_k + \Delta\theta)\big) - \log Z(\lambda_k) \right].$$

2. Select $(\hat{x}, \hat{y}) = \arg\max_{(x, y)} Q(x, y)$. For $k = 1, \ldots, N$, record

$$(\Delta x_k, \Delta y_k, \Delta\theta_k) = \arg\max_{(\Delta x, \Delta y, \Delta\theta)} u(\hat{x} + x_k + \Delta x,\ \hat{y} + y_k + \Delta y,\ \sigma,\ \theta_k + \Delta\theta).$$

3. Return the location $(\hat{x}, \hat{y})$ and the deformed template of the object, $\{T_{u,k} = G(\hat{x} + x_k + \Delta x_k,\ \hat{y} + y_k + \Delta y_k,\ \sigma,\ \theta_k + \Delta\theta_k), k = 1, \ldots, N\}$.

If the objects appear at different locations in the training images, the locations of the templates should be inferred during learning. The shift of the template is transferred to a shift of the image, and the latter shift gives us a set of aligned images. Because we can use a given template to recognize and locate the object in each image, $I_m^{Lab}$, by the inference algorithm (see Section 2.2), the predictive distribution of the unknown location of the template within the image lattice of $I_m^{Lab}$ is determined; then, the expectation of the complete-data log-likelihood is obtained in the E-step. Next, in the M-step, we maximize the expected complete-data log-likelihood by using the shared sketch algorithm. In other words, supervised learning is executed to re-learn the templates from the aligned images, as in [11]. Unlike the previous EM-type learning of the ABM, however, color-based features, such as the LPSs and the full color gradient, are taken into account in our method through the supervised learning and inference algorithms.
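The alternation between locating the template and re-learning it can be illustrated with a deliberately small one-dimensional toy. This is a sketch under strong simplifying assumptions, not the method of the paper: signals stand in for images, the E-step makes a hard choice of the best circular shift instead of computing a predictive distribution, and the M-step is a plain average standing in for the shared sketch algorithm. The function name `em_align` is hypothetical.

```python
import numpy as np

def em_align(signals, n_iter=3):
    """Alternate between locating the template (E-step) and re-learning it (M-step)."""
    signals = np.asarray(signals, dtype=float)
    template = signals[0].copy()             # initialise from the first example
    shifts = np.zeros(len(signals), dtype=int)
    for _ in range(n_iter):
        # E-step (hard assignment): best circular shift of each signal.
        for m, sig in enumerate(signals):
            scores = [np.dot(np.roll(sig, -s), template) for s in range(sig.size)]
            shifts[m] = int(np.argmax(scores))
        # M-step: re-learn the template from the aligned signals.
        aligned = np.stack([np.roll(sig, -s) for sig, s in zip(signals, shifts)])
        template = aligned.mean(axis=0)
    return template, shifts
```

The same loop structure carries over to images: the shift becomes a location, orientation, and scale, and the averaging step is replaced by supervised template learning on the aligned images.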

By combining the color-based features, different versions of the CABM are obtained. Let the notation ABM-EM-$l$ denote the CABM in which unsupervised learning and the color-based LPS are used, where $l \in \{L, a, b, La, Lb, Lab\}$. For instance, ABM-EM-$Lb$ corresponds to the CABM in which unsupervised learning and the sum of the LPSs of the components $L^*$ and $b^*$ are used. Let the notation ABM-EM-$gl$ denote the CABM in which unsupervised learning based on the EM-type algorithm and a combination of the color-based LPS and a full color gradient are used. Here, the full color gradient, $CG_{ab}$, is the sum of the two marginal gradients of the $a^*$ and $b^*$ components. Fig. 2 shows examples of the deformed templates of images. The objects are a horse, a butterfly, and a cow. The numbers of sketches for the horse, the butterfly, and the cow are 50, 42, and 55, respectively. From the second row to the bottom row of Fig. 2, the first method of learning uses grey features only [5], and the subsequent rows correspond to unsupervised learning in which the color-based LPSs are used, with $l \in \{L, a, b, La, Lb, Lab\}$, respectively.
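The two feature combinations named above can be sketched in a few lines. This is an illustration only: it assumes that a "marginal gradient" means the finite-difference gradient magnitude of a single channel, and that the per-channel LPS maps are available in a dictionary keyed by `"L"`, `"a"`, and `"b"`. The function names are hypothetical.

```python
import numpy as np

def gradient_magnitude(channel):
    """Magnitude of the finite-difference gradient of one channel."""
    gy, gx = np.gradient(np.asarray(channel, dtype=float))
    return np.hypot(gx, gy)

def full_color_gradient(lab):
    """CG_ab as the sum of the marginal gradients of the a* and b* channels."""
    return gradient_magnitude(lab[..., 1]) + gradient_magnitude(lab[..., 2])

def combined_lps(lps, label):
    """Sum the per-channel LPS maps named in `label`, e.g. 'Lb' -> L* plus b*."""
    return sum(lps[ch] for ch in label)
```

For example, `combined_lps(lps, "Lb")` gives the feature used by the $Lb$ variant, and adding `full_color_gradient(lab)` as a second cue corresponds to the gradient-augmented variants.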

3. Experimental results

When a number of receiver operating characteristic (ROC) curves are to be compared, the area under the curve (AUC) is usually the best discriminator [15]. The AUC ranges from 0 to 1; therefore, the smaller the value of $1 - AUC$ a method has, the better the method is at object recognition. In this section, we illustrate the usefulness of unsupervised learning of the CABM for object recognition and use $1 - AUC$ to evaluate and compare the performance of the methods in recognizing objects.
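As a concrete reading of the evaluation measure, the AUC can be computed directly from detector scores as the probability that a positive example outscores a negative one. The sketch below uses this rank-based form; the function name `auc` and the example scores are illustrative and unrelated to the paper's data.

```python
import numpy as np

def auc(pos_scores, neg_scores):
    """Probability that a positive outscores a negative (ties count one half)."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    # Compare every positive with every negative and average the outcomes.
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

# The quantity reported in the tables would then be 1 - auc(...).
error = 1.0 - auc([0.9, 0.8, 0.3], [0.5, 0.1])
```

A perfect detector gives an AUC of 1 and therefore a value of 0 for $1 - AUC$, which is why smaller table entries indicate better recognition.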

In this experiment, the number of iterations is three; hence, three common templates are created after learning. Table 1 compares the $1 - AUC$ values of ABM-EM-$l$, ABM-EM-$gl$, and the ABM. Here, the object is the butterfly, the image size is 150 x 110, the number of training images is 65, the number of elements is 42, and the number of clusters is 3. The columns named "Template $i$" ($i \in \{1, 2, 3\}$; abbreviated T1, T2, and T3) and "All" show, respectively, the values obtained with the separate use of template $i$ and with the use of all the templates in object recognition. In the case of all templates, each template is used in turn to detect and then sketch an object in an image.

[Fig. 2 image grid: columns Horse ($N = 50$, 150x125), Butterfly ($N = 42$, 150x110), and Cow ($N = 55$, 170x125); rows by LPS: grey feature [5], L, a, b, La, Lb, Lab.]

Fig. 2. Examples of deformable templates (represented by yellow pixels).

Table 1. Comparison of the $1 - AUC$ values of ABM-EM-$l$, ABM-EM-$gl$, and the ABM for butterfly recognition. Here, $M = 65$, $N = 42$, the image size is 150x110, and $NC = 3$.

| Method | $l$ | T1 | T2 | T3 | All |
|---|---|---|---|---|---|
| Wu et al. [10] | | 0.0381 | 0.0490 | 0.0415 | 0.0376 |
| ABM-EM-$l$ | L | 0.0327 | 0.0302 | 0.0343 | 0.0282 |
| | a | 0.0789 | 0.0938 | 0.0815 | 0.0778 |
| | b | 0.0617 | 0.0625 | 0.0610 | 0.0602 |
| | La | 0.0405 | 0.0329 | 0.0317 | 0.0311 |
| | Lb | 0.0403 | 0.0325 | 0.0312 | 0.0305 |
| | Lab | 0.0414 | 0.0336 | 0.0341 | 0.0332 |
| ABM-EM-$gl$ | L | 0.0260 | 0.0192 | 0.0130 | 0.0121 |
| | a | 0.0612 | 0.0694 | 0.0574 | 0.0572 |
| | b | 0.0490 | 0.0445 | 0.0418 | 0.0413 |
| | La | 0.0301 | 0.0248 | 0.0267 | 0.0247 |
| | Lb | 0.0302 | 0.0248 | 0.0260 | 0.0242 |
| | Lab | 0.0290 | 0.0227 | 0.0254 | 0.022 |

Table 2. Comparison of the $1 - AUC$ values of ABM-EM-$l$, ABM-EM-$gl$, and the ABM for horse recognition. Here, $M = 84$, $N = 45$, the image size is 150x125, and $NC = 3$.

| Method | $l$ | T1 | T2 | T3 | All |
|---|---|---|---|---|---|
| Wu et al. [10] | | 0.0362 | 0.0543 | 0.0456 | 0.0361 |
| ABM-EM-$l$ | L | 0.0285 | 0.0538 | 0.0333 | 0.0178 |
| | a | 0.0441 | 0.0448 | 0.0725 | 0.0406 |
| | b | 0.0455 | 0.0531 | 0.0483 | 0.0439 |
| | La | 0.0336 | 0.0447 | 0.0542 | 0.0336 |
| | Lb | 0.027 | 0.049 | 0.0415 | 0.0241 |
| | Lab | 0.0353 | 0.0454 | 0.0371 | 0.0335 |
| ABM-EM-$gl$ | L | 0.0229 | 0.0491 | 0.0283 | 0.014 |
| | a | 0.0424 | 0.0427 | 0.0608 | 0.0387 |
| | b | 0.0434 | 0.0501 | 0.0455 | 0.0399 |
| | La | 0.0296 | 0.0428 | 0.0407 | 0.0279 |
| | Lb | 0.0248 | 0.0448 | 0.0387 | 0.02 |
| | Lab | 0.0285 | 0.0396 | 0.0333 | 0.0253 |

The performance in object recognition is improved when the color-based LPS is used with $l \in \{L, La, Lb, Lab\}$. Moreover, the $1 - AUC$ value of ABM-EM-$gl$ obtained with all the templates is always smaller than those of ABM-EM-$l$ and the ABM when $l \in \{L, La, Lb, Lab\}$. If only the LPS of either of the components $a^*$ or $b^*$ is used in unsupervised learning, the quality of object recognition is worse, because the common templates obtained cannot represent all the necessary edges of generic objects. Besides, the $1 - AUC$ value of ABM-EM-$La$ is approximately equal to that of ABM-EM-$Lb$. This shows that using the combination of $L^*$ and $a^*$ is similar to using the combination of $L^*$ and $b^*$ in both the learning and inference algorithms. In addition, among all the versions of the CABM, ABM-EM-$gL$ consistently yielded the best performance for the detection of objects. Testing with other objects, such as a horse, a pigeon, etc., gives similar experimental results.

4. Conclusions

This paper presented unsupervised learning of the CABM for generic object recognition. The learning based on the EM-type algorithm was discussed in detail. The experimental results were compared with previous research results. The comparison shows that the use of the CABM and unsupervised learning based on the EM algorithm contributes a significant improvement to the detection of generic objects. The learning is able to discover not only the general shapes of objects but also parts of objects. For this reason, future research will aim to improve the flexibility of unsupervised learning of the CABM for detecting generic objects that appear from different viewpoints or that are only partly visible in images.

References

[1] Coughlan, J.; Yuille, A.; English, C.; Snow, D.: "Efficient deformable template detection and localization without user initialization," Computer Vision and Image Understanding, 78 (3) (2000) 303-319.

[2] Cootes, T.; Edwards, G.; Taylor, C.: "Active appearance models," IEEE Trans. on Pattern Analysis and Machine Intelligence, 23 (6) (2001) 681-685.

[3] Crandall, D.; Felzenszwalb, P.; Huttenlocher, D.: "Spatial priors for part-based recognition using statistical models," Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, June 20-26, 2005, vol. 1, pp. 10-17.

[4] Felzenszwalb, P. F.; McAllester, D.: "The generalized A* architecture," Journal of Artificial Intelligence Research, 29 (2007) 153-190.

[5] Amit, Y.; Trouve, A.: "POP: Patchwork of parts models for object recognition," International Journal of Computer Vision, 75 (2) (2007) 267-282.

[6] Zhu, S.; Mumford, D.: "A stochastic grammar of images," Foundations and Trends in Computer Graphics and Vision, 2 (4) (2007) 259-362.

[7] Leibe, B.; Leonardis, A.; Schiele, B.: "Robust object detection with interleaved categorization and segmentation," International Journal of Computer Vision, 77 (1) (2008) 259-289.

[8] Si, Z.; Gong, H.; Wu, Y. N.; Zhu, S. C.: "Learning mixed image templates for object recognition," IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 272-279.

[9] Felzenszwalb, P. F.; Girshick, R. B.; McAllester, D.; Ramanan, D.: "Object detection with discriminatively trained part based models," IEEE Trans. on Pattern Analysis and Machine Intelligence, 32 (9) (2010) 1627-1645.

[10] Wu, Y. N.; Si, Z. Z.; Gong, H.; Zhu, S. C.: "Learning active basis model for object detection and recognition," International Journal of Computer Vision, 2010, doi: 10.1007/s11263-009-0287-0.

[11] Bui, T. T. Q.; Hong, K. S.: "Supervised Learning of a Color-Based Active Basis Model for Object Recognition," Proc. of the Second Int. Conf. on Knowledge and Systems Engineering (KSE), Hanoi, Vietnam, 7-9 Oct. 2010, pp. 69-74.

[12] Everingham, M.; Van-Gool, L.; Williams, C. K. I.; Winn, J.; Zisserman, A.: "The PASCAL visual object classes challenge 2009 (VOC2009), results." [Online]. Available: http://www.pascalnetwork.org/challenges/VOC/voc2009/workshop/index.html.

[13] Borenstein, E.; Sharon, E.; Ullman, S.: "Combining top-down and bottom-up segmentation," IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2004, pp. 46-54.

[14] Martin, D. R.; Fowlkes, C. C.; Malik, J.: "Learning to detect natural image boundaries using local brightness, color, and texture cues," IEEE Trans. on Pattern Analysis and Machine Intelligence, 26 (5) (2004) 530-549.

[15] Metz, C. E.: "Basic principles of ROC analysis," Seminars in Nuclear Medicine, 8 (4) (1978) 283-298.

T. T. Quyen Bui received her B.S. and M.S. degrees from Hanoi University of Technology, Vietnam, in 2001 and 2006, respectively. She is currently a Ph.D. student in the School of Mechanical Engineering, Pusan National University, Korea. Her research interests include robotics, vision systems, and navigation of autonomous vehicles.
