
These are the slides used for my mid-term defense.


Combining Motion Information with Appearance Information and Utilizing Mutual Information Encoded Among Image Features of the Same Object for Better Detection Performance

Zhipeng Wang

Directed by: Prof. Katsushi Ikeuchi, Dr. Masataka Kagesawa, and Dr. Shintaro Ono

Computer Vision Laboratory, The University of Tokyo

(Combining motion information with appearance information, and utilizing mutual information encoded among image features of the same object, for improved detection performance)

Outline

Background and state-of-the-art detection methods

Our work (goal, challenges, related work, method, results, contribution, and conclusion)

Sections 1, 2: utilizing motion information

Sections 3, 4: utilizing mutual information

Conclusion

Publication list

Background

Object detection from images:

One of the basic perceptual skills in humans

Plays an important role in the machine vision area

Two primary categories:

Sliding-window methods: use classifiers to answer whether each sub-image contains a target object

Hough transform based methods: infer object status in a bottom-up manner from detected image features

Better detection performance

Representative methods:

Image features: HOG, SIFT

Classifiers: machine learning methods such as deep learning

Efficient search techniques: branch and bound [1]

[1] C. H. Lampert, M. B. Blaschko, and T. Hofmann, "Efficient Subwindow Search: A Branch and Bound Framework for Object Localization," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 12, pp. 2129-2142, Dec. 2009.

Our work towards better detection performance

Combining motion information with appearance information

Using mutual information encoded among the features of the same object

By using extra information, we expect to achieve better detection performance

Our work

Combining motion information with appearance information

1. Common Fate Hough Transform

2. Emergency Telephone Indicator Detection in Tunnel Environment

1. Common Fate Hough Transform

Goal and challenges

Goal: where is each object in the scene, and what is its label?

Challenges:

Nearby objects

Similar objects of different classes

Common Fate Hough Transform

Related work

Motion for object detection: background subtraction, optical flow [1]

Combination of appearance and motion information: mainly for tracking

[1] G. Brostow and R. Cipolla. Unsupervised Bayesian detection of independent motion in crowds. In CVPR, pages I: 594-601, 2006.

Hough transform

On the current image:

Detect keypoints and extract image features

Find the best matched codes from a trained codebook

Each keypoint votes for an object center and label

B. Leibe and B. Schiele. Interleaved object categorization and segmentation. In BMVC, pages 759-768, 2003.
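To make the voting step concrete, here is a minimal Python sketch; the codebook layout, nearest-code matching by Euclidean distance, and vote structure are illustrative assumptions, not the exact implementation:

```python
import numpy as np

def hough_votes(keypoints, descriptors, codebook):
    """Cast one vote per keypoint against a trained codebook.

    codebook: list of (code_descriptor, center_offset, label) entries,
    where center_offset is the keypoint-to-center displacement observed
    in training (an assumed layout)."""
    code_descs = np.array([c[0] for c in codebook])
    votes = []
    for (x, y), desc in zip(keypoints, descriptors):
        # Best-matching code by Euclidean distance in descriptor space.
        idx = int(np.argmin(np.linalg.norm(code_descs - desc, axis=1)))
        _, (dx, dy), label = codebook[idx]
        votes.append(((x + dx, y + dy), label))  # vote: center + label
    return votes
```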

Hough transform

Discrete to continuous:

Blur each vote

Sum up all votes from all object parts
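A minimal sketch of this discrete-to-continuous step, assuming votes are accumulated on a per-label grid and blurred with a Gaussian (the kernel choice and sigma are assumptions):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hough_image(votes, shape, n_labels, sigma=3.0):
    """Accumulate votes into a per-label grid, then Gaussian-blur it so
    that nearby votes from different object parts reinforce each other."""
    acc = np.zeros((n_labels,) + shape)
    for (cx, cy), label in votes:
        iy, ix = int(round(cy)), int(round(cx))
        if 0 <= iy < shape[0] and 0 <= ix < shape[1]:
            acc[label, iy, ix] += 1.0
    # Blurring each vote and then summing == blurring the sum (linearity).
    return np.stack([gaussian_filter(a, sigma) for a in acc])
```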

Common Fate Hough Transform

The common fate principle

One of the Gestalt laws: elements with the same motion tend to be perceived as one unit

Common Fate Hough Transform

Motion analysis [1]:

Keypoint tracking (KLT tracker)

Clustering the trajectories

[1] G. Brostow and R. Cipolla. Unsupervised Bayesian detection of independent motion in crowds. In CVPR, pages I: 594-601, 2006.
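A rough OpenCV sketch of the tracking step; the grouping shown here (k-means on per-frame displacements, with an assumed number of groups) is a crude stand-in for Brostow and Cipolla's Bayesian trajectory clustering:

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def track_and_group(prev_gray, gray, n_groups=3, max_corners=500):
    """Track keypoints between two frames with the KLT tracker, then
    group them by similar displacement ("common fate")."""
    p0 = cv2.goodFeaturesToTrack(prev_gray, max_corners, 0.01, 7)
    p1, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None)
    ok = status.ravel() == 1
    p0, p1 = p0.reshape(-1, 2)[ok], p1.reshape(-1, 2)[ok]
    # Crude stand-in for trajectory clustering: k-means on displacements.
    groups = KMeans(n_clusters=n_groups, n_init=10).fit_predict(p1 - p0)
    return p1, groups
```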

Common fate Hough transform

The weight of each vote is related to the support it gains from the motion group

Common Fate Hough Transform

Inference

1. Find the highest peak in the confidence space
2. Exclude the object parts belonging to the highest peak
3. If the Hough image is not empty, go to 1.
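A sketch of this greedy loop; suppressing a disc around each peak stands in for excluding the votes of its object parts, and the radius and stopping threshold are assumed parameters:

```python
import numpy as np

def greedy_inference(hough, radius=20, min_peak=1.0):
    """Greedy inference over a (labels, H, W) confidence space:
    repeatedly take the highest peak, then suppress its neighborhood."""
    hough = hough.copy()
    detections = []
    ys, xs = np.mgrid[0:hough.shape[1], 0:hough.shape[2]]
    while True:
        label, iy, ix = np.unravel_index(np.argmax(hough), hough.shape)
        if hough[label, iy, ix] < min_peak:
            break  # Hough image effectively empty: stop
        detections.append((ix, iy, label))
        mask = (ys - iy) ** 2 + (xs - ix) ** 2 < radius ** 2
        hough[:, mask] = 0.0  # exclude parts belonging to this peak
    return detections
```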

Experimental results

Dataset: 720×576, 401 continuous frames; 633 ground-truth bounding boxes of different classes on 79 frames

Common Fate Hough Transform

Results

[Qualitative detection results shown on example frames]

Result comparison

[1] O. Barinova, V. Lempitsky, and P. Kohli. On detection of multiple object instances using Hough transforms. In CVPR, pages 2233-2240, 2010.

Contribution

A detection method with better detection performance than the CVPR’10 method

A successful combination of motion and appearance information for detection

A successful attempt at incorporating human perception rules into a detection method in computer vision

Common Fate Hough Transform

Conclusion

Motion information, when effectively combined with appearance information, largely improves detection results

The method is not efficient, however, which leads us to our next method:

Common Fate Hough Transform

2. Detection of Emergency Telephone Indicators by Fusion of Motion Information with Appearance Information

Goal

Emergency Telephone Indicator Detection

For vehicle positioning: detecting emergency telephone indicators

Indicators installed at known locations

Infrared cameras (far infrared) installed on the vehicle's top

Challenges

Noisy objects

Real-time requirement

Emergency Telephone Indicator Detection

Method: a two-step method

1. Detect, verify, and cluster keypoints
2. Verify keypoint clusters by appearance and motion information

Emergency Telephone Indicator Detection

Keypoint Detection

Keypoint Verification

Keypoint Clustering

Keypoint Cluster Verification by Appearance

Keypoint Cluster Tracking

Keypoint Cluster Verification by Motion

Pipeline

Detect keypoints:

Uniform sampling

Intensity thresholds: 160-190 (on a 0-255 scale)

Emergency Telephone Indicator Detection
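A minimal numpy sketch of this detection step, using the slide's 160-190 intensity band; the sampling stride is an assumed parameter:

```python
import numpy as np

def detect_keypoints(img, step=4, lo=160, hi=190):
    """Uniformly sample the far-infrared image and keep samples whose
    intensity falls in the 160-190 band (of 0-255), as on the slide."""
    ys, xs = np.mgrid[0:img.shape[0]:step, 0:img.shape[1]:step]
    keep = (img[ys, xs] >= lo) & (img[ys, xs] <= hi)
    return np.stack([xs[keep], ys[keep]], axis=1)  # (x, y) keypoints
```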

Pipeline

Verify keypoints:

Intensity histogram

Build a mixture model using k-means

Emergency Telephone Indicator Detection
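A plausible sketch of this verification step; the patch source, number of mixture components, bin count, and acceptance threshold are all assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def patch_hist(patch, bins=16):
    """Normalized intensity histogram of a patch around a keypoint."""
    return np.histogram(patch, bins=bins, range=(0, 255), density=True)[0]

def build_intensity_model(positive_patches, k=5):
    """Mixture model of indicator appearance: k-means over histograms."""
    return KMeans(n_clusters=k, n_init=10).fit(
        np.array([patch_hist(p) for p in positive_patches]))

def verify_keypoint(patch, model, max_dist=0.05):
    """Accept the keypoint if its histogram is near any mixture center."""
    d = np.linalg.norm(model.cluster_centers_ - patch_hist(patch), axis=1)
    return d.min() < max_dist
```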

Pipeline

Cluster keypoints:

Build a minimum spanning tree (Euclidean distance)

Split the tree

Emergency Telephone Indicator Detection
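A SciPy sketch of this clustering step; the edge-length threshold used to split the tree is an assumed parameter:

```python
import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def cluster_keypoints(points, max_edge=15.0):
    """Cluster keypoints by building a Euclidean minimum spanning tree,
    cutting edges longer than max_edge, and taking the resulting
    connected components as clusters."""
    mst = minimum_spanning_tree(squareform(pdist(points))).toarray()
    mst[mst > max_edge] = 0.0            # split the tree at long edges
    _, labels = connected_components(mst != 0, directed=False)
    return labels                        # cluster id per keypoint
```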

Pipeline

Verify keypoint clusters by appearance:

AdaBoost classifier trained from positive and negative examples

Intensity histogram features

Emergency Telephone Indicator Detection
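A sketch of the appearance verification with scikit-learn's AdaBoostClassifier; the feature layout and hyperparameters are assumptions:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def train_cluster_verifier(pos_hists, neg_hists):
    """Train AdaBoost on intensity histograms of keypoint clusters:
    positives are true indicators, negatives are clutter."""
    X = np.vstack([pos_hists, neg_hists])
    y = np.hstack([np.ones(len(pos_hists)), np.zeros(len(neg_hists))])
    return AdaBoostClassifier(n_estimators=100).fit(X, y)  # assumed size

# A cluster is kept when clf.predict([hist])[0] == 1.
```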

Pipeline

Keypoint cluster tracking:

No occlusion

Connect each detection response to its nearest trajectory (by appearance, scale, and time gap)

Emergency Telephone Indicator Detection

Pipeline

Keypoint cluster verification by motion:

Fit each trajectory as a straight line

Use the significance of the fit as the criterion

Emergency Telephone Indicator Detection
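A sketch of this motion check; using the RMS residual of a least-squares line fit is one plausible reading of "significance of the fit", and the threshold is assumed:

```python
import numpy as np

def trajectory_is_linear(xs, ys, max_rmse=1.5):
    """Fit the trajectory with a straight line and accept it when the
    RMS residual is small: indicators on the tunnel wall trace nearly
    linear image paths as the vehicle moves."""
    coeffs = np.polyfit(xs, ys, deg=1)
    rmse = np.sqrt(np.mean((ys - np.polyval(coeffs, xs)) ** 2))
    return rmse < max_rmse
```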

Benefits of the pipeline

The system is real-time: the more time-consuming steps deal with fewer instances

Original image: 10^5 pixels

Keypoint detection: 10^4 points

Keypoint verification: 10^3 keypoints

Keypoint clustering: 10^2 keypoints

Fewer than 10 keypoint clusters

Emergency Telephone Indicator Detection

Experimental results

Real-time (34 fps) on a laptop with an Intel Core 2 Duo 2.8 GHz processor


Emergency Telephone Indicator Detection

Contribution

A specialized real-time detection method on cluttered data

A successful combination of motion and appearance information for detection

The significance of this research: a potentially applicable solution for vehicle positioning in tunnel environments

Emergency Telephone Indicator Detection

Conclusion

Motion information plays an important role in the detection process

Emergency Telephone Indicator Detection

Future work

We have collected new data using a new camera and plan to improve the pipeline on the newly collected data

Emergency Telephone Indicator Detection

Two methods utilizing motion information have been proposed

When motion information is not available, we turn to:

Our work

Utilizing Mutual Information Encoded Among Image Features of the Same Object

3. Pyramid Match Score for Object Detection

4. Detection of Objects with In-plane Rotations

3. Pyramid Match Score for Object Detection

Motivation

Sliding-window based methods often [1] ignore the positional information of each image feature

Hough transform based methods ignore the overall information of each object

Intention of this method: an efficient method that utilizes both the positional information of each image feature and the overall information gained from the whole object

[1] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories," in CVPR, vol. 2, pp. 2169-2178, 2006.

Pyramid Match

Pyramid match is a method to find the best one-to-one match between two sets of points

A 2D example is given here; the real data is 14D

Appearance information: 12D

Position information: 2D

Pyramid match

Divide the space from fine to coarse

If two points fall in the same grid cell under a division, they are considered a match and excluded

Pyramid match

The distance between two matched points need not be calculated; it is assigned according to the size of the grid cell at which they are found to match

Distances assigned at successive levels (as in the example): 1/4 × 2, 1/2 × 2, 1 × 2
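A 2D sketch of matching under these rules; the cell sizes follow the slide's 1/4, 1/2, 1 progression, but the per-match charges and other details are assumptions:

```python
import numpy as np
from collections import Counter

def grid_counts(points, size):
    """Histogram point coordinates over a grid with the given cell size."""
    return Counter(tuple(c) for c in np.floor(points / size).astype(int))

def pyramid_match_cost(A, B, sizes=(0.25, 0.5, 1.0)):
    """Approximate one-to-one matching cost between point sets A and B
    (arrays of shape (N, 2) here; the slides use 14D points).

    At each level, points falling in the same cell count as matched;
    newly matched pairs are charged a distance proportional to the
    current cell size, then the grid coarsens."""
    cost, prev = 0.0, 0
    for s in sizes:
        ca, cb = grid_counts(np.asarray(A), s), grid_counts(np.asarray(B), s)
        matched = sum(min(n, cb[c]) for c, n in ca.items())
        cost += s * (matched - prev)  # charge only the new matches
        prev = matched
    return cost
```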

Pyramid Match Score

Pyramid match between two sets:

Each sub-image is considered as a 14D point set (appearance: 12, position: 2)

A "super template" contains all 14D points of the training images

Pyramid Match Score

Find the near-best match between each sub-image of a test image and the "super template"

The match score (distance) is used as the confidence that the sub-image contains a target object

Results on UIUC cars

[Precision vs. detection rate curves on the UIUC car dataset]

Comments

The results are not yet good enough

The method seems promising, and we will improve it in the future

This mutual information can be further exploited

4. Detection of Objects with In-plane Rotations

About the related work

Training examples of all rotation directions: not robust, or very time-consuming

Related work

Two neural networks deal with each sub-image:

One estimates the sub-image's rotation angle; the sub-image is then rotated so that the potential face is upright

The other decides whether an object exists

H. Rowley, S. Baluja, and T. Kanade. Rotation invariant neural network-based face detection. In CVPR, pages 38–44, 1998.

Related work

Local feature methods: infer rotation based on the gradient direction of SIFT features

Graph-based methods: consider the object as a graph, and change the graph accordingly

Detection of Objects with In-plane Rotations

Idea

Suppose we have a codebook trained from objects without in-plane rotations

Detection of Objects with In-plane Rotations

Idea

With proof omitted: for two or more votes from different keypoints, there exists one and only one rotation angle that minimizes the difference of the voted centers

If all votes are good estimates, we can expect this angle to be a good estimate of the object's rotation angle

Detection of Objects with In-plane Rotations
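A sketch of this idea; instead of the slide's closed-form uniqueness argument, it brute-force searches for the angle minimizing the spread of the voted centers (the vote layout is assumed):

```python
import numpy as np

def estimate_rotation(keypoints, offsets, n_angles=360):
    """Find the in-plane rotation that makes the votes agree.

    keypoints: (N, 2) keypoint positions; offsets: (N, 2) voted
    displacements to the object center, learned at the upright
    orientation."""
    best_theta, best_spread = 0.0, np.inf
    for t in np.linspace(0, 2 * np.pi, n_angles, endpoint=False):
        c, s = np.cos(t), np.sin(t)
        R = np.array([[c, -s], [s, c]])
        centers = keypoints + offsets @ R.T  # rotate each vote by theta
        spread = centers.var(axis=0).sum()   # disagreement of voted centers
        if spread < best_spread:
            best_theta, best_spread = t, spread
    return best_theta
```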

To do

Propose a method capable of detecting objects with in-plane rotations

Evaluate the robustness of the method

Detection of Objects with In-plane Rotations

Conclusion

Motion information combined with appearance information:

Distinguishes target objects from noisy objects

Enhances the detection rate

Mutual information encoded among the image features of the same object:

We expect to obtain a method capable of utilizing positional and overall information

We expect to obtain a method capable of detecting objects with in-plane rotations

Publication list

[1] Zhipeng Wang, Jinshi Cui, Hongbin Zha, Masataka Kagesawa, and Katsushi Ikeuchi, "Object detection by common fate Hough transform," in Proc. First Asian Conference on Pattern Recognition (ACPR), pp. 613-617, Nov. 2011.

[2] Zhipeng Wang, Masataka Kagesawa, Shintaro Ono, Atsuhiko Banno, and Katsushi Ikeuchi, "Emergency light detection in tunnel environment: An efficient method," in Proc. First Asian Conference on Pattern Recognition (ACPR), pp. 628-632, Nov. 2011.

[3] Zhipeng Wang, Jinshi Cui, Hongbin Zha, Masataka Kagesawa, Shintaro Ono, and Katsushi Ikeuchi, "Detection by Motion-based Grouping of Object Parts," International Journal of ITS Research (submitted).

[4] Zhipeng Wang, Masataka Kagesawa, Shintaro Ono, Atsuhiko Banno, Takeshi Oishi, and Katsushi Ikeuchi, "Detection of Emergency Telephone Indicators Using Infrared Cameras for Vehicle Positioning in Tunnel Environment," ITS World Congress 2013 (submitted).

Thank you very much!