Character Identification in Feature-Length Films Using Global Face-Name Matching IEEE TRANSACTIONS...

Character Identification in Feature-Length Films Using Global Face-Name Matching

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 11, NO. 7, NOVEMBER 2009

Yi-Fan Zhang, Student Member, IEEE, Changsheng Xu, Senior Member, IEEE, Hanqing Lu, Senior Member, IEEE,

and Yeh-Min Huang, Member, IEEE

Outline

• Introduction• Face Clustering• Face-Name Association• Applications• Experiment• Conclusions

Introduction

• In a film, the interactions among the characters resemble them into a relationship network, which makes a film be treated as a small society.

• In the video, faces can stand for characters and co-occurrence of the faces in a scene can represent an interaction between characters.

• In the film script, the spoken lines of different characters appearing in the same scene also represents an interaction.

scene titlebrief description:

environment, actions

speaker namespoken line

Introduction

speaking face tracks

face affinity network

name affinity network

a graph matching method

an EMD-based measure of face track distance

leading characters & cliques

Since we try to keep as same as possible with the name statistics in the script, we select the speaking face tracks to build the face affinity network, which is based on the co-occurrence of the speaking face tracks.

Outline


Face Clustering

.....

.....

frame

face detection: detect faces on each frame of the video

face track (the same person):store face position, scale andthe start and end frame number of the track

video

video scene segmentation:1. The scene segmentation points can be inserted in

the boundary between two shots which have the high degree of discontinuity.

2. To align with the scene partition in the film script, we change discontinuity degree threshold to get the same number of scenes in the video with the script.

the scene segmentation point

scene 5

scene 6

scene 7

speaking face track detection:1. the mouth ROI is located2. SIFT3. normalized sum of absolute difference (NSAD)4. if a face track has more than 10% frames labeled as speaking, it will be determined as a speaking face track

Face Clusteringface representation by locally linear embedding(LLE):1. It is a dimensionality reduction technique.2. It project high dimensional face features into the embedding space which can still preserve the neighborhood relationship.

extract the dominant clusters:1. We employ spectral clustering to do clustering on all the faces in the LLE space.2. The number of clusters K is set by prior knowledge derived from the film script.

spectral clustering

k dominant clusters

Face Clusteringearth mover’s distance (EMD):It is a metric to evaluate the dissimilarity between two distributions.

,ij ij

i j

iji j

d f

EMD P Qf

1 1 2 21.67

3EMD

EMD similarity

measure face track distance by EMD:represent face track:

: is the cluster center : is the number of faces belonging to this cluster

1 1{( , ),..., ( , )}

m mp p p pP c w c w

𝑐𝑝𝑖𝑤𝑝 𝑖

3 45

61 2

7 89

dominant clusters:cluster1 cluster2 cluster3 cluster4 cluster5

face track P : 1 2 3 4 5 1 3, 2 , ,3P c c

1 1

1 1

( , )

m n

ij iji j

m n

iji j

d fEMD P Q

f

: the ground distance between cluster centers and : the flow between and

ijd

ipcjqc

ipcjqc

ijf

Face Clusteringconstrained K-Means Clustering:1. K-Means clustering is performed to group the scatted face tracks.2. The two face tracks which share the common frames cannot be clustered together.3. The target number of clusters on face tracks is the same as K we set in spectral clustering on the faces.4. We also ignore those characters whose spoken lines are less than three in the script.5. To clean the noise from the clustering results, a pruning method is employed in the next step.

speaking face track clusterscluster1 cluster2 cluster3 cluster4 cluster5

face track noise face track

cluster pruning: We refine the clustering results by pruning the

marginal points which have low confidence belonging to the current cluster.

: the EMD between the face track F and its cluster centerk : the number of K-nearest neighbors of F : the number of K-nearest neighbors which belong to the same cluster with F All the marginal points: We do a re-classification which incorporates the speaker voice features for enhancement.

: the likelihood of ‘s voice model for XX : the feature vector of the corresponding audio clip1. To clean noises, we set a threshold . 2. The face track will be classified into the cluster whose function score is maximal.

0( , )( ) D F FinkC F e

k

0,D F F

0F

ink

( , )( , ) ( | )Ck

k

D F F

k CS F C e P X

( | )kCP X kC

( , )kS F C

scoreTh

Outline


Face-Name Association

We use a name entity recognition software to extract every name in front of the spoken lines and the scene titles.

name ik m nO o

iko

name occurrence matrixm : the number of namesn : the number of scenes : the name count of the ith character in kth scene

name affinity matrixThe affinity value between two names is represented by their co-occurrence.

face affinity matrix

[ ]name ij m mR r

1

min( , )n

ij ik jkk

r o o

[ ]face ij m mR r

Face-Name Associationvertices matching between two graphs:The name affinity network and the face affinity network both can be represented as an undirected, weightedgraphs, respectively: , ,name n n nG V E W , ,face f f fG V E W

for each assignment , we can find a measurement on how well matching :

for each pair of assignments , where, , we can find a measurement on how compatible the two assignments are :

We use spectral matching method to find the final results of name-face association.

Face-Name AssociationSpectral matching method: It is commonly used for finding consistent correspondences between two sets of features.

A B

DC

1 2

43

A,B,C,DP 1,2,3,4Q

assignment , ' , where and ' Qa i i i P i

assignment

A,3 , B,1 , C,4 , D,2

, ' , , 'a i i b j j

M(a,a): It measures how well the feature i matches the feature i’.ex: M(A,3)=4, M(A,1)=1

M(a,b): It measures how well the edge (i,j) matches the edge (i’,j’).ex: M((A,3),(B,1))=4, M((A,3),(B,4))=0

Raleigh’s ratio theorem: is principal eigenvector of M corresponding to its largest eigenvalue.

Greedy algorithm:It is used for finding the solution to the correspondence problem.

A1

A2

A3

A4

B1

B2

B3

B4

C1

C2

C3

C4

D1

D2

D3

D4

A1

2 0 0 0 0 1 4 3 0 3 1 2 0 2 3 4

A2

0 3 0 0 1 0 3 1 3 0 2 4 2 0 4 2

A3

0 0 4 0 4 3 0 1 1 2 0 4 3 4 0 2

A4

0 0 0 1 3 1 1 0 2 4 4 0 4 2 2 0

B1

0 2 4 3 4 0 0 0 0 3 3 4 0 4 2 3

B2

2 0 3 1 0 3 0 0 3 0 4 2 4 0 3 3

B3

4 3 0 1 0 0 2 0 3 4 0 2 2 3 0 3

B4

3 1 1 0 0 0 0 3 4 2 2 0 3 3 3 0

C1

0 3 1 2 0 3 3 4 3 0 0 0 0 3 1 2

C2

3 0 2 4 3 0 4 2 0 2 0 0 3 0 2 4

C3

1 2 0 4 3 4 0 2 0 0 1 0 1 2 0 4

C4

2 4 4 0 4 2 2 0 0 0 0 4 2 4 4 0

D1

0 3 3 4 0 4 2 2 0 3 1 2 3 0 0 0

D2

3 0 4 2 4 0 3 3 3 0 2 4 0 4 0 0

D3

3 4 0 2 2 3 0 3 1 2 0 4 0 0 3 0

D4

4 2 2 0 2 3 3 0 2 4 4 0 0 0 0 2

1. The correspondence problem reduces now to finding the cluster C of assignments (i,i’) that maximizes the score

2. We can represent a cluster C by an indicator vector x, such that and zero otherwise.

3. the optimal solution

,

, ,a b C a C

S M a b M a a

,

, , T

a b C a C

S M a b M a a x Mx

* arg max Tx x Mx

A B

DC

1 2

43

A1 0

A2 0

A3 1

A4 0

B1 1

B2 0

B3 0

B4 0=

C1 0

C2 0

C3 0

C4 1

D1 0

D2 1

D3 0

D4 0

x

A1

A2

A3

A4

B1

B2

B3

B4

C1

C2

C3

C4

D1

D2

D3

D4

x

Outline


Applications

• Until now, we have associated a name to each speaking face track cluster. For the non-speaking face tracks, we can also classify them into the nearest speaking face track clusters depending on the EMD.

Character Relationship Mining• The relationship mining is conducted on the name affinity network.• The leading character can be considered as the one who has high

centrality in the name affinity network . The centrality of a character is defined as .

Applications

• For clique detection, we use agglomerative hierarchical clustering.

and are the numbers of the

characters in and

We classify the result cliques into dyad which has two members, triad which has three members and the large clique.

Applications

Character-Centered Browsing

Outline


Experiment

film information

speaking face track detection

Experiment

The higher the value of is , the more speaking face tracks

will be pruned.

Precision/recall curves of

face track clustering

confTh

Experiment

name-face association

relationship mining

Outline


Conclusions

• A graph matching method has been utilized to build name-face association between the name affinity network and the face affinity network.

• As an application, we have mined the relationship between characters and provided a platform for character-centered film browsing.

Character Identification in Feature-Length Films Using Global Face-Name Matching IEEE TRANSACTIONS...

Documents

Transcript of Character Identification in Feature-Length Films Using Global Face-Name Matching IEEE TRANSACTIONS...