Character Identification in Feature-Length Films Using Global Face-Name Matching IEEE TRANSACTIONS...
-
Upload
brooke-mosley -
Category
Documents
-
view
212 -
download
0
Transcript of Character Identification in Feature-Length Films Using Global Face-Name Matching IEEE TRANSACTIONS...
Character Identification in Feature-Length Films Using Global Face-Name Matching
IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 11, NO. 7, NOVEMBER 2009
Yi-Fan Zhang, Student Member, IEEE, Changsheng Xu, Senior Member, IEEE, Hanqing Lu, Senior Member, IEEE,
and Yeh-Min Huang, Member, IEEE
Outline
• Introduction• Face Clustering• Face-Name Association• Applications• Experiment• Conclusions
Outline
• Introduction• Face Clustering• Face-Name Association• Applications• Experiment• Conclusions
Introduction
• In a film, the interactions among the characters resemble them into a relationship network, which makes a film be treated as a small society.
• In the video, faces can stand for characters and co-occurrence of the faces in a scene can represent an interaction between characters.
• In the film script, the spoken lines of different characters appearing in the same scene also represents an interaction.
scene titlebrief description:
environment, actions
speaker namespoken line
Introduction
speaking face tracks
face affinity network
name affinity network
a graph matching method
an EMD-based measure of face track distance
leading characters & cliques
Since we try to keep as same as possible with the name statistics in the script, we select the speaking face tracks to build the face affinity network, which is based on the co-occurrence of the speaking face tracks.
Outline
• Introduction• Face Clustering• Face-Name Association• Applications• Experiment• Conclusions
Face Clustering
.....
.....
frame
face detection: detect faces on each frame of the video
face track (the same person):store face position, scale andthe start and end frame number of the track
video
video scene segmentation:1. The scene segmentation points can be inserted in
the boundary between two shots which have the high degree of discontinuity.
2. To align with the scene partition in the film script, we change discontinuity degree threshold to get the same number of scenes in the video with the script.
the scene segmentation point
scene 5
scene 6
scene 7
speaking face track detection:1. the mouth ROI is located2. SIFT3. normalized sum of absolute difference (NSAD)4. if a face track has more than 10% frames labeled as speaking, it will be determined as a speaking face track
Face Clusteringface representation by locally linear embedding(LLE):1. It is a dimensionality reduction technique.2. It project high dimensional face features into the embedding space which can still preserve the neighborhood relationship.
extract the dominant clusters:1. We employ spectral clustering to do clustering on all the faces in the LLE space.2. The number of clusters K is set by prior knowledge derived from the film script.
spectral clustering
k dominant clusters
Face Clusteringearth mover’s distance (EMD):It is a metric to evaluate the dissimilarity between two distributions.
,ij ij
i j
iji j
d f
EMD P Qf
1 1 2 21.67
3EMD
EMD similarity
measure face track distance by EMD:represent face track:
: is the cluster center : is the number of faces belonging to this cluster
1 1{( , ),..., ( , )}
m mp p p pP c w c w
𝑐𝑝𝑖𝑤𝑝 𝑖
3 45
61 2
7 89
dominant clusters:cluster1 cluster2 cluster3 cluster4 cluster5
face track P : 1 2 3 4 5 1 3, 2 , ,3P c c
1 1
1 1
( , )
m n
ij iji j
m n
iji j
d fEMD P Q
f
: the ground distance between cluster centers and : the flow between and
ijd
ipcjqc
ipcjqc
ijf
Face Clusteringconstrained K-Means Clustering:1. K-Means clustering is performed to group the scatted face tracks.2. The two face tracks which share the common frames cannot be clustered together.3. The target number of clusters on face tracks is the same as K we set in spectral clustering on the faces.4. We also ignore those characters whose spoken lines are less than three in the script.5. To clean the noise from the clustering results, a pruning method is employed in the next step.
speaking face track clusterscluster1 cluster2 cluster3 cluster4 cluster5
face track noise face track
cluster pruning: We refine the clustering results by pruning the
marginal points which have low confidence belonging to the current cluster.
: the EMD between the face track F and its cluster centerk : the number of K-nearest neighbors of F : the number of K-nearest neighbors which belong to the same cluster with F All the marginal points: We do a re-classification which incorporates the speaker voice features for enhancement.
: the likelihood of ‘s voice model for XX : the feature vector of the corresponding audio clip1. To clean noises, we set a threshold . 2. The face track will be classified into the cluster whose function score is maximal.
0( , )( ) D F FinkC F e
k
0,D F F
0F
ink
( , )( , ) ( | )Ck
k
D F F
k CS F C e P X
( | )kCP X kC
( , )kS F C
scoreTh
Outline
• Introduction• Face Clustering• Face-Name Association• Applications• Experiment• Conclusions
Face-Name Association
We use a name entity recognition software to extract every name in front of the spoken lines and the scene titles.
name ik m nO o
iko
name occurrence matrixm : the number of namesn : the number of scenes : the name count of the ith character in kth scene
name affinity matrixThe affinity value between two names is represented by their co-occurrence.
face affinity matrix
[ ]name ij m mR r
1
min( , )n
ij ik jkk
r o o
[ ]face ij m mR r
Face-Name Associationvertices matching between two graphs:The name affinity network and the face affinity network both can be represented as an undirected, weightedgraphs, respectively: , ,name n n nG V E W , ,face f f fG V E W
for each assignment , we can find a measurement on how well matching :
for each pair of assignments , where, , we can find a measurement on how compatible the two assignments are :
We use spectral matching method to find the final results of name-face association.
Face-Name AssociationSpectral matching method: It is commonly used for finding consistent correspondences between two sets of features.
A B
DC
1 2
43
A,B,C,DP 1,2,3,4Q
assignment , ' , where and ' Qa i i i P i
assignment
A,3 , B,1 , C,4 , D,2
, ' , , 'a i i b j j
M(a,a): It measures how well the feature i matches the feature i’.ex: M(A,3)=4, M(A,1)=1
M(a,b): It measures how well the edge (i,j) matches the edge (i’,j’).ex: M((A,3),(B,1))=4, M((A,3),(B,4))=0
Raleigh’s ratio theorem: is principal eigenvector of M corresponding to its largest eigenvalue.
Greedy algorithm:It is used for finding the solution to the correspondence problem.
A1
A2
A3
A4
B1
B2
B3
B4
C1
C2
C3
C4
D1
D2
D3
D4
A1
2 0 0 0 0 1 4 3 0 3 1 2 0 2 3 4
A2
0 3 0 0 1 0 3 1 3 0 2 4 2 0 4 2
A3
0 0 4 0 4 3 0 1 1 2 0 4 3 4 0 2
A4
0 0 0 1 3 1 1 0 2 4 4 0 4 2 2 0
B1
0 2 4 3 4 0 0 0 0 3 3 4 0 4 2 3
B2
2 0 3 1 0 3 0 0 3 0 4 2 4 0 3 3
B3
4 3 0 1 0 0 2 0 3 4 0 2 2 3 0 3
B4
3 1 1 0 0 0 0 3 4 2 2 0 3 3 3 0
C1
0 3 1 2 0 3 3 4 3 0 0 0 0 3 1 2
C2
3 0 2 4 3 0 4 2 0 2 0 0 3 0 2 4
C3
1 2 0 4 3 4 0 2 0 0 1 0 1 2 0 4
C4
2 4 4 0 4 2 2 0 0 0 0 4 2 4 4 0
D1
0 3 3 4 0 4 2 2 0 3 1 2 3 0 0 0
D2
3 0 4 2 4 0 3 3 3 0 2 4 0 4 0 0
D3
3 4 0 2 2 3 0 3 1 2 0 4 0 0 3 0
D4
4 2 2 0 2 3 3 0 2 4 4 0 0 0 0 2
1. The correspondence problem reduces now to finding the cluster C of assignments (i,i’) that maximizes the score
2. We can represent a cluster C by an indicator vector x, such that and zero otherwise.
3. the optimal solution
,
, ,a b C a C
S M a b M a a
,
, , T
a b C a C
S M a b M a a x Mx
* arg max Tx x Mx
A B
DC
1 2
43
A1 0
A2 0
A3 1
A4 0
B1 1
B2 0
B3 0
B4 0=
C1 0
C2 0
C3 0
C4 1
D1 0
D2 1
D3 0
D4 0
x
A1
A2
A3
A4
B1
B2
B3
B4
C1
C2
C3
C4
D1
D2
D3
D4
x
Outline
• Introduction• Face Clustering• Face-Name Association• Applications• Experiment• Conclusions
Applications
• Until now, we have associated a name to each speaking face track cluster. For the non-speaking face tracks, we can also classify them into the nearest speaking face track clusters depending on the EMD.
Character Relationship Mining• The relationship mining is conducted on the name affinity network.• The leading character can be considered as the one who has high
centrality in the name affinity network . The centrality of a character is defined as .
Applications
• For clique detection, we use agglomerative hierarchical clustering.
and are the numbers of the
characters in and
We classify the result cliques into dyad which has two members, triad which has three members and the large clique.
Applications
Character-Centered Browsing
Outline
• Introduction• Face Clustering• Face-Name Association• Applications• Experiment• Conclusions
Experiment
film information
speaking face track detection
Experiment
The higher the value of is , the more speaking face tracks
will be pruned.
Precision/recall curves of
face track clustering
confTh
Experiment
name-face association
relationship mining
Outline
• Introduction• Face Clustering• Face-Name Association• Applications• Experiment• Conclusions
Conclusions
• A graph matching method has been utilized to build name-face association between the name affinity network and the face affinity network.
• As an application, we have mined the relationship between characters and provided a platform for character-centered film browsing.