Matthias Wimmer, Ursula Zucker and Bernd Radig

Chair for Image Understanding, Computer Science

Technische Universität München

{ wimmerm, zucker, radig }@in.tum.de

Human Capabilities on Video-based Facial Expression Recognition

2007-09-10, Technische Universität München, Ursula Zucker

Motivation

Facial Expression Recognition
  goal: human-like man-machine communication
  six universal facial expressions [Ekman]: anger, disgust, fear, happiness, sadness, surprise
  minimal muscle activity -> reliable recognition is difficult
  recognition rate of state-of-the-art approaches: ~70%

Question
  How reliably do humans identify facial expressions?
  -> survey to determine human capabilities


The Facial Expression Database

Cohn-Kanade AU-Coded Facial Expression Database
  488 image sequences (each containing 4 to 66 images)
  each showing one of the six universal facial expressions
  no natural facial expressions (simulated ground truth)
  no context information


Description of Our Survey

Execution of the Survey
  participants are shown randomly selected sequences
  250 participants
  5413 annotations -> approx. 11 per sequence


Evaluation

Evaluation of the Survey
  no ground truth -> comparison of the annotations to one another
  annotation rate for each sequence and each facial expression
  relative agreement for an expression
  confusion between facial expressions

Comparison to algorithms
  recognition rate
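To illustrate the first evaluation step: the slides do not define the annotation rate, so the sketch below assumes it is the share of a sequence's annotations that name a given expression. The sequence ids and the toy annotations are hypothetical and are not taken from the survey.

```python
from collections import Counter, defaultdict

EXPRESSIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]


def annotation_rates(annotations):
    """Map each sequence id to {expression: share of that sequence's annotations}.

    Assumed definition (not stated on the slides): count of annotations naming
    the expression divided by all annotations the sequence received.
    """
    counts = defaultdict(Counter)
    for seq_id, expression in annotations:
        counts[seq_id][expression] += 1

    rates = {}
    for seq_id, counter in counts.items():
        total = sum(counter.values())
        rates[seq_id] = {e: counter[e] / total for e in EXPRESSIONS}
    return rates


# Toy annotations as (sequence id, chosen expression) pairs; made up for illustration.
toy_annotations = [
    ("s1", "happiness"), ("s1", "happiness"), ("s1", "surprise"), ("s1", "happiness"),
    ("s2", "fear"), ("s2", "surprise"),
]
rates = annotation_rates(toy_annotations)
print(rates["s1"]["happiness"], rates["s2"]["fear"])
```

Because there is no ground truth, every quantity here is relative to what the other participants chose, which matches the survey's approach of comparing annotations to one another.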


Annotation Rate for Each Sequence

Explanation:
  488 rows, 1 row = 1 sequence
  darker regions denote a higher annotation rate
  sorted by similar annotation

Result:
  happiness: best annotation rates
  surprise and fear: get confused often
  fear: difficult to tell apart


Relative Agreement

Explanation:
  example: annotating the sequences as happiness
  ~350 sequences annotated as happiness by nobody
  ~50 sequences annotated as happiness by everybody
  well-recognized facial expressions have peaks at "0" and at "1"
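The relative-agreement plot can be read as a histogram over per-sequence annotation rates for one expression. A minimal sketch follows, assuming the rates lie in [0, 1]. The bin count and the toy rates are hypothetical choices for illustration, not survey data.

```python
def agreement_histogram(rates, bins=5):
    """Count sequences per equal-width bin of annotation rate in [0, 1]."""
    hist = [0] * bins
    for rate in rates:
        # A rate of exactly 1.0 would index past the end, so clamp it to the last bin.
        index = min(int(rate * bins), bins - 1)
        hist[index] += 1
    return hist


# Toy per-sequence rates for one expression; made up for illustration.
toy_happiness_rates = [0.0, 0.0, 0.0, 0.1, 0.5, 0.9, 1.0, 1.0]
hist = agreement_histogram(toy_happiness_rates)
print(hist)
```

A well-recognized expression puts most of its mass in the first and last bins (nobody or everybody chose it), while a frequently confused one fills the middle bins.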


Confusion Between Facial Expressions

confusion rate between two facial expressions
[formula not recoverable: only two H(...) terms with arguments indexed 1 and 2 survive]

fear and surprise: high confusion
happiness and disgust: low confusion

confusion rate
            anger  disgust  fear  happiness  sadness  surprise
anger       100%   42%      24%   7%         43%      29%
disgust     42%    100%     33%   6%         19%      25%
fear        24%    33%      100%  11%        16%      44%
happiness   7%     6%       11%   100%       7%       14%
sadness     43%    19%      16%   7%         100%     29%
surprise    29%    25%      44%   14%        29%      100%


Comparison: humans vs. algorithms

ground truth: provided by Michel et al.

Results:
  Michel et al.: worse at recognizing anger
  Schweiger et al.: worse at recognizing disgust, fear, happiness and on average

facial       human specification   result of the algorithm   result of the algorithm
expression   during our survey     of Michel et al.          of Schweiger et al.
anger        72%                   67%                       76%
disgust      64%                   64%                       30%
fear         28%                   67%                       0%
happiness    91%                   92%                       79%
sadness      53%                   63%                       61%
surprise     77%                   83%                       90%

average      64%                   72%                       56%
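The average row can be checked against the per-expression rates in the table. The sketch below computes plain unweighted means, which is an assumption: the slides do not say how the averages were formed. The column labels are shortened for illustration.

```python
# Per-expression recognition rates in percent, copied from the table, in the order
# anger, disgust, fear, happiness, sadness, surprise.
recognition = {
    "humans (survey)": [72, 64, 28, 91, 53, 77],
    "Michel et al.": [67, 64, 67, 92, 63, 83],
    "Schweiger et al.": [76, 30, 0, 79, 61, 90],
}

# Unweighted mean over the six expressions (assumed; the slide may weight differently).
means = {name: sum(values) / len(values) for name, values in recognition.items()}

for name, mean in means.items():
    print(f"{name}: {mean:.1f}%")
```

The unweighted means come out at roughly 64.2%, 72.7% and 56.0%. The first and last agree with the reported 64% and 56%. The reported 72% for Michel et al. is slightly below the unweighted value, which may reflect truncation or a weighting by the number of sequences per expression; the slides do not settle this.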


Conclusion

Survey makes similar assumptions to the algorithms:
  consideration of visual information only
  no context information
  no natural facial expressions

Summary of our results:
  poor recognition rate of humans, worse than expected
  some facial expressions get confused easily

Conclusion & Outlook:
  integration of more sources of information is highly recommended, e.g. audio/language, context, ...


Thank you