Where, Who and What? @AIT
Intelligent Affective Interaction, ICANN, Sept. 14, Athens, Greece
Aristodemos Pnevmatikakis, John Soldatos and Fotios Talantzis
Athens Information Technology, Autonomic & Grid Computing
Overview
• CHIL – AIT SmartLab
• Signal processing for perceptual components
– Video processing
– Audio processing
• Services
• Middleware
– Easing application assembly
Computers in the Human Interaction Loop
• EU FP6 Integrated Project (IP 506909)
• Coordinators: Universität Karlsruhe (TH), Fraunhofer Institute IITB
• Duration: 36 months
• Total project costs: over 24 M€
• Goal: Create environments in which computers serve humans who focus on interacting with other humans, as opposed to having to attend to and being preoccupied with the machines themselves
• Key research areas:
– Perceptual technologies
– Software infrastructure
– Human-centric pervasive services
AIT SmartLab Equipment
• Five fixed cameras (one with fish-eye lens)
• PTZ camera
• NIST 64-channel microphone array
• 4 inverted-T-shaped clusters of 4 SHURE microphones each
• 4 tabletop microphones
• 6 dual-Xeon 3 GHz PCs with 2 GB RAM
• FireWire cables & repeaters
AIT SmartLab
Perceptual Components
Detection and Identification System
[Block diagram: detector (head detector, eye detector) and tracker feed a face normalizer; the face recognizer and frontal verifier supply the classifier confidence and frontality confidence to a confidence estimator, and weighted voting outputs the ID]
Unconstrained Video Difficulties
Where and Who are the World Cup Finalists?
• and European Champions?
Tracking
[Block diagram of the tracker: an Adaptive Background Module (adaptive background, parameter adaptation, PPM) processes the incoming frames; an Evidence Generation Module performs edge detection and evidence extraction, handles target association and target splits (split / no split, existing / new targets) and initializes tracks; a Kalman Module carries state information through prediction and measurement update of the predicted tracks (sketched below); a Track Consistency Module maintains track memory and consistency and outputs the targets]
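The Kalman module above boils down to the standard predict/update cycle per track. A minimal sketch, assuming a constant-velocity state [x, y, vx, vy] and centroid measurements from the evidence generation module; the matrices and noise levels are illustrative, not the AIT parameters:

import numpy as np

dt = 1.0                                   # frame period (illustrative)
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], float)        # constant-velocity motion model
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], float)        # only the centroid is measured
Q = 0.01 * np.eye(4)                       # process noise (illustrative)
R = 4.0 * np.eye(2)                        # measurement noise (illustrative)

def predict(x, P):
    """Propagate the track state and covariance one frame ahead."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    """Correct the prediction with the measured centroid z."""
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
    return x + K @ (z - H @ x), (np.eye(4) - K @ H) @ P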
Tracking – Smart Spaces
Tracking – 3D from Synchronized Cameras
Tracking – Outdoors Surveillance
• The AIT system placed 2nd in the VACE / NIST surveillance evaluations
Head Detection
[Pipeline diagram repeated: eye detector, head detector, tracker, face normalizer, face recognizer, frontal verifier, confidence estimator, weighted voting]
• Detection of the head by processing the outline of the foreground belonging to the body (a rough sketch follows)
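A minimal sketch of the idea, assuming an upright person and a boolean foreground mask from the tracker; the proportions and the helper name head_box are illustrative, not the AIT algorithm:

import numpy as np

def head_box(foreground, head_frac=0.2, min_width=3):
    """Bounding box of the head within a boolean foreground mask, or None."""
    rows = np.where(foreground.any(axis=1))[0]
    if rows.size == 0:
        return None
    top, height = rows[0], rows[-1] - rows[0] + 1
    head = foreground[top:top + max(1, int(head_frac * height))]
    cols = np.where(head.any(axis=0))[0]
    if cols.size < min_width:
        return None
    return top, top + head.shape[0], cols[0], cols[-1]   # rows, then columns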
Eye Detection
[Pipeline diagram repeated: eye detector, head detector, tracker, face normalizer, face recognizer, frontal verifier, confidence estimator, weighted voting]
• Vector quantization of colors in the head region (sketched after this list)
• Detect candidate eye regions– Based on resemblance to skin, brightness, shape and size
• Selection amongst candidates based on face geometry
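A minimal sketch of the color vector-quantization step, using plain k-means over the head region's RGB pixels; the cluster count, iteration count and the subsequent candidate test are illustrative assumptions, not the AIT implementation:

import numpy as np

def quantize_colors(head_pixels, k=8, iters=10, seed=0):
    """Vector-quantize the (N, 3) RGB pixels of the head region with k-means."""
    rng = np.random.default_rng(seed)
    centers = head_pixels[rng.choice(len(head_pixels), k, replace=False)].astype(float)
    for _ in range(iters):
        dists = ((head_pixels[:, None].astype(float) - centers) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = head_pixels[labels == j].mean(axis=0)
    return centers, labels

# Candidate eye regions would then be the pixel clusters that are dark and do
# not resemble skin, filtered by shape and size; the final pair is picked by
# plausible face geometry, as the bullets above describe.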
Face Recognition from Video
Effect of Eye Misalignment: LDA
[Plot: PMC (%) vs. number of training images per person, for three conditions: ideal eyes; ideal eyes for training, detected for testing; detected eyes for training and testing]
Effect of Eye Misalignment
[Plot: PMC (%) vs. RMS eye perturbation (%, relative to eye distance), for the PCA, PCA w/o 3, LDA, EBGM, Laplacianfaces, MACE and 2D-HMM classifiers]
Classifier Fusion
• Classifier fusion addresses the fact that different classifiers are optimal for different recognition impairments
[Bar charts: PMC (%) for edginess, no preprocessing, feature-vector and post-decision approaches, under illumination variations and under pose variations]
Fusion Across Time, Classifiers and Modalities
[Diagram: speech of an individual collected over 5 seconds, and N face images of the individual collected over the same 5 seconds and histogram-equalized, feed the audio classifier and the PCA and LDA face classifiers. Fusion across time reduces each face classifier's N IDs and confidences (PMC of 60% and 58%) to a single ID and confidence (PMC of 31% and 36%). Fusion across classifiers yields the visual ID and confidence (PMC of 29%). Fusion across modalities combines it with the audio ID and confidence (PMC of 9.7%) into the audio-visual ID (PMC of 6.8%).]
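As a rough illustration of the scheme above, here is a minimal confidence-weighted voting sketch: each source (a classifier at one time instant, or one modality) emits (ID, confidence) pairs, confidences are accumulated per identity, and the top-scoring identity wins. The names and the additive rule are illustrative assumptions, not the exact CHIL fusion rule:

from collections import defaultdict

def fuse(decisions):
    """decisions: iterable of (identity, confidence) pairs from any sources."""
    scores = defaultdict(float)
    for identity, confidence in decisions:
        scores[identity] += confidence          # accumulate evidence per identity
    return max(scores, key=scores.get)          # highest total confidence wins

# Fusing per-frame face IDs (fusion across time and classifiers) with a
# speaker ID (fusion across modalities); the values are made up.
visual = [("alice", 0.6), ("bob", 0.3), ("alice", 0.7)]
audio = [("alice", 0.9)]
print(fuse(visual + audio))  # -> alice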
Face Recognition @ CLEAR2006
                        15 sec training              30 sec training
Testing duration (sec)  1      5      10     20      1      5      10     20
AIT                     50.57  29.68  23.18  20.22   47.31  31.14  26.64  24.72
UKA                     46.82  33.58  28.03  23.03   40.13  23.11  20.42  16.29
UPC                     79.77  78.59  77.51  76.40   80.42  77.13  74.39  73.03
New AIT                 45.35  27.01  17.65  15.73   43.72  17.76  13.49  7.86
Speaker ID @ CLEAR2006
                        15 sec training              30 sec training
Testing duration (sec)  1      5      10     20      1      5      10     20
AIT                     26.92  9.73   7.96   4.49    15.17  2.68   1.73   0.56
CMU                     23.65  7.79   7.27   3.93    14.36  2.19   1.38   0.00
LIMSI                   51.71  10.95  6.57   3.37    38.83  5.84   2.08   0.00
UPC                     24.96  10.71  10.73  11.80   15.99  2.92   3.81   2.81
AIT IS2006              25.69  5.60   4.50   2.25    15.01  2.19   2.42   0.00
Audiovisual ID @ CLEAR2006
                        15 sec training              30 sec training
Testing duration (sec)  1      5      10     20      1      5      10     20
AIT                     23.65  6.81   6.57   2.81    13.70  2.19   1.73   0.56
UIUC primary            17.61  2.68   1.73   0.56    13.21  2.43   1.38   0.56
UIUC contrast           20.55  5.60   3.81   2.25    15.99  3.41   2.42   1.12
UKA / CMU               43.07  29.20  23.88  20.22   35.73  19.71  16.61  12.36
UPC                     23.16  8.03   5.88   3.93    13.38  2.92   2.08   1.12
Audiovisual Tracker
• Information-theoretic speaker localization from the microphone array (a TDOA stand-in is sketched below)
– Accurate azimuth, approximate depth, no elevation
• Moderate targeting of the speaker's face using a PTZ camera
• Targeting refined by visual face detection
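The information-theoretic localizer itself is not detailed on the slide; as a stand-in, this sketch estimates the time difference of arrival between a pair of microphones with GCC-PHAT, a common alternative, and converts it into the azimuth the slide refers to. The microphone spacing and function names are assumptions for the example:

import numpy as np

def gcc_phat_tdoa(sig, ref, fs):
    """Time difference of arrival (seconds) between two microphone signals."""
    n = len(sig) + len(ref)
    S = np.fft.rfft(sig, n) * np.conj(np.fft.rfft(ref, n))
    cc = np.fft.irfft(S / (np.abs(S) + 1e-12), n)        # PHAT weighting
    cc = np.concatenate((cc[-n // 2:], cc[:n // 2]))     # reorder lags
    return (np.argmax(np.abs(cc)) - n // 2) / fs

def azimuth_deg(tdoa, mic_distance, c=343.0):
    """Azimuth of the source relative to the broadside of the mic pair."""
    return np.degrees(np.arcsin(np.clip(tdoa * c / mic_distance, -1.0, 1.0)))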
Services
Memory Jog
• Memory Jog:
– Context-aware, human-centric assistant for meetings, lectures and presentations
– Proactive and reactive assistance, and information retrieval
• Features and functionalities:
– Sophisticated situation modeling / tracking
– Essentially non-obtrusive operation
– Intelligent meeting-recording functionality
– GUI also runs on a PDA
– Full compliance with the CHIL architecture
– Integration with actuating devices (targeted audio, projectors)
Context as a Network of Situations
Transition   Elements & Components
NIL → S1     Table Watcher (people in table area), SAD
S1 → S2      White-Board Watcher (presenter in speaker area), Face ID, Speaker ID
S2 → S3      Speaker ID (speaker ID ≠ presenter ID), Speaker Tracking
S3 → S2      Face Detection (presenter in speaker area), Face ID, Speaker ID
S2 → S4      White-Board Watcher (no face in speaker area for N seconds), Table Watcher (all participants at meeting table)
S4 → S5      Table Watcher (nobody in table area)
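The situation network above amounts to a small finite-state machine. A minimal sketch, with hypothetical event names standing in for the perceptual-component outputs listed in the table:

TRANSITIONS = {
    ("NIL", "people_in_table_area"):        "S1",
    ("S1", "presenter_in_speaker_area"):    "S2",
    ("S2", "speaker_id_not_presenter_id"):  "S3",
    ("S3", "presenter_in_speaker_area"):    "S2",
    ("S2", "speaker_area_empty_N_seconds"): "S4",
    ("S4", "table_area_empty"):             "S5",
}

def step(state, event):
    """Advance the situation model; irrelevant events leave the state alone."""
    return TRANSITIONS.get((state, event), state)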
What Happened While I was Away?
Middleware
Virtualized Sensor Access
CHIL Compliant Perceptual Components
• Several sites develop site-, room- and configuration-specific Perceptual Components for CHIL
• Provide common abstractions for the input and output of each PC (black box)
• Facilitate component exchange across sites & vendors
• Standardization commenced for body trackers
– Continues with Face ID components
Architecture for Body Tracker Exchange
[Diagram elements: information retrieval; transparent connection to the sensor output; common control API (CHILiX); services complying with the current API; non-CHIL-compliant body tracker; sensor abstraction (a sketch of such an abstraction follows)]
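A hedged sketch of what such a common abstraction could look like: a uniform output record and control API so trackers can be swapped across sites and vendors. The names are illustrative, not the actual CHILiX specification:

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class BodyTrack:
    timestamp: float                        # seconds
    track_id: int
    position: Tuple[float, float, float]    # (x, y, z) in room coordinates, metres

class BodyTracker(ABC):
    """Uniform wrapper any site's tracker would implement."""

    @abstractmethod
    def start(self) -> None: ...

    @abstractmethod
    def stop(self) -> None: ...

    @abstractmethod
    def read(self) -> List[BodyTrack]:
        """Latest track estimates, regardless of the underlying sensors."""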
Thank you! Questions?