[DSC 2016] Event Series: Chi-Chun Lee (李祈均) / Big Data Analytics of Human Behavior
-
Date posted: 05-Jan-2017
Category: Data & Analytics
-
1
Chi-Chun Lee (Jeremy), Behavioral Informatics and Interaction Computation Lab (BIIC)
January 15th, 2017
-
():
?
2
-
THIS
IS
SUBWAY
MAP
Data
Science
-
Naive Bayes Algorithm
Transfer learning
Apriori Algorithm
Gaussian distribution
Random Forests
Logistic Regression
(Deep) Neural Networks
Decision Trees
Nearest Neighbour
Support Vector Machine
K-Means Algorithm
Linear Regression
Active learning
Domain adaptation
Semi-supervised learning
Reinforcement learning
Unsupervised learning
Supervised learning
-
7
-
8
-
9
Emotion
Health Care
Education
Voice Recognition
Symptom diagnosis
Behavior Activity
Image Recognition
Medical
IBM Pathway Genomics
Detection of Diabetic Retinopathy in Retinal Fundus Photographs
customer behavior
Medical Imaging
Genomic Medicine
-
What do I do? &
What am I going to share?
10
-
11
Behavioral signal processing
Professor Shrikanth Narayanan, USC
-
12
Seek a window into human mind and traits
through engineering approach
S. Narayanan and P. G. Georgiou, "Behavioral signal processing: Deriving human behavioral informatics from speech and language," Proceedings of the IEEE, vol. 101, no. 5, pp. 1203-1233, 2013.
-
13
Behavioral Signal Processing (BSP)
Compute Human Behavior Traits and States for Domain Experts Decision Making
Help experts to do things they know in a more efficient manner at scale
Develop novel behavioral analytics framework for possible scientific discovery
from qualitative to quantitative . . .
through verbal and non-verbal behavioral cues . . .
-
Part I
:
14
-
15
. . .
-
16
(Signals)(System)
High-level (Abstraction) . . .
-
17
-
18
-
19
:
-
20
-
21
-
22
-
23
(Self Report)
-
24
:
-
25
-
26
(Self Report)
NRS
-
27
-
28
-
29
:
-
30
Autism diagnosis observational schedule
-
31
ADOS
-
32
BSP Role . . .
:
BSP Technology
(reliability) (repeatable) (scalable)
-
QUANTITATIVE: QUANTITATIVE EVIDENCE DIRECTLY FROM MEASURABLE SIGNALS
EFFICIENCY: HELP DO THINGS THAT EXPERTS KNOW TO DO WELL MORE EFFICIENTLY, CONSISTENTLY & AT SCALE
SUPPLEMENTARY: COMPLEMENT WITH GOLD STANDARD METHOD WHEN APPROPRIATE
POSSIBILITY: TOOLS FOR NOVEL ACTIONABLE INSIGHT DISCOVERY
33
COMPUTING BEHAVIORAL TRAITS & STATES FOR DECISION MAKING & ACTION
aim..
-
34
BSP Enablers . . .
Text Processing
Voice Activity Detection
Alignment
Transcription
Keyword Spotting
Prosody Modeling
Voice Quality
Diarization
Speaker Identification
Dialog Act Tagging
Face Detection
Expression recognition
Action recognition
Language Understanding
Affective Computing
Speaker State and Trait
Joint Speech Visual Processing
Interaction Modeling
Sentiment Analysis
-
35
Enabling Technologies
Domain Experts Knowledge
Low level descriptors
Acoustic features
Motion features
Text features
Image features
Speech recognition
Face recognition
Action recognition
Dialog act tagging
Keyword spotting
Text processing
Sentiment Analysis
Affect recognition
Speaker states and traits
Visual-speech processing
Interaction modeling
Subjective assessment
Internal state & construct
Neuro-developmental disorder
Evidence-based observational coding
Intervention efficacy
Coder variability control
Development of coding manual
Self report measure validity
Coding mechanism
Social behavior
Affective behavior
Communicative behavior
Dyadic behavior
-
36
Behavior signal processing
-
BSP INGREDIENTS
37
()
: +
I. II.
III. IV.
-
38
BSP INGREDIENTS
-
39
BSP Operational Definition
-
40
Computational Methods that Model Human Behavior Signals
Manifested in Overt and Covert Cues
Processed and Used by Humans Explicitly or Implicitly
Facilitate Human Analysis and Decision Making
Outcome of Behavioral Signal Processing
Behavioral Analytics
QUANTIFYING HUMAN EXPRESSED BEHAVIOR AND HUMAN FELT SENSE
DERIVING INTERPRETABLE BEHAVIOR ANALYTICS FROM DATA FOR ACTIONABLE INSIGHTS
-
41
-
42
(13 March 2013)
(29 May 2013)
-
?
43
-
44
-
45
200/
:
?
-
46
-
47
Can you tell the difference?
-
48
1. Subjective evaluation
2. Time-consuming
3. Non-scalable
-
49
-
50
[Chart: THE NUMBER OF EMERGENCY PATIENTS, 2010~2015; vertical axis 0 to 8,000,000; labelled value 7,200,000]
-
51
-
52
(Taiwan Triage and Acuity Scale, TTAS)
(NRS-11)
-
The difficulty in implementation of NRS
53
-
54
NRS-11
-
55
-
56
social-communicative neurodevelopmental disorder
Prevalence: 1 in 68 children (1 in 42 males) diagnosed [CDC2014]
ASD: Spectrum disorder due to the extreme heterogeneity
Intervention leads to improved outcomes
BSP in Autism ?
What is Autism?
-
57
ROLE OF BSP?
ADOS social and interactive
AIM?
Analysis at scale
Quantitative evidence from signals
New finding beyond current status-quo in psychiatry (?)
-
58
(
()
Qualitative description
-
59
Example: a snippet of an actual clinical ADOS diagnostic session
-
60
Can we?
Automatically measure spontaneous social (verbal/nonverbal) behavior between clinician and child to predict the child's rating of atypical amount of social reciprocal communication
from qualitative to quantitative . . .
through verbal and non-verbal behavioral cues . . .
-
61
-
BSP INGREDIENTS
62
()
: +
I. II.
III. IV.
-
63
-
64
= / +
-
Part 2:
65
-
BSP INGREDIENTS
66
()
: +
I. II.
III. IV.
-
67
() (ecologically-valid)
ease-of-application, realism
established instrument
Scientific rigor
Ensure domain-applicable analytics
-
68
-
69
where
when
how
BIIC
Ensure the current system is not altered too much at the BEGINNING. At scale, ease-of-application is crucial.
ecological validity & quality control
BIIC
BIICKinectsynchronized
-
70
-
71
360
-
72
-
73
-
74
-
75
where
when
how
BIIC
Ensure the current system is not altered too much at the BEGINNING. At scale, ease-of-application is crucial.
ecological validity & quality control
BIIC
BIIC
-
76
!! !!
-
77
250
-
78
-
79
Verbal Numerical Rating Scale (NRS)
11-point self-report pain-level assessment (0 - 10)
Considered a clinically valid "gold standard" for assessing pain
-
80
-
81
where
when
how
BIIC
Research oriented: we have a little more flexibility in the room design!!
ecological validity & quality control
BIICADOSADOS
ADOS
BIIC
-
82
Two HD cameras
Two lapel microphones (synced through mixers)
~40 subjects
-
83
-
Autism Diagnostic Observation Schedule [Lord 2001]
Subject interacts with a psychologist for ~45 minutes
Current gold standard, research-level observational coding
Psychologists are trained using stringent training protocol
Semi-structured assessment in eliciting socio-communicative behavior of the ASD children for diagnostics
Multiple sub-part events (14), with ratings on a wide range of socio-communicative behaviors (28)
84
-
85
Internal quality control
(
()
ADOS
-
86
1 2
3 4:
-
BSP INGREDIENTS
87
()
I. II.
-
88
Pre-processing
Data collection-dependent
Smart utilization of current progress in audio-video processing
label?
Label consistency
Reliable labeling
Construct validity
-
89
-
90
Voice Activity Detector
-
Speech signal per session
Energy every frame
frame = 25ms
standard deviation (normalize D.C. offset)
Threshold
speech percentage in the wav
Speech Segments
Energy > Threshold Energy
Short-time energy formula:
$E_n = \sum_{m=n-N+1}^{n} x^2(m)$
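The energy-threshold steps on this slide can be sketched in a few lines. This is a minimal illustration, not the lab's implementation: the function names, the non-overlapping framing, the toy signal and the threshold value are all assumptions, and only the mean is removed for the D.C. offset.

```python
def short_time_energy(samples, frame_len):
    """Sum of squared samples for each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def energy_vad(samples, frame_len, threshold):
    """Return per-frame speech flags and the fraction of frames flagged as speech."""
    mean = sum(samples) / len(samples)
    centered = [x - mean for x in samples]  # remove the D.C. offset
    energies = short_time_energy(centered, frame_len)
    flags = [e > threshold for e in energies]  # Energy > Threshold Energy
    return flags, sum(flags) / len(flags)


# Toy signal (assumed): 4 quiet frames followed by 4 loud frames, 4 samples per frame.
quiet = [0.01, -0.01] * 8
loud = [0.5, -0.5] * 8
flags, speech_ratio = energy_vad(quiet + loud, frame_len=4, threshold=0.1)
print(flags, speech_ratio)
```

With real audio, a 25 ms frame corresponds to 400 samples at a 16 kHz sampling rate, and the threshold would be tuned per recording setup.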
-
[Figure: speech segments labelled by a human vs. by the VAD]
(Part 3)
-
93
-
94
(Diarization)
-
95
diarization
Segmentation and Clustering (Diarization)
Speaker B
Speaker A
Where are speaker changes?
Which segments are from the same speaker?
-
96
Segmentation and Clustering (Diarization)
()
MFCC low-level descriptors (Part 3)
(frame)
-
97
Segmentation: speaker change detection
1. ()2. frame
Bayesian Information Criterion (BIC)
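A BIC-based change test compares one Gaussian fitted to a whole window against two Gaussians fitted to its left and right halves. The sketch below is a simplified, assumed version: it uses a single scalar feature per frame (real systems typically use multivariate MFCC vectors with full covariance), and `penalty_weight` is a hypothetical tuning parameter.

```python
import math


def variance(xs):
    """Maximum-likelihood (population) variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def delta_bic(left, right, penalty_weight=1.0):
    """Positive values favour a speaker change between `left` and `right`."""
    both = left + right
    n, n1, n2 = len(both), len(left), len(right)
    gain = 0.5 * (n * math.log(variance(both))
                  - n1 * math.log(variance(left))
                  - n2 * math.log(variance(right)))
    # A 1-D Gaussian has 2 free parameters (mean and variance).
    penalty = penalty_weight * 0.5 * 2 * math.log(n)
    return gain - penalty


# Toy feature tracks (assumed): seg_b has a clearly different mean from seg_a.
seg_a = [0.0, 0.1, -0.1, 0.05, -0.05] * 4
seg_b = [x + 5.0 for x in seg_a]
print(delta_bic(seg_a, seg_b))  # change: positive
print(delta_bic(seg_a, seg_a))  # no change: negative
```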
-
98
Clustering (after speaker change detection)
1. Generate i-vector for each segment
2. Compute pair-wise similarity of each cluster
3. Merge closest clusters
4. Update distances of remaining clusters to new cluster
5. Iterate steps 2-4 until stopping criterion is met
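The merge loop can be sketched as bottom-up clustering. Assumptions in this example: toy 2-D vectors stand in for the per-segment i-vectors, similarity is cosine between cluster centroids, and `stop_similarity` is a hypothetical stopping threshold.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def agglomerative_cluster(segment_vectors, stop_similarity=0.9):
    """Merge the most similar pair of clusters until no pair is similar enough."""
    clusters = [[i] for i in range(len(segment_vectors))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = cosine(centroid([segment_vectors[i] for i in clusters[a]]),
                             centroid([segment_vectors[i] for i in clusters[b]]))
                if best is None or sim > best[0]:
                    best = (sim, a, b)
        if best[0] < stop_similarity:  # stopping criterion
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge closest clusters
        del clusters[b]
    return clusters


# Segments 0 and 2 point one way (speaker A), 1 and 3 the other (speaker B).
vectors = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.2], [0.2, 0.9]]
clusters = agglomerative_cluster(vectors)
print(clusters)
```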
-
SpeakerDiarization
!
-
100
68 facial landmarks (OpenFace toolkit)
-
101
Face detection
68 facial landmark detection
Pre-trained Constrained local neural field method
-
102
. . .
(learn the hard way!)
-
103
TAILORED SOLUTION
1 2
3
-
104
Pre-processing
Data collection-dependent
Smart utilization of current progress in audio-video processing
label?
Label consistency
Reliable labeling
Construct validity
-
105
Label
-
dynamic range
-
4 dimensions: 95% variance
( 20% )
( 20% )
( 20% )
( 10% )
( 10% )
( 10% )
( 10% )
(100%)
107
label -
PCA
First principal axis weights
-
inter-evaluator agreement level
concept!
rank-normalized
Depends on the scenarios (sometimes reviewers too!)
Cronbach's alpha, intra-class correlation, Fleiss' kappa, Cohen's kappa
++
0.55  0.39  0.43  0.58
0.63
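As one concrete instance of the agreement measures listed on this slide, Cronbach's alpha can be computed by treating each evaluator as an "item". The ratings below are made up for illustration and are not the values reported in the talk.

```python
from statistics import variance


def cronbach_alpha(ratings_by_rater):
    """ratings_by_rater: one list of scores per evaluator, all over the same samples."""
    k = len(ratings_by_rater)
    sum_item_var = sum(variance(r) for r in ratings_by_rater)
    totals = [sum(col) for col in zip(*ratings_by_rater)]  # per-sample summed score
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


# Three hypothetical evaluators rating five samples on a 1-5 scale.
raters = [
    [1, 2, 3, 4, 5],
    [2, 1, 4, 3, 5],
    [1, 3, 2, 5, 4],
]
alpha = cronbach_alpha(raters)
print(round(alpha, 3))
```

Which measure is appropriate depends on the scenario: whether labels are categorical or ordinal, and whether every evaluator rated every sample.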
-
109
Label
-
110
Self report
:
:
?
-
111
frameworksample?
Rule:
Data samples
IEEE
-
112
Label
-
113
? Label Social Reciprocity ADOS
Description of pictureCreating a story
Emotion Joint interactive play
label
-
114
Pre-processing
Data collection-dependent
Smart utilization of current progress in audio-video processing
label?
Label consistency
Reliable labeling
Construct validity
1label 2domain experts
3
-
115
Enabling Technologies
Domain Experts Knowledge
Low level descriptors
Acoustic features
Motion features
Text features
Image features
Speech recognition
Face recognition
Action recognition
Voice activity
Diarization
Text processing
Sentiment Analysis
Affect recognition
Speaker states and traits
Visual-speech processing
Interaction modeling
Subjective assessment
Internal state & construct
Neuro-developmental disorder
Evidence-based observational coding
Intervention efficacy
Coder variability control
Development of coding manual
Self report measure validity
Coding mechanism
Social behavior
Affective behavior
Communicative behavior
Dyadic behavior
label
Label
-
116
1. data
2. label/data
3. behavior analytics
-
Part 3:
117
-
BSP INGREDIENTS
118
()
: +
I. II.
III. IV.
-
119
Enabling Technologies
Domain Experts Knowledge
Low level descriptors
Acoustic features
Motion features
Text features
Image features
Speech recognition
Face recognition
Action recognition
Dialog act tagging
Keyword spotting
Text processing
Sentiment Analysis
Affect recognition
Speaker states and traits
Visual-speech processing
Interaction modeling
Subjective assessment
Internal state & construct
Neuro-developmental disorder
Evidence-based observational coding
Intervention efficacy
Coder variability control
Development of coding manual
Self report measure validity
Coding mechanism
Social behavior
Affective behavior
Communicative behavior
Dyadic behavior
-
120
human computing (signal) research
Data & algorithm go hand-in-hand
Algorithms
-
121
?
-
122
/Profile
/Profile
/Profile
Behavioral Analytics
-
123
(low-level descriptors)
-
124
/Profile
(frame)Overlapping step
Source Filter
-
125
LLDs
Pitch (source):
Intensity (pressure):
MFCC (filter):
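Short-time energy: $E_n = \sum_{m=n-N+1}^{n} x^2(m)$
[Second equation: a sum starting at index 0 with lag k; its terms are not recoverable]
MFCC (13)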
-
126
Versatile and fast audio feature extractor
Open-source and cross-platform
Abundant speech-related features: signal energy, loudness, Mel-spectra, MFCC, PLP-CC, pitch
Audio I/O
Supports many I/O formats: WEKA, HTK, LibSVM
Praat, openSMILE
. . .
-
127
/Profile
Histogram of oriented gradients (HOG)
Scale-invariant feature transform (SIFT)
Local binary pattern (LBP)
3D SIFT
HOG3D
texture, shape, keypoint, edge
() frame
Histogram of oriented gradients (HOG), Local binary pattern (LBP)
-
128
C++: OpenCV
Python: cv2 (OpenCV), scikit-image
-
129
trajectory
Per-frame ?
Improved Dense Trajectory
Optical flow
Trajectory + HOG + HOF + MBH
-
130
data
-
131
(encoding/profile)
10ms
66ms
Analysis unit session
Label (time granularity)analysis unit
analysis unit
-
132
Analysis unit
Analysis unit
-
133
Functionals
LLDs
- featureanalysis unit
speaker state, emotion recognitionbaseline!!
# #=
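Functionals turn a variable number of LLD frames inside one analysis unit into a fixed-length vector. A minimal sketch follows; the particular statistics (mean, standard deviation, min, max, range) and the LLD names are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def functionals(frames):
    """Summary statistics over the frames of one LLD within an analysis unit."""
    return [mean(frames), pstdev(frames), min(frames), max(frames),
            max(frames) - min(frames)]


def unit_feature_vector(llds):
    """Concatenate the functionals of every LLD (in sorted name order)."""
    vector = []
    for name in sorted(llds):
        vector.extend(functionals(llds[name]))
    return vector


# One analysis unit with two LLD tracks of different lengths (toy values).
unit = {"pitch": [100.0, 110.0, 120.0], "intensity": [60.0, 62.0]}
features = unit_feature_vector(unit)
print(len(features), features)
```

The resulting dimension is the number of LLDs times the number of functionals (here 2 x 5 = 10), regardless of how many frames the unit contains.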
-
134
k-means clustering
Histograms
Dictionary
Bag-of-feature encoding
LLDs
k-means
clustering
audio, video features
=
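Bag-of-feature encoding can be sketched in two steps: learn a dictionary with k-means over training LLDs, then describe each analysis unit by a histogram of nearest dictionary entries. This toy version assumes scalar LLD values and a simple deterministic initialisation; real features are multi-dimensional.

```python
def nearest(value, codebook):
    """Index of the codeword closest to `value`."""
    return min(range(len(codebook)), key=lambda k: abs(value - codebook[k]))


def kmeans_1d(values, k, iterations=10):
    """Plain k-means on scalars; initial centres are spread over the sorted data."""
    ordered = sorted(values)
    codebook = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[nearest(v, codebook)].append(v)
        codebook = [sum(g) / len(g) if g else c for g, c in zip(groups, codebook)]
    return codebook


def bag_of_features(values, codebook):
    """Normalised histogram of codeword assignments for one analysis unit."""
    hist = [0] * len(codebook)
    for v in values:
        hist[nearest(v, codebook)] += 1
    return [h / len(values) for h in hist]


training_llds = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
codebook = kmeans_1d(training_llds, k=3)
histogram = bag_of_features([1.1, 0.9, 5.1, 9.1], codebook)
print(codebook, histogram)
```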
-
135
Analysis unit
Analysis unit
-
136
/Profile
(:analysis unit)
Distributed word representation
-
137
Term Weighting Method
a simplifying representation by term count
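Term Frequency (TF): how important (or informative) a word is in a document.
$\mathrm{tf}_{t,d} = \frac{n_{t,d}}{\sum_{k} n_{k,d}}$
Inverse Document Frequency (IDF): how important (or informative) a word is in the corpus.
$\mathrm{idf}_{t} = \log \frac{|D|}{1 + |\{d : t \in d\}|}$
Term Frequency-Inverse Document Frequency (TF-IDF)
$\mathrm{tfidf}_{t,d} = \mathrm{tf}_{t,d} \times \mathrm{idf}_{t}$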
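A small worked example of TF-IDF weighting, assuming term frequency is the count normalised by document length and the inverse document frequency uses the smoothed form log(N / (1 + df)). The documents are made-up token lists.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / (1 + doc_freq[term]))
            for term, count in counts.items()
        })
    return weights


docs = [
    ["pain", "chest", "pain"],
    ["chest", "fever"],
    ["cough", "fever"],
    ["headache"],
]
weights = tf_idf(docs)
print(weights[0])
```

In the first document, "pain" is both frequent locally and rare in the corpus, so it receives a higher weight than "chest".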
-
138
. . .
N-gram: turn unigram terms into bigram terms at the word-token step, for instance:
John also likes to watch football games
[ 'John also' , 'also likes' , 'likes to' , 'to watch' , 'watch football' , 'football games' ]
[ 1 , 1 , 1 , 1 , 1 , 1 ]
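The bigram example on this slide can be reproduced directly; the helper name `ngrams` is an assumption for illustration.

```python
from collections import Counter


def ngrams(tokens, n=2):
    """Join every run of n consecutive tokens into one term."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "John also likes to watch football games".split()
bigrams = ngrams(tokens)
counts = Counter(bigrams)
print(bigrams)
print(list(counts.values()))
```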
-
139
Distributed word representation
CBOW: predicting the word given its context
Skip-gram: predicting the context given a word
The distributed representation encoded in the hidden layer of the neural network is used as the representation of words
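The two objectives differ only in what is the input and what is the target. The sketch below builds the training examples each one would see for a toy sentence; it does not train a network, and the window size of 1 is an assumption.

```python
def context_of(tokens, i, window):
    """Tokens within `window` positions of index i, excluding position i itself."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return [tokens[j] for j in range(lo, hi) if j != i]


def cbow_examples(tokens, window=1):
    """(context words, target word): predict the word given its context."""
    return [(context_of(tokens, i, window), tokens[i]) for i in range(len(tokens))]


def skipgram_examples(tokens, window=1):
    """(input word, context word): predict the context given a word."""
    return [(tokens[i], c) for i in range(len(tokens))
            for c in context_of(tokens, i, window)]


sentence = ["likes", "to", "watch"]
print(cbow_examples(sentence))
print(skipgram_examples(sentence))
```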
-
140
-
141
(low-level descriptors)
-
142
(multimodal)work
-
143
/Profile
/Profile
/Profile
Behavioral Analytics
Behavioral Analytics
Behavioral Analytics
? ?
-
144
/Profile
/Profile
/Profile
Behavioral Analytics
Note*
(D/R)NN, (B)LSTM
BSP Work , just be aware of f(# of data), and sometimes
-
145
. . .
BSP
!!
-
146
. . .
-
147
-
148
[Figure: multimodal feature pipeline]
Video: dense points tracking over frames (TRAJ, MBHxy); each unit-level (66 ms) segment yields derived video features; Dense Trajectory Fisher-
Audio: acoustic LLDs; each unit-level (200 ms) segment yields dense unit acoustic features
Functionals
K-Means Bag-of-word
-
149
-
[Figure: a Chinese sentence segmented into words with part-of-speech tags (c, n, v, r, p, vn, d, ng, uj, m, zg, l, b)]
Jieba
Built to be the best Python Chinese word segmentation module
-
151
Word2Vec
Yahoo news, wiki, PTT
-
152
...
N-gram K-meansAll Documents
BOWper Document
Word2vec
N
functional, context, bow
-
153
/Profile
/Profile
/Profile
Behavioral Analytics
-
154
analytics?
= .
Inter-evaluator agreement 0.63
. . . (part 4)
Spearman correlation
0.3 - 0.4
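Spearman correlation is the Pearson correlation of the ranks, so it rewards predictions that order the samples correctly even when the scale is off. A minimal sketch with average ranks for ties; the example values are made up.

```python
def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


labels = [1, 2, 3, 4]
predictions = [10, 20, 30, 45]  # different scale, same ordering
print(spearman(labels, predictions))
```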
-
155
-
Raw audio-videorecording
S1
S2
Sk
. . . MFCCPitch
Intensity
1 : [1,1]
2 : [1, 2]
: [1,]
156
:
:
S1
-
157
Action-unit inspired facial low-level descriptors computation
Facial landmark Head pose estimation
X
Z
Y
Head orientation movement
-
158
/Profile
/Profile
Behavioral Analytics
-
159
NRS- : :
-
160
NRS- : :
-
161
? self-report NRS111!
74%
52%
. . . (part 4)
audio video>
-
162
-
163
:
(
Quantitatively, Automatically
ADOS description
-
164
ADOS Emotion Part
Multimodal Turn-taking Behavior
Coordination Time Series
Automatically generate a time series of multimodal behavior coordination measures across a session . . .
-
165
Audio
Pitch
Intensity
MFCC
Delta Delta-Delta
Video
Head poses
Eye gaze
Delta Delta-Delta
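Delta and delta-delta features approximate the first and second time derivatives of an LLD track. The sketch below uses the common regression formula with edge padding; the window width of 1 and the toy track are assumptions.

```python
def delta(track, width=1):
    """Regression-based delta of a per-frame feature track (edges are padded)."""
    padded = [track[0]] * width + list(track) + [track[-1]] * width
    norm = 2 * sum(n * n for n in range(1, width + 1))
    return [
        sum(n * (padded[t + width + n] - padded[t + width - n])
            for n in range(1, width + 1)) / norm
        for t in range(len(track))
    ]


pitch = [1.0, 2.0, 4.0, 7.0]
d = delta(pitch)   # first-order dynamics
dd = delta(d)      # delta-delta: the delta of the delta track
print(d, dd)
```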
-
166
/Profile
/Profile
Canonical correlation analysis
-
167
ADOS Emotion Part
Multimodal Turn-taking Behavior
Coordination Time Series
Automatically generate a time series of multimodal behavior coordination measures across a session . . .
-
168
(symbol)
turn-taking:(1.5second)Sliding
-
169
1.5s
X:
Y:
3 3 3 2 1 1 2 1 3
2 1 2 1 3 1 1 2 3
Shift
Session-level descriptors
Behavioral Analytics
n turn, n
Logistic regression
(dependency)
-
170
Binary Classification between typical vs. atypical
-
ADOS: Social reciprocity score (B9)
-
ADOS: social reciprocity score (B9)
-
173
= .
= .
= .
-
data science work
analytics
? (part 4)
174
-
175
/Profile
/Profile
/Profile
Behavioral Analytics
Behavioral Analytics
Behavioral Analytics
-
176
concept
General end-to-end system needs more R&D
Context-dependent (what ever works)
good rule of thumb
mapconstruct
-
177
/Profile
/Profile
/Profile
Behavioral Analytics
Note*
f(# of data), and
-
BSP INGREDIENTS
178
()
: +
I. II.
III. IV.
-
Part 4:
:
179
-
BSP INGREDIENTS
180
()
: +
I. II.
III. IV.
-
181
= .
= .
= .
-
182
-
183
= .
?
??
-
184
:
2
1
X
2
1
10
= .
= .
= .
-
185
= .
Higher consistency
?
-
Extension
186
Good collaborative vibe . . .
!
-
187
?
-
188
multi-task learning
()task
Task 1 - feature
Task 2 - feature
Task 8 - feature
.
.
.
Kernel
Multi-task learning
-
189
?
!
Actionable insights that were not clear before. Hence, the project continues.
-
190
= .
-
191
(Pain levels: mild 0-3, moderate 4-6, severe 7-10)
: :
: :
74%
-
192
Content Validity
Validity
Construct Validity
Criterion Validity
-
193
acute pain, elderly
self-report
complement gold standard
()
NRS-11
A-V + FEATURE 43%
70%
Project continues
-
194
= .
?
-
195
POINT TO HIGHER ATYPICALITY
-
196
BSP
-
197
Psychologists unconsciously alter their communicative social behavior strategy (cueing behavior?), conditioned on the ASD kid's ability to carry out reciprocal communication during interaction
-
198
()
: 0.81
-
199
Insight beyond current capability, opportunity now emerges
We can now start imagining the application of this :
(1) (?)
(2) ?
More?
-
200
Descriptors Included    Child Prosody    Psych Prosody    Child and Psych Prosody
Spearman's rho          0.64***          0.79***          0.67***
Psychologist's acoustics are at least as predictive of child ASD severity ratings
ADOS!
[1] Daniel Bone, Chi-Chun Lee, Matthew P. Black, Marian E. Williams, Pat Levitt, Sungbok Lee, and Shrikanth Narayanan, "The Psychologist as an Interlocutor in Autism Spectrum Disorder Assessment: Insights from a Study of Spontaneous Prosody", Journal of Speech, Language, and Hearing Research, 2014, 57(4), 1162-1177.
Hard to obtain scientific insights without such behavioral analytics for domain experts
NEED MORE VERIFICATION
-
201
:
1. Data
-
Is it Technical? Example Pitfall 1
Controlling for Channel Factors
Interspeech 2013 Autism Challenge
Baseline Approach
Black-box (works well)
2-class baseline: 92.8% UAR (chance is 50% UAR)
Hypothesis: Model captures channel, not diagnosis
ASD/SLI from 2 clinics, TD from classrooms
Simple experiment showed channel differences
Matched baseline
Conclusion: Remit (or note) noise sources in data collection.
202
Daniel Bone, Theodora Chaspari, Kartik Audhkhasi, James Gibson, Andreas Tsiartas, Maarten Van Segbroeck, Ming Li, Sungbok Lee, and Shrikanth Narayanan, "Classifying Language-Related Developmental Disorders from Speech Cues: the Promise and the Potential Confounds", InterSpeech, 2013.
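UAR (unweighted average recall) is the mean of the per-class recalls, which is why chance level for two classes is 50% no matter how imbalanced the data are. A small sketch with made-up labels:

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall; every class counts equally regardless of its size."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced set: a classifier that always answers "TD".
y_true = ["TD"] * 8 + ["ASD"] * 2
y_pred = ["TD"] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
uar = unweighted_average_recall(y_true, y_pred)
print(accuracy, uar)
```

Accuracy looks good here (0.8) while UAR stays at chance (0.5), so a high UAR such as the baseline's cannot be explained by class imbalance alone; confounds like the recording channel have to be checked separately.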
-
203
:
2. cross validation
-
Is it Technical: Example Pitfall 2
Behavior Analysis & Modeling: Cross-validation
They do not perform speaker-separated cross-fold validation!
Can we detect United States Senators' party affiliations from speech features (with black-box approach)?
Performance increases as # samples/speaker increases
Conclusion: Always perform speaker-separated cross-validation!
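Speaker-separated cross-validation means that all samples from one speaker fall on the same side of every train/test split. A minimal sketch follows; the function name and the round-robin assignment of speakers to folds are assumptions.

```python
def speaker_separated_folds(speaker_ids, n_folds):
    """Yield (train_indices, test_indices) so that no speaker appears in both."""
    speakers = sorted(set(speaker_ids))
    folds = []
    for f in range(n_folds):
        held_out = set(speakers[f::n_folds])  # speakers reserved for testing
        test = [i for i, s in enumerate(speaker_ids) if s in held_out]
        train = [i for i, s in enumerate(speaker_ids) if s not in held_out]
        folds.append((train, test))
    return folds


# Seven samples from four speakers (toy data).
speaker_ids = ["s1", "s1", "s2", "s2", "s3", "s4", "s4"]
folds = speaker_separated_folds(speaker_ids, n_folds=2)
for train, test in folds:
    print(train, test)
```

With a plain random split, a black-box model can score well simply by recognising the speaker, which would explain performance rising with the number of samples per speaker.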
204
-
205
-
206
Affective Computing
Social Signal Processing
Paralinguistic Recognition
Physiological/Pathological Disorder Recognition/Prediction
BSP,
-
207
In-car
In-home
In-classroom
on and on
-
208
application domain
-
209
Motivational Interviewing: Addiction Therapy
-
210
By professor Shrikanth Narayanan
System in clinical trial
-
211
?
-
212
Behavioral Signal Processing (BSP)
Compute Human Behavior Traits and States for Domain Experts Decision Making
Help experts to do things they know in a more efficient manner at scale
Develop novel behavioral analytics framework for possible scientific discovery
from qualitative to quantitative . . .
through verbal and non-verbal behavioral cues . . .
Transformative effort . . .
-
213
OF
FOR
BY
COMPUTING
HUMANS
Human action and behavior data
Meaningful analysis, timely decision making & intervention (action)
Collaborative integration of human expertise with automated processing
By professor Shrikanth Narayanan
-
214
Enabling Technologies
Domain Experts Knowledge
Low level descriptors
Acoustic features
Motion features
Text features
Image features
Speech recognition
Face recognition
Action recognition
Dialog act tagging
Keyword spotting
Text processing
Sentiment Analysis
Affect recognition
Speaker states and traits
Visual-speech processing
Interaction modeling
Subjective assessment
Internal state & construct
Neuro-developmental disorder
Evidence-based observational coding
Intervention efficacy
Coder variability control
Development of coding manual
Self report measure validity
Coding mechanism
Social behavior
Affective behavior
Communicative behavior
Dyadic behavior
Relatively new: RICH R&D OPPORTUNITIES (CHALLENGES)
-
215
BSP INGREDIENTS
-
216
-
217
()
()
(): Pattern ()
Contextualize
-
218
I was challenged and inspired
-
219
-
220
-
221
:
Challenging the status quo / Pushing the scientific boundary
Making a positive impact
-
222
BIIC lab @ NTHU EE
http://biic.ee.nthu.edu.tw
THANK YOU . . .
many COLLABORATORS + the entire BIIC lab