When the Distribution Is the Answer: VizWiz Challenge
Sandro Pezzelle
Contacts: [email protected]
Skype: sandro.pezzelle
Mobile: +39 349 0537325
Web: sandropezzelle.github.io | ResearchGate | LinkedIn | Scholar | arXiv
Work address: CIMeC, University of Trento, Corso Bettini, 31, 38068 Rovereto (TN), Italy
Skills
Languages: Italian, English, French
Programming: Unix, Python, Keras, TensorFlow, Matlab, Psychtoolbox, Lua/Torch
Statistics & Others: R/RStudio, lme4, ggplot2, LaTeX, LibreOffice, Inkscape, HTML
Soft Skills: Communication, Writing, Organization, Learning, Networking
Sandro Pezzelle, PhD Student
About me: PhD student in Cognitive and Brain Sciences, track Language, Interaction and Computation. My current research, at the intersection between Computational Linguistics, Computer Vision and Cognition, is focused on the learning of quantity expressions (numbers, proportions, quantifiers). I'd define myself as an enthusiastic, communicative, multi-faceted person, proactive and inclined to lifelong learning. "Let's try!" is my personal motto. My code is full of print().
Education
2015 - present, PhD in Cognitive and Brain Sciences, CIMeC, University of Trento, Italy. Supervisor: Raffaella Bernardi. Computational Linguistics, Computer Vision, Cognitive Sciences, Machine Learning, AI
2012 - 2015, MSc in Linguistics, 110/110 cum laude, University of Padova, Italy. Supervisors: Laura Vanelli, Marco Marelli. Distributional Semantics, Psycholinguistics, Morphology
Jan 2014 - Jul 2014, Erasmus Program, Université Catholique de Louvain, Belgium. Applied Linguistics, Computational Linguistics, Statistics
2009 - 2012, BSc in Modern Literature, 110/110 cum laude, University of Padova, Italy. Supervisor: Luca Zuliani. Stylistics and Metrics, Formal Linguistics, Philology
Relevant Experience
Oct 2017, Research Intern, ILLC, University of Amsterdam. Supervisor: Jakub Szymanik. Distributional Semantics, Formal Linguistics, Language Modelling
Nov 2016 - Jun 2017, Language Specialist, Appen. Part-time, project-oriented remote position. Computational Linguistics, Formal Linguistics
Training
2017 Mini-Symposium on Deep Generative Models, Amsterdam
2017 iV&L Training School on Cognitive Robotics, Athens
2016 28th ESSLLI, Bolzano
2016 iV&L Training School on Deep Learning, Malta
2015 - 2016 Machine Learning by Stanford University, Coursera
Recent Presentations
Oct 24, 2017 Learning to Quantify from Language and Vision: Insights from Behavioral and Computational Studies. Talk at Comp. Ling. Series, Amsterdam.
Sep 28, 2017 Quantifiers and Proportions in Language and Vision: Insights from Behavioral and Computational Studies. Talk at CoSaQ Workshop, Amsterdam.
Sep 26, 2017 Be Precise or Fuzzy: Learning the Meaning of Cardinals and Quantifiers from Vision. Poster at Google NLP Summit, Zurich.
Denis Dushi, Sandro Pezzelle, Tassilo Klein, Moin Nabi
INTERNAL © 2018 SAP SE or an SAP affiliate company. All rights reserved.
VQA Task
Q: “What is this?”
Input: image + question
Annotations (10 annotators): A1 bottle, A2 bottle, A3 tv, A4 office, A5 bottle, A6 tv, A7 bottle, A8 room, A9 office, A10 bottle

answer   count
bottle   5
tv       2
office   2
room     1

Ground truth (most frequent answer): “bottle”
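A minimal Python sketch of how the ground truth on this slide follows from the ten annotations, assuming each sample's answers are available as a plain list of strings (the helper names are hypothetical, not part of the VizWiz tooling): count the answers and keep the most frequent one.

```python
from collections import Counter

def answer_counts(annotations):
    """Count how many annotators gave each answer."""
    return Counter(annotations)

def majority_answer(annotations):
    """Most frequent answer; on a tie, the answer seen first wins."""
    return Counter(annotations).most_common(1)[0][0]

# The ten annotations A1..A10 from the slide
annotations = ["bottle", "bottle", "tv", "office", "bottle",
               "tv", "bottle", "room", "office", "bottle"]

print(answer_counts(annotations).most_common())
# [('bottle', 5), ('tv', 2), ('office', 2), ('room', 1)]
print(majority_answer(annotations))  # bottle
```

Everything except "bottle" is discarded by this step, which is exactly the information the later slides try to keep.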
3INTERNAL© 2018 SAP SE or an SAP affiliate company. All rights reserved. ǀ
VQA Evaluation metric
answer count
bottle 5
tv 2
office 2
room 1
Ground Truth
“bottle”
$$\text{accuracy} = \min\left( \frac{\#\ \text{annotators providing that answer}}{3},\ 1 \right) \qquad (1)$$
Annotations → Evaluation → Accuracy
prediction   accuracy
bottle       100%
tv           ~67%
office       ~67%
room         ~33%
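Eq. (1) can be checked against the table above with a few lines of Python. This sketch implements the formula exactly as printed on the slide; to our understanding the official VQA evaluation script additionally normalizes answer strings and averages the score over subsets of 9 out of 10 annotators, which is omitted here. `vqa_accuracy` is a hypothetical helper name.

```python
def vqa_accuracy(prediction, annotations):
    """Eq. (1) as written on the slide: min(#annotators giving `prediction` / 3, 1)."""
    matches = sum(1 for answer in annotations if answer == prediction)
    return min(matches / 3, 1.0)

annotations = ["bottle", "bottle", "tv", "office", "bottle",
               "tv", "bottle", "room", "office", "bottle"]

for prediction in ["bottle", "tv", "office", "room"]:
    print(prediction, f"{vqa_accuracy(prediction, annotations):.0%}")
# prints: bottle 100%, tv 67%, office 67%, room 33% (one per line)
```

Any answer given by at least three annotators already scores 100%, so the metric rewards matching any sufficiently popular answer, not only the majority one.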
Training Loss
[1] Antol et al. (2015). VQA: Visual Question Answering. Proceedings of the IEEE International Conference on Computer Vision, 2425–2433.
4INTERNAL© 2018 SAP SE or an SAP affiliate company. All rights reserved. ǀ
Subjectivity
[2] Jolly, Pezzelle et al. (2018). The Wisdom of MaSSeS: Majority, Subjectivity, and Semantic Similarity in the Evaluation of VQA
5INTERNAL© 2018 SAP SE or an SAP affiliate company. All rights reserved. ǀ
Coverage analysis
num answers/classes (top-N)   1       2       5       50      300     3000    40271
num samples (train)           9541    11570   12531   14963   17046   19425   20K
% samples (train)             47.70   57.85   62.65   74.81   85.23   97.12   100
Table 1: Number and percentage of training samples covered by using the top-N answers (row 1).
• Coverage of samples considering all the annotations
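The coverage numbers in Table 1 can be reproduced with a sketch like the one below. It assumes (our reading of "considering all the annotations") that a sample counts as covered when at least one of its annotations is among the top-N answers, ranked by frequency over all training annotations; the function names and the toy data are hypothetical. On the real training split, 9541 of 20,000 samples is about 47.7%, matching the first column.

```python
from collections import Counter

def top_n_answers(samples, n):
    """The n most frequent answers, counted over every annotation of every sample."""
    counts = Counter(answer for answers in samples for answer in answers)
    return {answer for answer, _ in counts.most_common(n)}

def coverage(samples, n):
    """(number, percentage) of samples with at least one annotation in the top-n answers."""
    top = top_n_answers(samples, n)
    covered = sum(1 for answers in samples if any(a in top for a in answers))
    return covered, 100.0 * covered / len(samples)

# Hypothetical toy training set: each sample is the list of its 10 annotations
samples = [
    ["unanswerable"] * 7 + ["blurry"] * 3,
    ["yes"] * 6 + ["no"] * 4,
    ["red"] * 5 + ["unanswerable"] * 5,
    ["soup"] * 5 + ["tomato soup"] * 5,
]
for n in (1, 2, 100):
    print(n, coverage(samples, n))
# 1 (2, 50.0)
# 2 (3, 75.0)
# 100 (4, 100.0)
```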
6INTERNAL© 2018 SAP SE or an SAP affiliate company. All rights reserved. ǀ
Most frequent answer: unanswerable
count (annotators answering "unanswerable")   covered samples   % covered samples
1                                              3059              32%
2                                              1878              20%
≥ 3                                            4604              48%
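The three rows of this table add up to 9541, the number of training samples covered by the top-1 answer in Table 1, which suggests (our reading) that the count is the number of annotators who answered "unanswerable" for a covered sample. The sketch below buckets samples that way and re-derives the percentages from the reported counts; `vote_buckets` is a hypothetical helper.

```python
from collections import Counter

def vote_buckets(samples, label="unanswerable"):
    """Bucket samples by how many annotators gave `label` (1, 2, >=3); 0 votes are skipped."""
    buckets = Counter()
    for answers in samples:
        votes = answers.count(label)
        if votes == 0:
            continue  # sample not covered by this answer
        buckets[">=3" if votes >= 3 else str(votes)] += 1
    return buckets

# Re-deriving the percentages from the counts reported on the slide
reported = {"1": 3059, "2": 1878, ">=3": 4604}
total = sum(reported.values())
print(total)  # 9541, the top-1 coverage in Table 1
for bucket, count in reported.items():
    print(bucket, count, f"{count / total:.0%}")
# 1 3059 32%
# 2 1878 20%
# >=3 4604 48%
```

Roughly half of the samples covered by "unanswerable" get that answer from only one or two annotators, which is the disagreement the next slide builds on.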
7INTERNAL© 2018 SAP SE or an SAP affiliate company. All rights reserved. ǀ
Uncertainty-aware training
• Methods that use only the most frequent answer ignore:
1. the contribution of the other answers;
2. the uncertainty of each answer.
• Uncertainty-aware training: uncertainty is modeled as agreement among human annotators
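One way to make both ignored factors explicit is to turn each sample's annotations into a distribution over answers, so that agreement among annotators becomes a per-answer weight. The sketch assumes the weight of an answer is simply the fraction of the ten annotators who gave it; `soft_targets` is a hypothetical helper, and the slides do not state the exact normalization used.

```python
from collections import Counter

def soft_targets(annotations):
    """Map each answer to the share of annotators who gave it (agreement as confidence)."""
    counts = Counter(annotations)
    total = len(annotations)
    return {answer: count / total for answer, count in counts.items()}

annotations = ["bottle", "bottle", "tv", "office", "bottle",
               "tv", "bottle", "room", "office", "bottle"]
print(soft_targets(annotations))
# {'bottle': 0.5, 'tv': 0.2, 'office': 0.2, 'room': 0.1}
```

A hard label would keep only "bottle"; the soft target keeps the other answers (point 1) and says how confident the annotators were in each (point 2).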
8INTERNAL© 2018 SAP SE or an SAP affiliate company. All rights reserved. ǀ
Soft cross-entropy loss
[Figure: the question «What's the weather like outside on this photo? Thank you» goes through the VQA Model. Answer counts over the 10 annotators, shown on both sides of the model: cloudy 7, overcast 2, unsuitable 0, yes 0, blue 0, dog 0, ...]
accuracy = min(# Annotators providing that answer
3
, 1) (1)
$$L(x, c, w) = \sum_{i=1}^{|c|} w_i \left( -\log \frac{e^{x_{c_i}}}{\sum_{j=1}^{|x|} e^{x_j}} \right) \qquad (2)$$
Table 1:
num answers/classes 1 2 5 50 300 3000 40271
soft-loss model acc. (val) 0.349 0.402 0.424 0.481 0.504 0.516 0.512
Table 2: Accuracy of soft-loss model using N classes in prediction.
1
[3] Ilievski et al. (2017). A simple loss function for improving the convergence and accuracy of visual question answering models.
[4] Kazemi et al. (2017). Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering.
• Soft cross-entropy loss [3]
• Standard VQA model [4]
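Eq. (2) is a weighted sum of cross-entropy terms: each answer c_i given by the annotators contributes −log softmax(x)[c_i], scaled by its weight w_i. The pure-Python sketch below assumes x holds the model's logits over the answer vocabulary, c the vocabulary indices of the annotators' answers, and w their shares of the ten votes (e.g. 0.7 for cloudy, 0.2 for overcast). The vocabulary and logits are made up for illustration; this is not the submission's training code.

```python
import math

def log_softmax(x):
    """Numerically stable log of softmax(x)."""
    m = max(x)
    log_z = m + math.log(sum(math.exp(v - m) for v in x))
    return [v - log_z for v in x]

def soft_cross_entropy(x, c, w):
    """Eq. (2): sum over i of w_i * -log(softmax(x)[c_i]).

    x: logits over the answer vocabulary
    c: indices of the answers given by the annotators
    w: one weight per entry of c (assumed here: share of annotators)
    """
    log_p = log_softmax(x)
    return sum(w_i * -log_p[c_i] for c_i, w_i in zip(c, w))

# Hypothetical vocabulary and logits for the weather question
vocab = ["cloudy", "overcast", "unsuitable", "yes"]
logits = [2.0, 1.0, 0.0, -1.0]
soft = soft_cross_entropy(logits, c=[0, 1], w=[0.7, 0.2])
hard = soft_cross_entropy(logits, c=[0], w=[1.0])  # majority answer only
print(round(soft, 3), round(hard, 3))
```

With a single answer and weight 1.0 the loss reduces to the usual hard cross-entropy on the majority answer; the soft version also penalizes the model for assigning low probability to "overcast".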
9INTERNAL© 2018 SAP SE or an SAP affiliate company. All rights reserved. ǀ
Results
Dataset    Augmented VizWiz (train)   50% VizWiz (train)   Balanced VizWiz (val)
Accuracy   0.501                      0.446                0.111
num answers/classes          1       2       5       50      300     3000    40271
soft-loss model acc. (val)   0.349   0.402   0.424   0.481   0.504   0.516   0.512
Table 2: Accuracy of the soft-loss model using N classes in prediction.
                                  Actual class (most frequent answer)
Predicted class                   other    unanswerable / unsuitable
other                             1199     118
unanswerable / unsuitable         1052     804
Table 3: Confusion matrix. "unanswerable" and "unsuitable" are the answers with the highest coverage of samples in VizWiz.
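Table 3 can be summarized with a few derived figures. The arithmetic below treats "unanswerable / unsuitable" as the positive class; these numbers are computed from the table here, not reported on the slide. The model predicts "unanswerable / unsuitable" for 1856 questions, while it is the most frequent answer for only 922.

```python
# Table 3 counts, keyed by (predicted, actual); "unans" = "unanswerable / unsuitable"
confusion = {
    ("other", "other"): 1199,
    ("other", "unans"): 118,
    ("unans", "other"): 1052,
    ("unans", "unans"): 804,
}

total = sum(confusion.values())
agreement = (confusion[("other", "other")] + confusion[("unans", "unans")]) / total
predicted_unans = confusion[("unans", "other")] + confusion[("unans", "unans")]
actual_unans = confusion[("other", "unans")] + confusion[("unans", "unans")]
precision = confusion[("unans", "unans")] / predicted_unans
recall = confusion[("unans", "unans")] / actual_unans

print(total)               # 3173
print(f"{agreement:.3f}")  # 0.631
print(f"{precision:.3f}")  # 0.433
print(f"{recall:.3f}")     # 0.872
```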
manipulation augmented train 50% train balanced val
accuracy 0.501 0.446 0.111
                 Text only   Vision only   Multimodal
unanswerable     0.784       0.796         0.803
other            0.138       0.299         0.340
yes/no           0.499       0.346         0.690
number           0.243       0.319         0.285
tot. accuracy    0.377       0.476         0.516
Table 4: Ablation study.
2
• Accuracy on validation split
• Accuracy on test-challenge split
method    acc
SoA [5]   0.475
Ours      0.512
[5] Gurari et al. (2018). VizWiz Grand Challenge: Answering Visual Questions from Blind People.
10INTERNAL© 2018 SAP SE or an SAP affiliate company. All rights reserved. ǀ
Preprocessing
• Accuracy on test-challenge
method          acc
SoA [5]         0.4750
Ours            0.5120
Ours + prepro   0.5163
1. Smartly stripping punctuation, e.g. “can’t” → “cant”
2. Filtering conversational words, e.g. “hello”, “please”, “thank you”, “goodbye”, ...
[5] Gurari et al. (2018). VizWiz Grand Challenge: Answering Visual Questions from Blind People.
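A sketch of the two preprocessing steps, assuming they are applied to the lowercased question text: apostrophes inside words are dropped so "can't" becomes "cant", other punctuation becomes whitespace except decimal points inside numbers, and a small list of conversational fillers is removed. The filler list is hypothetical (the slide's examples plus "thanks"), and the exact rules used for the submission are not given on the slide.

```python
import re

# Hypothetical filler list, built from the examples on the slide
FILLERS = ["thank you", "thanks", "hello", "please", "goodbye"]

def strip_punctuation(text):
    """Join contractions, keep decimal points, turn other punctuation into spaces."""
    text = re.sub(r"(?<=\w)['’](?=\w)", "", text)   # "can't" -> "cant"
    text = re.sub(r"[^\w\s.]", " ", text)            # punctuation except periods -> space
    text = re.sub(r"(?<!\d)\.|\.(?!\d)", " ", text)  # periods not inside a number -> space
    return " ".join(text.split())

def remove_fillers(text):
    """Remove conversational filler words and phrases (expects lowercase text)."""
    for phrase in FILLERS:
        text = re.sub(rf"\b{re.escape(phrase)}\b", " ", text)
    return " ".join(text.split())

def preprocess(question):
    return remove_fillers(strip_punctuation(question.lower()))

print(preprocess("What's the weather like outside on this photo? Thank you"))
# whats the weather like outside on this photo
```

Stripping apostrophes instead of replacing them with spaces keeps contractions as single tokens, which is presumably what "smartly" refers to.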
11INTERNAL© 2018 SAP SE or an SAP affiliate company. All rights reserved. ǀ
Answerability task
1. Change the output layer of the multi-class model
   Label: 0/1 (unanswerable/answerable)
2. Balance the dataset
   Imbalanced dataset (71.3% answerable)
   • Up-sampling
   • Down-sampling
• Results on test-dev
method      F1      AP
Ours        65.02   74.71
Ours + Up   68.84   74.73
• Results on test-challenge
method      F1      AP
SoA [5]     -       71.7
Ours + Up   67.71   73.11
[5] Gurari et al. (2018). VizWiz Grand Challenge: Answering Visual Questions from Blind People.
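Up-sampling of the unanswerable class can be sketched as random duplication of minority-class samples until both labels are equally frequent. The function name, the fixed seed, and the 1:1 target ratio are assumptions; the slide only states that up-sampling (with down-sampling as the alternative) was used on a dataset that is 71.3% answerable.

```python
import random

def upsample_minority(samples, labels, seed=0):
    """Duplicate random minority-class samples until both classes are equally frequent.

    labels: 1 = answerable, 0 = unanswerable. Returns a shuffled list of (sample, label).
    """
    rng = random.Random(seed)
    by_label = {0: [], 1: []}
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)
    minority = 0 if len(by_label[0]) < len(by_label[1]) else 1
    majority = 1 - minority
    if not by_label[minority]:
        raise ValueError("cannot up-sample an empty class")
    deficit = len(by_label[majority]) - len(by_label[minority])
    extra = [rng.choice(by_label[minority]) for _ in range(deficit)]
    balanced = [(s, y) for y in (0, 1) for s in by_label[y]]
    balanced += [(s, minority) for s in extra]
    rng.shuffle(balanced)
    return balanced

# Hypothetical split mirroring the slide's imbalance: 71.3% answerable
samples = list(range(1000))
labels = [1] * 713 + [0] * 287
balanced = upsample_minority(samples, labels)
print(len(balanced), sum(y for _, y in balanced))  # 1426 713
```

Down-sampling would instead drop answerable samples until the classes match, which balances the labels at the cost of discarding training data.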
12INTERNAL© 2018 SAP SE or an SAP affiliate company. All rights reserved. ǀ
Conclusion
1. Multi-class task
   • Soft cross-entropy
   • Smart preprocessing
2. Answerability task
   • Binary classifier with up-sampling of unanswerable samples
Sandro Pezzelle
Contacts:[email protected]
skype: sandro.pezzellemobile: +39 349 0537325
sandropezzelle.github.ioresearchgate
linkedInscholar
arXiv
Work address:CIMeC, University of Trento
Corso Bettini, 3138068 Rovereto (TN), Italy
Skills
Languages
� Italian� English� FrenchProgramming
� Unix� Python� Keras� Tensorflow
� Matlab� Psychtoolbox
� Lua/TorchStatistics & Others
� R/RStudio� lme4� ggplot2
� LaTeX� LibreOffice� Inkscape� HtmlSoft Skills
� Communication� Writing� Organization� Learning� Networking
Sandro PezzellePhD Student
About me PhD Student in Cognitive and Brain Sciences, track Language,Interaction and Computation. My current research - at the intersectionbetween Computational Linguistics, Computer Vision and Cognition - isfocused on the learning of quantity expressions (numbers, proportions,quantifiers). I’d define myself as an enthusiastic, communicative, multi-faceted person. Proactive and inclined to lifelong learning. “Let’s try!” asa personal motto. My code is full of print().
Education2015 - present, PhD in Cognitive and Brain SciencesCIMeC, University of Trento, Italy. Supervisor: Raffaella BernardiComputational Linguistics, Computer Vision, Cognitive Sciences, MachineLearning, AI
2012 - 2015, MSc in Linguistics, 110/110 cum laudeUniversity of Padova, Italy. Supervisors: Laura Vanelli, Marco MarelliDistributional Semantics, Psycholinguistics, Morphology
Jan 2014 - Jul 2014, Erasmus ProgramUniversite Catholique de Louvain, Belgium.Applied Linguistics, Computational Linguistics, Statistics
2009 - 2012, BSc in Modern Literature, 110/110 cum laudeUniversity of Padova, Italy. Supervisor: Luca ZulianiStylistic and Metrics, Formal Linguistics, Philology
Relevant ExperienceOct 2017, Research InternILLC, University of Amsterdam. Supervisor: Jakub SzymanikDistributional Semantics, Formal Linguistics, Language Modelling
Nov 2016 - Jun 2017, Language SpecialistAppen. Part-time, project-oriented remote positionComputational Linguistics, Formal Linguistics
Training2017 Mini-Symposium on Deep Generative Models, Amsterdam2017 iV&L Training School on Cognitive Robotics, Athens2016 26th ESSLLI, Bolzano2016 iV&L Training School on Deep Learning, Malta2015 - 2016 Machine Learning by Stanford University, Coursera
Recent PresentationsOct 24, 2017 Learning to Quantify from Language and Vision: Insights from Be-havioral and Computational Studies. Talk at Comp. Ling. Series, Amsterdam.Sep 28, 2017 Quantifiers and Proportions in Language and Vision: Insights fromBehavioral and Computational Studies. Talk at CoSaQ Workshop, Amsterdam.Sep 26, 2017 Be Precise or Fuzzy: Learning the Meaning of Cardinals and Quan-tifiers from Vision. Poster at Google NLP Summit, Zurich.
Denis Dushi Sandro Pezzelle Tassilo Klein Moin Nabi
Thank you. (Answerable) Questions?
Top Related