[PR12] Capsule Networks - Jaejun Yoo
Capsule Networks
Understanding together with PR12
Jaejun Yoo, Ph.D. Candidate @ KAIST
PR12
17th Dec, 2017
Today’s contents
Dynamic Routing Between Capsules
by Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton
Oct. 2017: https://arxiv.org/abs/1710.09829
NIPS 2017 Paper
Convolutional Neural Networks
What is the problem with CNNs?
Contents from https://hackernoon.com/what-is-a-capsnet-or-capsule-network-2bfbe48769cc
1) CNNs perform poorly when an image is rotated, tilted, or otherwise differently oriented. 2) Each CNN layer understands the image at only a slightly more abstract level (slow increase of the receptive field).
DATA AUGMENTATION, MAX POOLING
“Pooling helps in creating positional invariance. However, this invariance also triggers false positives for images which have the components of a ship but not in the correct order.”
This was never the intention of the pooling layer!
Convolutional Neural Networks
What we need: EQUIVARIANCE (not invariance)
“Equivariance makes a CNN understand the rotation or proportion change and adapt itself accordingly so that the spatial positioning inside an image is not lost.”
Capsules
“A capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or an object part.”
8D capsule, e.g.: hue, position, size, orientation, deformation, texture, etc.
Contents from https://www.slideshare.net/aureliengeron/introduction-to-capsule-networks-capsnets
An 8D vector per capsule → Inverse Rendering
Equivariance of Capsules
Contents from https://medium.com/ai%C2%B3-theory-practice-business/understanding-hintons-capsule-networks-part-iii-dynamic-routing-between-capsules-349f6d30418
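A capsule's activity vector encodes pose while its length encodes the probability that the entity is present; the squash nonlinearity used later in the deck (vj = squash(sj)) is what keeps that length in [0, 1) without changing the orientation. A minimal numpy sketch (the 8D example vectors and the eps constant are illustrative, not from the paper):

```python
import numpy as np

def squash(s, eps=1e-8):
    """Squash a capsule's raw vector s so its length lies in [0, 1)
    while its orientation is preserved:
        v = (||s||^2 / (1 + ||s||^2)) * (s / ||s||)
    """
    norm_sq = np.sum(s ** 2)
    norm = np.sqrt(norm_sq) + eps
    return (norm_sq / (1.0 + norm_sq)) * (s / norm)

# A long vector keeps its direction but its length approaches 1
# ("the entity is present"); a short vector is shrunk toward zero.
long_v = squash(np.array([3.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]))   # ||s|| = 5
short_v = squash(np.array([0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]))  # ||s|| = 0.1
```

Rotating the input rotates the pose dimensions of the vector while the length (presence probability) stays the same, which is the equivariance property the slides emphasize.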
Routing by Agreement
Aurélien Géron, 2017

Predict Next Layer’s Output
Each primary capsule i predicts the output of each next-layer capsule j, using one transformation matrix Wi,j per part/whole pair (i, j):
ûj|i = Wi,j ui

Routing Weights
Initialize bi,j = 0 for all i, j, then ci = softmax(bi), so every prediction starts with equal routing weight (0.5 each for two next-layer capsules).

Compute Next Layer’s Output
sj = weighted sum of the predicted outputs ûj|i
vj = squash(sj) → the actual outputs of the next-layer capsules (round #1)

Update Routing Weights
Agreement: bi,j += ûj|i · vj
If a prediction agrees with the actual output, the dot product is large and the routing weight grows; if it disagrees, the dot product is small and the weight shrinks (e.g., 0.5/0.5 → 0.2/0.8 and 0.1/0.9).

Compute Next Layer’s Output (round #2)
With the updated weights, recompute sj (weighted sum) and vj = squash(sj).

Strong agreement! The rectangle and triangle capsules should be routed to the boat capsule.
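The routing rounds above can be sketched end-to-end in numpy. This is an illustrative toy, not the paper's implementation: the two "lower" capsules and their hand-made 8D predictions are hypothetical, chosen so that they agree on upper capsule 0 (like the rectangle and triangle agreeing on the boat) and cancel out on upper capsule 1.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # v = (||s||^2 / (1 + ||s||^2)) * s / ||s||
    norm_sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / (np.sqrt(norm_sq) + eps)

def routing_by_agreement(u_hat, n_iters=3):
    """u_hat[i, j] = prediction of lower capsule i for upper capsule j
    (shape: [n_lower, n_upper, dim]). Returns the upper-capsule outputs v
    and the final coupling coefficients c."""
    n_lower, n_upper, _ = u_hat.shape
    b = np.zeros((n_lower, n_upper))                          # b_ij = 0 for all i, j
    for _ in range(n_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # c_i = softmax(b_i)
        s = (c[:, :, None] * u_hat).sum(axis=0)               # s_j = weighted sum
        v = squash(s)                                         # v_j = squash(s_j)
        b = b + np.einsum('ijd,jd->ij', u_hat, v)             # b_ij += u_hat_j|i . v_j
    return v, c

# Hypothetical predictions: both lower capsules predict the same pose for
# upper capsule 0, but contradictory poses for upper capsule 1.
u_hat = np.zeros((2, 2, 8))
u_hat[0, 0, 0] = 2.0; u_hat[1, 0, 0] = 2.0     # agreement on capsule 0
u_hat[0, 1, 1] = 2.0; u_hat[1, 1, 1] = -2.0    # disagreement on capsule 1
v, c = routing_by_agreement(u_hat)
```

After a few rounds the coupling coefficients concentrate on capsule 0 and its output vector grows long, while capsule 1's predictions cancel and its output stays near zero, exactly the "strong agreement" behaviour the slides walk through.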
Handling Crowded Scenes
Is this an upside-down house?
Thanks to routing by agreement, the ambiguity is quickly resolved (explaining away): House + Boat.
Classification CapsNet
The ℓ2 norm ‖vk‖ of each class capsule’s output vector gives the estimated class probability.
Training
To allow multiple classes, minimize the margin loss:
Lk = Tk max(0, m+ − ‖vk‖)² + λ (1 − Tk) max(0, ‖vk‖ − m−)²
Tk = 1 iff class k is present
In the paper: m+ = 0.9, m− = 0.1, λ = 0.5
Translated to English: “If an object of class k is present, then ‖vk‖ should be no less than 0.9. If not, then ‖vk‖ should be no more than 0.1.”
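The margin loss above can be checked numerically. A minimal numpy sketch; the capsule lengths 0.95, 0.3 and 0.05 are made-up values for illustration:

```python
import numpy as np

def margin_loss(v_norms, targets, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Margin loss from the paper:
    L_k = T_k max(0, m+ - ||v_k||)^2 + lam (1 - T_k) max(0, ||v_k|| - m-)^2
    v_norms: ||v_k|| per class; targets: T_k in {0, 1}."""
    present = targets * np.maximum(0.0, m_plus - v_norms) ** 2
    absent = (1 - targets) * np.maximum(0.0, v_norms - m_minus) ** 2
    return np.sum(present + lam * absent)

# Class 0 present with a confident capsule (0.95 > 0.9), class 1 absent
# and quiet (0.05 < 0.1): both margins satisfied, so the loss is zero.
loss_good = margin_loss(np.array([0.95, 0.05]), np.array([1, 0]))
# Class 0 present but its capsule is weak (0.3): penalized by (0.9 - 0.3)^2.
loss_bad = margin_loss(np.array([0.3, 0.05]), np.array([1, 0]))
```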
Regularization by Reconstruction
The class capsules’ output vectors are fed to a feedforward neural network decoder that reconstructs the input image.
Loss = margin loss + α · reconstruction loss
The reconstruction loss is the squared difference between the reconstructed image and the input image. In the paper, α = 0.0005.
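Combining the two terms as described gives the total training loss. A small sketch; the 28×28 all-ones input and the 0.9-valued reconstruction are hypothetical stand-ins for an MNIST image and a decoder output:

```python
import numpy as np

def reconstruction_loss(x, x_recon):
    # Sum of squared differences between the input and its reconstruction.
    return np.sum((x - x_recon) ** 2)

def total_loss(margin, x, x_recon, alpha=0.0005):
    # Loss = margin loss + alpha * reconstruction loss (alpha = 0.0005 in the paper).
    return margin + alpha * reconstruction_loss(x, x_recon)

x = np.ones((28, 28))              # hypothetical MNIST-sized input
x_recon = np.ones((28, 28)) * 0.9  # hypothetical decoder output
loss = total_loss(margin=0.36, x=x, x_recon=x_recon)
```

The tiny α keeps the reconstruction term from dominating the margin loss during training, so it acts purely as a regularizer.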
A CapsNet for MNIST
(Figure 1 from the paper)
A CapsNet for MNIST – Decoder
(Figure 2 from the paper)
Interpretable Activation Vectors
(Figure 4 from the paper)
Pros
● Reaches high accuracy on MNIST; promising on CIFAR10
● Requires less training data
● Position and pose information are preserved (equivariance)
● Promising for image segmentation and object detection
● Routing by agreement is great for overlapping objects (explaining away)
● Capsule activations nicely map the hierarchy of parts
● Offers robustness to affine transformations
● Activation vectors are easier to interpret (rotation, thickness, skew…)
● It’s Hinton! ;-)
Cons
● Not state of the art on CIFAR10 (but it’s a good start)
● Not tested yet on larger images (e.g., ImageNet): will it work well?
● Slow training, due to the inner loop in the routing-by-agreement algorithm
● A CapsNet cannot see two very close identical objects
○ This is called “crowding”, and it has been observed in human vision as well
Results
What the individual dimensions of a capsule represent
Results
MultiMNIST: Segmenting Highly Overlapping Digits
Questions Remaining
Do capsules really work the way real neurons do?
Perceptual illusions
Thompson, P. (1980). Margaret Thatcher: a new illusion. Perception, 9(4), 483–484.
References
• https://arxiv.org/abs/1710.09829 (paper)
• https://jhui.github.io/2017/11/03/Dynamic-Routing-Between-Capsules/
• https://hackernoon.com/what-is-a-capsnet-or-capsule-network-2bfbe48769cc
• https://medium.com/ai%C2%B3-theory-practice-business/understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b
• https://www.youtube.com/watch?v=pPN8d0E3900 (video)
• https://www.slideshare.net/aureliengeron/introduction-to-capsule-networks-capsnets (slides)