
Deep Convolutional Nets

11th March 2015

Jiaxin Shi

Tsinghua University


A Brief Introduction to CNN

• The replicated feature approach
• Use many different copies of the same feature detector with different positions.
  – Replication greatly reduces the number of free parameters to be learned (a parameter-count sketch follows below).
• Use several different feature types, each with its own map of replicated detectors.
  – Allows each patch of image to be represented in several ways.
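To make the parameter savings concrete, here is a minimal sketch in Python; the input size, hidden-layer width and filter count are hypothetical numbers chosen only for illustration, not values from the slides.

```python
# Rough parameter-count comparison: a fully connected hidden layer
# versus a bank of replicated (shared) 5x5 feature detectors.
H, W = 32, 32            # hypothetical single-channel input size
hidden = 1024            # units in a fully connected hidden layer
K, num_maps = 5, 4       # 5x5 kernel, 4 feature maps

fc_params = (H * W) * hidden + hidden          # one weight per input-unit pair, plus biases
conv_params = num_maps * (K * K) + num_maps    # each map shares one small kernel, plus biases

print(f"fully connected: {fc_params:,} parameters")   # 1,049,600
print(f"convolutional:   {conv_params:,} parameters") # 104
```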


• What does replicating the feature detectors achieve?
• Equivariant activities: replicated features do not make the neural activities invariant to translation. The activities are equivariant (see the sketch below).
• Invariant knowledge: if a feature is useful in some locations during training, detectors for that feature will be available in all locations during testing.
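A minimal sketch of equivariance with a toy image and a toy detector; SciPy's correlate2d stands in for the convolution, and all sizes are illustrative.

```python
# Equivariance: shifting the input shifts the feature map by the same amount.
import numpy as np
from scipy.signal import correlate2d

image = np.zeros((8, 8))
image[2, 3] = 1.0                      # a single bright pixel
kernel = np.ones((3, 3))               # a toy feature detector

feat = correlate2d(image, kernel, mode="same")
feat_shifted = correlate2d(np.roll(image, 2, axis=1), kernel, mode="same")

# The responses are not identical (no invariance), but they are shifted
# copies of each other (equivariance).
assert np.allclose(np.roll(feat, 2, axis=1), feat_shifted)
```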


• Pooling the outputs of replicated feature detectors
• Get a small amount of translational invariance at each level by averaging four neighboring replicated detectors to give a single output to the next level (sketched below).
  – This reduces the number of inputs to the next layer of feature extraction, thus allowing us to have many more different feature maps.
  – Taking the maximum of the four works slightly better.
• Problem: after several levels of pooling, we have lost information about the precise positions of things.
  – This makes it impossible to use the precise spatial relationships between high-level parts for recognition.
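A minimal sketch of 2x2 average pooling and max pooling on one toy feature map (stride 2, no padding; the map and sizes are illustrative):

```python
import numpy as np

fmap = np.arange(16, dtype=float).reshape(4, 4)   # a toy 4x4 feature map

# Group the map into non-overlapping 2x2 blocks: shape (2, 2, 4).
blocks = fmap.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(2, 2, 4)

avg_pooled = blocks.mean(axis=-1)   # average of each 2x2 neighborhood
max_pooled = blocks.max(axis=-1)    # max of each 2x2 neighborhood (works slightly better)

print(avg_pooled)   # [[ 2.5  4.5] [10.5 12.5]]
print(max_pooled)   # [[ 5.  7.] [13. 15.]]
```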


• Terminology
  [Figure: convolution and pooling illustrated step by step on an example image]
  – Image: the input.
  – Kernel: 5x5
  – Stride: 2
  – Padding: 1
  – Convolution Layer (5x5, 2, 1, 4) = (kernel size, stride, padding, number of kernels), producing feature maps 0–3.
  – Pooling Layer (4x4, 4, 0) = (pooling size, stride, padding), applied to the 4 feature maps.
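The output-size arithmetic implied by these numbers, as a small sketch; the 32x32 input size is an assumption for illustration, not a value from the slides.

```python
# Output spatial size of a convolution or pooling layer:
#   out = floor((in + 2*padding - kernel) / stride) + 1
def out_size(in_size: int, kernel: int, stride: int, padding: int = 0) -> int:
    return (in_size + 2 * padding - kernel) // stride + 1

conv_out = out_size(32, kernel=5, stride=2, padding=1)   # conv (5x5, 2, 1): 32 -> 15
pool_out = out_size(conv_out, kernel=4, stride=4)        # pool (4x4, 4, 0): 15 -> 3
print(conv_out, pool_out)                                 # 15 3
```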


• An example: a ‘VW’ detector
  [Figure: a two-layer detector for ‘V’ and ‘W’ shapes]
  – Input image: 1 channel.
  – Layer 1: 3 filters (detectors), including a ‘V’ detector and a ‘W’ detector; its output has 3 channels.
  – Layer 2: 2x3 filters (detectors), i.e. 2 filters each spanning the 3 layer-1 channels; its output has 2 channels (a shape sketch follows below).
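A minimal sketch of how a layer-2 filter spans all layer-1 channels, so that 2 filters over 3 input channels yield a 2-channel output (random data, illustrative sizes, plain loops for clarity):

```python
import numpy as np

layer1_out = np.random.rand(3, 12, 12)        # 3 feature maps from layer 1
filters = np.random.rand(2, 3, 5, 5)          # 2 layer-2 filters, each covering all 3 channels

out_h = out_w = 12 - 5 + 1                    # valid convolution, stride 1
layer2_out = np.zeros((2, out_h, out_w))
for f in range(2):                            # each layer-2 filter gives one output channel
    for i in range(out_h):
        for j in range(out_w):
            patch = layer1_out[:, i:i + 5, j:j + 5]
            layer2_out[f, i, j] = np.sum(patch * filters[f])   # sum over channels and positions

print(layer2_out.shape)                       # (2, 8, 8)
```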


History

• 1979, Neocognitron (Fukushima): the first convolutional nets. Fukushima, however, did not set the weights by supervised backpropagation, but by local unsupervised learning rules.
• 1989, LeNet (LeCun): backpropagation for convolutional nets. LeCun re-invented the CNN with BP.
• 1992, Cresceptron (Weng et al., 1992): max pooling, later integrated with CNNs (MPCNN).
• 2006, CNNs trained on GPU (Chellapilla et al., 2006).
• 2011, Multi-Column GPU-MPCNNs (Ciresan et al., 2011): superhuman performance. The first system to achieve superhuman visual pattern recognition, in the IJCNN 2011 traffic sign recognition contest.
• 2012, ImageNet breakthrough (Krizhevsky et al., 2012): AlexNet, trained on GPUs, won the ImageNet competition.


Outline

• Recent Progress of Supervised Convolutional Nets
  • AlexNet
  • GoogLeNet
  • VGGNet
• Small Break: Microsoft’s Tricks
• Representation Learning and Bayesian Approach
  • Deconvolutional Networks
  • Bayesian Deep Deconvolutional Networks


Recent Progress of Supervised Convolutional Nets
• AlexNet, 2012
  • The architecture that made the 2012 ImageNet breakthrough.
  • NIPS 2012, “ImageNet Classification with Deep Convolutional Neural Networks”.
  • A general practical guide to training deep supervised convnets.


• Main techniques
  • ReLU nonlinearity
  • Data augmentation
  • Dropout
  • Overlapping pooling
  • Mini-batch SGD with momentum and weight decay (update sketched below)
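A minimal sketch of one mini-batch SGD step with momentum and L2 weight decay; the hyperparameter values are the commonly quoted AlexNet settings but should be read as illustrative defaults, not as taken from the slides.

```python
import numpy as np

def sgd_step(w, grad, v, lr=0.01, momentum=0.9, weight_decay=5e-4):
    """v <- momentum*v - lr*(grad + weight_decay*w);  w <- w + v"""
    v = momentum * v - lr * (grad + weight_decay * w)
    return w + v, v

# toy usage on a random weight matrix, with the gradient of the loss ||w||^2
w = np.random.randn(4, 4)
v = np.zeros_like(w)
for _ in range(10):
    w, v = sgd_step(w, grad=2 * w, v=v)
```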


• Dropout
  • Reduces overfitting.
  • Model averaging.
    – A brief proof: for an input $x$, the network predicts $y = \operatorname{argmax}_{k'}(\mathrm{out}_{k'})$. (A training/test sketch follows below.)
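A minimal sketch of dropout on one activation vector: units are dropped with probability 0.5 at training time, and activations are scaled by the keep probability at test time, which approximates averaging over the exponentially many thinned networks. The rate and the scaling rule follow the usual recipe and are illustrative, not copied from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
p_drop = 0.5

def dropout_train(h):
    mask = rng.random(h.shape) >= p_drop    # drop each unit with probability p_drop
    return h * mask

def dropout_test(h):
    return h * (1.0 - p_drop)               # scale instead of sampling: approximate model average

h = rng.standard_normal(8)
print(dropout_train(h))
print(dropout_test(h))
```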


• Dropout
  • Encourage sparsity.


Recent Progress of Supervised Convolutional Nets
• GoogLeNet, 2014
  • The 2014 ImageNet competition winner.
  • CNNs can go further if carefully tuned.
  • Main techniques
    • Carefully designed Inception architecture
    • Network in Network
    • Deeply Supervised Nets: associating a “companion” classification output with each hidden layer (a loss sketch follows below).
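A minimal sketch of the “companion output” idea: each hidden layer feeds a small classifier whose loss is added to the final loss during training. The weighting factor and the number of companion outputs here are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def softmax_xent(logits, label):
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def total_loss(final_logits, companion_logits_per_layer, label, alpha=0.3):
    loss = softmax_xent(final_logits, label)
    for logits in companion_logits_per_layer:      # one companion classifier per hidden layer
        loss += alpha * softmax_xent(logits, label)
    return loss

# toy usage: a 10-class problem with two hidden layers that have companion outputs
label = 3
final = np.random.randn(10)
companions = [np.random.randn(10), np.random.randn(10)]
print(total_loss(final, companions, label))
```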


Recent Progress of Supervised Convolutional Nets
• VGGNet, 2014
  • A simple architecture with consistently state-of-the-art results, in contrast to GoogLeNet-like structures (which are very hard to tune).
  • Developed at Oxford (the authors later joined DeepMind); based on Zeiler & Fergus’s 2013 work.
  • The most widely used architecture now.
  • Small filters (3x3) and small stride (1); see the parameter sketch below.
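A standard observation (not spelled out in the slides) that motivates the small filters: two stacked 3x3 convolutions cover the same 5x5 receptive field as one 5x5 convolution, with fewer parameters.

```python
# Parameters per layer (weights only), assuming C input and C output channels.
C = 64                                  # illustrative channel count
params_5x5 = 5 * 5 * C * C              # one 5x5 conv layer: 102,400
params_two_3x3 = 2 * (3 * 3 * C * C)    # two stacked 3x3 conv layers: 73,728
print(params_5x5, params_two_3x3)
```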


Outline

• Recent Progress of Supervised Convolutional Nets
  • AlexNet
  • GoogLeNet
  • VGGNet
• Small Break: Microsoft’s Tricks
• Representation Learning and Bayesian Approach
  • Deconvolutional Networks
  • Bayesian Deep Deconvolutional Networks


Representation Learning and Bayesian Approach
• Deconvolutional Networks (Zeiler & Fergus, CVPR 2010)
  • A deep layered model for representation learning.
  • Takes an optimization perspective.
  • Results are better than previous representation-learning methods, but there is still a gap to supervised CNN models.


• Notation (first layer)
  • $K_1$: number of filters (dictionaries).
  • $W_0^{(n,c)}$: channel $c$ of the $n$th image.
  • $D^{(k,c)}$: channel $c$ of the $k$th filter (dictionary).
  • $W_1^{(n,k)}$: sparse; indicates the position and pixel-wise strength of $D^{(k)}$.
• Cost function of the first layer ($K_0$: number of channels of the image, $K_1$: number of filters):
$$C_1\big(W_0^{(n)}\big) = \frac{\lambda}{2}\sum_{c=1}^{K_0}\Big\|\sum_{k=1}^{K_1} W_1^{(n,k)} * D^{(k,c)} - W_0^{(n,c)}\Big\|_2^2 \;+\; \sum_{k=1}^{K_1}\big|W_1^{(n,k)}\big|_p$$


• Stacking layers
  • $K_l$: layer $l$’s number of channels; layer $l$ reconstructs the layer-$(l-1)$ feature maps $W_{l-1}^{(n,c)}$ from its own feature maps $W_l^{(n,k)}$ and dictionaries $D^{(k,c)}$.
• Learning process
  • Optimize layer by layer.
  • Optimize over the feature maps $W_l^{(n,k)}$.
    – When $p \ge 1$, this is convex.
    – But it is poorly conditioned, because the feature maps are coupled to one another by the filters. (Why?)
    – Solution: introduce auxiliary variables $x_l^{(n,k)}$ and minimize
$$C_l\big(W_{l-1}^{(n)}\big) = \frac{\lambda}{2}\sum_{c=1}^{K_{l-1}}\Big\|\sum_{k=1}^{K_l} W_l^{(n,k)} * D^{(k,c)} - W_{l-1}^{(n,c)}\Big\|_2^2 \;+\; \sum_{k=1}^{K_l}\big|x_l^{(n,k)}\big|_p \;+\; \sum_{k=1}^{K_l}\big\|x_l^{(n,k)} - W_l^{(n,k)}\big\|_2^2$$
      (a soft-threshold sketch of the $x$-update follows below)
  • Optimize over the filters (dictionaries) $D^{(k,c)}$ using gradient descent.
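A small sketch of why the auxiliary variables help (my reading, not spelled out in the slides): with $p = 1$ the $x$-subproblem, $\min_x |x|_1 + \|x - W\|_2^2$, decouples per pixel and has a closed-form soft-threshold solution.

```python
import numpy as np

def soft_threshold(W, thresh=0.5):
    # argmin_x |x| + (x - w)^2, applied elementwise
    return np.sign(W) * np.maximum(np.abs(W) - thresh, 0.0)

W = np.array([-1.2, -0.3, 0.0, 0.4, 2.0])
print(soft_threshold(W))    # [-0.7  0.   0.   0.   1.5] (up to signed zeros)
```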


• Performance: slightly outperforms SIFT-based approaches and CDBN.


• Bayesian Deep Deconvolutional Learning (Yunchen Pu et al., 2015)
  • A deep layered model for representation learning.
  • Takes a Bayesian perspective.
  • Claims state-of-the-art classification performance using the learned representations.


• Notation
  • $X^{(n)}$: the $n$th image.
  • $S^{(n,k)}$: indicates which shifted version of $D^{(k)}$ is used to represent $X^{(n)}$.
  • $W^{(n,k)}$: indicates the pixel-wise strength of $D^{(k)}$.
• Compared to the Deconvolutional Networks paper
  • $S^{(n,k)} \odot W^{(n,k)}$ here is an explicit version of the sparse feature map in the 2010 paper.
• Priors
• Pooling (a block-selection sketch follows below)
  • Within each block of $S^{(n,k_l,l)}$, either all $n_x n_y$ pixels are zero, or only one pixel is non-zero, with the position of that pixel selected stochastically via a multinomial distribution.
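A minimal sketch of that block structure. The probability of an all-zero block and the way the multinomial probabilities are formed here are assumptions for illustration, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_block_select(S, nx=2, ny=2, p_all_zero=0.2):
    """Keep at most one entry per non-overlapping nx-by-ny block of S."""
    H, W = S.shape
    out = np.zeros_like(S)
    for i in range(0, H, nx):
        for j in range(0, W, ny):
            block = S[i:i + nx, j:j + ny]
            if rng.random() < p_all_zero or block.sum() == 0:
                continue                                  # whole block stays zero
            probs = (block / block.sum()).ravel()
            idx = rng.choice(block.size, p=probs)         # multinomial draw over positions
            r, c = divmod(idx, block.shape[1])
            out[i + r, j + c] = block[r, c]               # keep only that pixel
    return out

S = rng.random((4, 4))
print(stochastic_block_select(S))
```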


• Learning process
  • Bottom to top: Gibbs sampling, with MAP samples selected.
  • Top to bottom: refinement.


• Intuition of Deconvolutional Networks (generative)
  • An image is made up of patches.
  • These patches are weighted transformations of dictionary elements.
  • We learn the dictionaries from training data.
  • A new image is then represented by the positions and weights of the dictionary elements.
• Intuition of Convolutional Networks
  • An image is made up of patches.
  • We can learn feature detectors for various kinds of patches.
  • Then we use these feature detectors to scan a new image, and classify it based on the features (kinds of patches) detected.
• Both are translation equivariant.


• Performance


Discussion

• Deep supervised CNNs still have limits. Where does further improvement lie?
• Why does Bayesian learning of deconvolutional representations work much better than the optimization-based approaches?


Thank you.
