
Deep Convolutional Nets

11th March 2015

Jiaxin Shi

Tsinghua University


A Brief Introduction to CNN

• The replicated feature approach
• Use many different copies of the same feature detector with different positions.
  – Replication greatly reduces the number of free parameters to be learned (a parameter-count sketch follows below).
• Use several different feature types, each with its own map of replicated detectors.
  – Allows each patch of image to be represented in several ways.
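To make the parameter savings concrete, here is a minimal sketch in Python; the input size, hidden-layer width and filter count are hypothetical numbers chosen only for illustration, not values from the slides.

```python
# Rough parameter-count comparison: a fully connected hidden layer
# versus a bank of replicated (shared) 5x5 feature detectors.
H, W = 32, 32            # hypothetical single-channel input size
hidden = 1024            # units in a fully connected hidden layer
K, num_maps = 5, 4       # 5x5 kernel, 4 feature maps

fc_params = (H * W) * hidden + hidden          # one weight per input-unit pair, plus biases
conv_params = num_maps * (K * K) + num_maps    # each map shares one small kernel, plus biases

print(f"fully connected: {fc_params:,} parameters")   # 1,049,600
print(f"convolutional:   {conv_params:,} parameters") # 104
```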


• What does replicating the feature detectors achieve?
• Equivariant activities: replicated features do not make the neural activities invariant to translation. The activities are equivariant (see the sketch below).
• Invariant knowledge: if a feature is useful in some locations during training, detectors for that feature will be available in all locations during testing.
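A minimal sketch of equivariance with a toy image and a toy detector; SciPy's correlate2d stands in for the convolution, and all sizes are illustrative.

```python
# Equivariance: shifting the input shifts the feature map by the same amount.
import numpy as np
from scipy.signal import correlate2d

image = np.zeros((8, 8))
image[2, 3] = 1.0                      # a single bright pixel
kernel = np.ones((3, 3))               # a toy feature detector

feat = correlate2d(image, kernel, mode="same")
feat_shifted = correlate2d(np.roll(image, 2, axis=1), kernel, mode="same")

# The responses are not identical (no invariance), but they are shifted
# copies of each other (equivariance).
assert np.allclose(np.roll(feat, 2, axis=1), feat_shifted)
```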


• Pooling the outputs of replicated feature detectors
• Get a small amount of translational invariance at each level by averaging four neighboring replicated detectors to give a single output to the next level (sketched below).
  – This reduces the number of inputs to the next layer of feature extraction, thus allowing us to have many more different feature maps.
  – Taking the maximum of the four works slightly better.
• Problem: after several levels of pooling, we have lost information about the precise positions of things.
  – This makes it impossible to use the precise spatial relationships between high-level parts for recognition.
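A minimal sketch of 2x2 average pooling and max pooling on one toy feature map (stride 2, no padding; the map and sizes are illustrative):

```python
import numpy as np

fmap = np.arange(16, dtype=float).reshape(4, 4)   # a toy 4x4 feature map

# Group the map into non-overlapping 2x2 blocks: shape (2, 2, 4).
blocks = fmap.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(2, 2, 4)

avg_pooled = blocks.mean(axis=-1)   # average of each 2x2 neighborhood
max_pooled = blocks.max(axis=-1)    # max of each 2x2 neighborhood (works slightly better)

print(avg_pooled)   # [[ 2.5  4.5] [10.5 12.5]]
print(max_pooled)   # [[ 5.  7.] [13. 15.]]
```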


• Terminology
  [Figure: convolution and pooling illustrated step by step on an example image]
  – Image: the input.
  – Kernel: 5x5
  – Stride: 2
  – Padding: 1
  – Convolution Layer (5x5, 2, 1, 4) = (kernel size, stride, padding, number of kernels), producing feature maps 0–3.
  – Pooling Layer (4x4, 4, 0) = (pooling size, stride, padding), applied to the 4 feature maps.
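The output-size arithmetic implied by these numbers, as a small sketch; the 32x32 input size is an assumption for illustration, not a value from the slides.

```python
# Output spatial size of a convolution or pooling layer:
#   out = floor((in + 2*padding - kernel) / stride) + 1
def out_size(in_size: int, kernel: int, stride: int, padding: int = 0) -> int:
    return (in_size + 2 * padding - kernel) // stride + 1

conv_out = out_size(32, kernel=5, stride=2, padding=1)   # conv (5x5, 2, 1): 32 -> 15
pool_out = out_size(conv_out, kernel=4, stride=4)        # pool (4x4, 4, 0): 15 -> 3
print(conv_out, pool_out)                                 # 15 3
```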


• An example: a ‘VW’ detector
  [Figure: a two-layer detector for ‘V’ and ‘W’ shapes]
  – Input image: 1 channel.
  – Layer 1: 3 filters (detectors), including a ‘V’ detector and a ‘W’ detector; its output has 3 channels.
  – Layer 2: 2x3 filters (detectors), i.e. 2 filters each spanning the 3 layer-1 channels; its output has 2 channels (a shape sketch follows below).
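A minimal sketch of how a layer-2 filter spans all layer-1 channels, so that 2 filters over 3 input channels yield a 2-channel output (random data, illustrative sizes, plain loops for clarity):

```python
import numpy as np

layer1_out = np.random.rand(3, 12, 12)        # 3 feature maps from layer 1
filters = np.random.rand(2, 3, 5, 5)          # 2 layer-2 filters, each covering all 3 channels

out_h = out_w = 12 - 5 + 1                    # valid convolution, stride 1
layer2_out = np.zeros((2, out_h, out_w))
for f in range(2):                            # each layer-2 filter gives one output channel
    for i in range(out_h):
        for j in range(out_w):
            patch = layer1_out[:, i:i + 5, j:j + 5]
            layer2_out[f, i, j] = np.sum(patch * filters[f])   # sum over channels and positions

print(layer2_out.shape)                       # (2, 8, 8)
```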


History

• 1979, Neocognitron (Fukushima): the first convolutional nets. Fukushima, however, did not set the weights by supervised backpropagation, but by local unsupervised learning rules.
• 1989, LeNet (LeCun): backpropagation for convolutional nets. LeCun re-invented the CNN with BP.
• 1992, Cresceptron (Weng et al., 1992): max pooling, later integrated with CNNs (MPCNN).
• 2006, CNNs trained on GPU (Chellapilla et al., 2006).
• 2011, Multi-Column GPU-MPCNNs (Ciresan et al., 2011): superhuman performance. The first system to achieve superhuman visual pattern recognition, in the IJCNN 2011 traffic sign recognition contest.
• 2012, ImageNet breakthrough (Krizhevsky et al., 2012): AlexNet, trained on GPUs, won the ImageNet competition.


Outline

• Recent Progress of Supervised Convolutional Nets
  • AlexNet
  • GoogLeNet
  • VGGNet
• Small Break: Microsoft’s Tricks
• Representation Learning and Bayesian Approach
  • Deconvolutional Networks
  • Bayesian Deep Deconvolutional Networks


Recent Progress of Supervised Convolutional Nets
• AlexNet, 2012
  • The architecture that made the 2012 ImageNet breakthrough.
  • NIPS 2012, “ImageNet Classification with Deep Convolutional Neural Networks”.
  • A general practical guide to training deep supervised convnets.


• Main techniques
  • ReLU nonlinearity
  • Data augmentation
  • Dropout
  • Overlapping pooling
  • Mini-batch SGD with momentum and weight decay (update sketched below)
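A minimal sketch of one mini-batch SGD step with momentum and L2 weight decay; the hyperparameter values are the commonly quoted AlexNet settings but should be read as illustrative defaults, not as taken from the slides.

```python
import numpy as np

def sgd_step(w, grad, v, lr=0.01, momentum=0.9, weight_decay=5e-4):
    """v <- momentum*v - lr*(grad + weight_decay*w);  w <- w + v"""
    v = momentum * v - lr * (grad + weight_decay * w)
    return w + v, v

# toy usage on a random weight matrix, with the gradient of the loss ||w||^2
w = np.random.randn(4, 4)
v = np.zeros_like(w)
for _ in range(10):
    w, v = sgd_step(w, grad=2 * w, v=v)
```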


• Dropout
  • Reduces overfitting.
  • Model averaging.
    – A brief proof: for an input $x$, the network predicts $y = \operatorname{argmax}_{k'}(\mathrm{out}_{k'})$. (A training/test sketch follows below.)
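A minimal sketch of dropout on one activation vector: units are dropped with probability 0.5 at training time, and activations are scaled by the keep probability at test time, which approximates averaging over the exponentially many thinned networks. The rate and the scaling rule follow the usual recipe and are illustrative, not copied from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
p_drop = 0.5

def dropout_train(h):
    mask = rng.random(h.shape) >= p_drop    # drop each unit with probability p_drop
    return h * mask

def dropout_test(h):
    return h * (1.0 - p_drop)               # scale instead of sampling: approximate model average

h = rng.standard_normal(8)
print(dropout_train(h))
print(dropout_test(h))
```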


• Dropout
  • Encourage sparsity.


Recent Progress of Supervised Convolutional Nets
• GoogLeNet, 2014
  • The 2014 ImageNet competition winner.
  • CNNs can go further if carefully tuned.
  • Main techniques
    • Carefully designed Inception architecture
    • Network in Network
    • Deeply Supervised Nets: associating a “companion” classification output with each hidden layer (a loss sketch follows below).
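A minimal sketch of the “companion output” idea: each hidden layer feeds a small classifier whose loss is added to the final loss during training. The weighting factor and the number of companion outputs here are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def softmax_xent(logits, label):
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def total_loss(final_logits, companion_logits_per_layer, label, alpha=0.3):
    loss = softmax_xent(final_logits, label)
    for logits in companion_logits_per_layer:      # one companion classifier per hidden layer
        loss += alpha * softmax_xent(logits, label)
    return loss

# toy usage: a 10-class problem with two hidden layers that have companion outputs
label = 3
final = np.random.randn(10)
companions = [np.random.randn(10), np.random.randn(10)]
print(total_loss(final, companions, label))
```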


Recent Progress of Supervised Convolutional Nets
• VGGNet, 2014
  • A simple architecture with consistently state-of-the-art results, in contrast to GoogLeNet-like structures (which are very hard to tune).
  • Developed at Oxford (the authors later joined DeepMind); based on Zeiler & Fergus’s 2013 work.
  • The most widely used architecture now.
  • Small filters (3x3) and small stride (1); see the parameter sketch below.
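A standard observation (not spelled out in the slides) that motivates the small filters: two stacked 3x3 convolutions cover the same 5x5 receptive field as one 5x5 convolution, with fewer parameters.

```python
# Parameters per layer (weights only), assuming C input and C output channels.
C = 64                                  # illustrative channel count
params_5x5 = 5 * 5 * C * C              # one 5x5 conv layer: 102,400
params_two_3x3 = 2 * (3 * 3 * C * C)    # two stacked 3x3 conv layers: 73,728
print(params_5x5, params_two_3x3)
```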


Outline

• Recent Progress of Supervised Convolutional Nets
  • AlexNet
  • GoogLeNet
  • VGGNet
• Small Break: Microsoft’s Tricks
• Representation Learning and Bayesian Approach
  • Deconvolutional Networks
  • Bayesian Deep Deconvolutional Networks


Representation Learning and Bayesian Approach
• Deconvolutional Networks (Zeiler & Fergus, CVPR 2010)
  • A deep layered model for representation learning.
  • Takes an optimization perspective.
  • Results are better than previous representation-learning methods, but there is still a gap to supervised CNN models.


• Notation (first layer)
  • $K_1$: number of filters (dictionaries).
  • $W_0^{(n,c)}$: channel $c$ of the $n$th image.
  • $D^{(k,c)}$: channel $c$ of the $k$th filter (dictionary).
  • $W_1^{(n,k)}$: sparse; indicates the position and pixel-wise strength of $D^{(k)}$.
• Cost function of the first layer ($K_0$: number of channels of the image, $K_1$: number of filters):
$$C_1\big(W_0^{(n)}\big) = \frac{\lambda}{2}\sum_{c=1}^{K_0}\Big\|\sum_{k=1}^{K_1} W_1^{(n,k)} * D^{(k,c)} - W_0^{(n,c)}\Big\|_2^2 \;+\; \sum_{k=1}^{K_1}\big|W_1^{(n,k)}\big|_p$$


• Stacking layers
  • $K_l$: layer $l$’s number of channels; layer $l$ reconstructs the layer-$(l-1)$ feature maps $W_{l-1}^{(n,c)}$ from its own feature maps $W_l^{(n,k)}$ and dictionaries $D^{(k,c)}$.
• Learning process
  • Optimize layer by layer.
  • Optimize over the feature maps $W_l^{(n,k)}$.
    – When $p \ge 1$, this is convex.
    – But it is poorly conditioned, because the feature maps are coupled to one another by the filters. (Why?)
    – Solution: introduce auxiliary variables $x_l^{(n,k)}$ and minimize
$$C_l\big(W_{l-1}^{(n)}\big) = \frac{\lambda}{2}\sum_{c=1}^{K_{l-1}}\Big\|\sum_{k=1}^{K_l} W_l^{(n,k)} * D^{(k,c)} - W_{l-1}^{(n,c)}\Big\|_2^2 \;+\; \sum_{k=1}^{K_l}\big|x_l^{(n,k)}\big|_p \;+\; \sum_{k=1}^{K_l}\big\|x_l^{(n,k)} - W_l^{(n,k)}\big\|_2^2$$
      (a soft-threshold sketch of the $x$-update follows below)
  • Optimize over the filters (dictionaries) $D^{(k,c)}$ using gradient descent.
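A small sketch of why the auxiliary variables help (my reading, not spelled out in the slides): with $p = 1$ the $x$-subproblem, $\min_x |x|_1 + \|x - W\|_2^2$, decouples per pixel and has a closed-form soft-threshold solution.

```python
import numpy as np

def soft_threshold(W, thresh=0.5):
    # argmin_x |x| + (x - w)^2, applied elementwise
    return np.sign(W) * np.maximum(np.abs(W) - thresh, 0.0)

W = np.array([-1.2, -0.3, 0.0, 0.4, 2.0])
print(soft_threshold(W))    # [-0.7  0.   0.   0.   1.5] (up to signed zeros)
```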


• Performance: slightly outperforms SIFT-based approaches and CDBN.


• Bayesian Deep Deconvolutional Learning (Yunchen Pu et al., 2015)
  • A deep layered model for representation learning.
  • Takes a Bayesian perspective.
  • Claims state-of-the-art classification performance using the learned representations.


• Notation
  • $X^{(n)}$: the $n$th image.
  • $S^{(n,k)}$: indicates which shifted version of $D^{(k)}$ is used to represent $X^{(n)}$.
  • $W^{(n,k)}$: indicates the pixel-wise strength of $D^{(k)}$.
• Compared to the Deconvolutional Networks paper
  • $S^{(n,k)} \odot W^{(n,k)}$ here is an explicit version of the sparse feature map in the 2010 paper.
• Priors
• Pooling (a block-selection sketch follows below)
  • Within each block of $S^{(n,k_l,l)}$, either all $n_x n_y$ pixels are zero, or only one pixel is non-zero, with the position of that pixel selected stochastically via a multinomial distribution.
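A minimal sketch of that block structure. The probability of an all-zero block and the way the multinomial probabilities are formed here are assumptions for illustration, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_block_select(S, nx=2, ny=2, p_all_zero=0.2):
    """Keep at most one entry per non-overlapping nx-by-ny block of S."""
    H, W = S.shape
    out = np.zeros_like(S)
    for i in range(0, H, nx):
        for j in range(0, W, ny):
            block = S[i:i + nx, j:j + ny]
            if rng.random() < p_all_zero or block.sum() == 0:
                continue                                  # whole block stays zero
            probs = (block / block.sum()).ravel()
            idx = rng.choice(block.size, p=probs)         # multinomial draw over positions
            r, c = divmod(idx, block.shape[1])
            out[i + r, j + c] = block[r, c]               # keep only that pixel
    return out

S = rng.random((4, 4))
print(stochastic_block_select(S))
```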


• Learning process
  • Bottom to top: Gibbs sampling, with MAP samples selected.
  • Top to bottom: refinement.


• Intuition of Deconvolutional Networks (generative)
  • An image is made up of patches.
  • These patches are weighted transformations of dictionary elements.
  • We learn the dictionaries from training data.
  • A new image is then represented by the positions and weights of the dictionary elements.
• Intuition of Convolutional Networks
  • An image is made up of patches.
  • We can learn feature detectors for various kinds of patches.
  • Then we use these feature detectors to scan a new image, and classify it based on the features (kinds of patches) detected.
• Both are translation equivariant.


• Performance


Discussion

• Deep supervised CNNs still have limits. Where does further improvement lie?
• Why does Bayesian learning of deconvolutional representations work much better than the optimization-based approaches?


Thank you.
