Big Data Analytics - Stanford University
Transcript of lecture slides: infolab.stanford.edu/~echang/BigDat2015/BigDat2015...

Page 1:

Big Data Analytics: Architectures, Algorithms, and Applications
Part #2: Intro to Deep Learning

Edward Chang 張智威, HTC (prior: Google & U. California)
Simon Wu, HTC (prior: Twitter & Microsoft)

Page 2:

Three Lectures

• Lecture #1: Scalable Big Data Algorithms
  – Scalability issues
  – Key algorithms with application examples
• Lecture #2: Intro to Deep Learning
  – Autoencoder & Sparse Coding
  – Graph models: CNN, MRF, & RBM
• Lecture #3: Analytics Platform [by Simon Wu]
  – Intro to LAMA platform
  – Code lab

Page 3:

Acknowledging Slide Contributors

• Geoffrey Hinton
• Yoshua Bengio
• Russ Salakhutdinov
• Kai Yu
• Yann LeCun
• Andrew Ng
• Steven Seitz

Page 4:

Lecture #2 Outline

• Data Posteriors vs. Human Priors
• Learn p(x) from Big Data
  – Use NN to construct an autoencoder
  – Sparse coding
  – Dynamic partial
• Graphical Models
  – CNN, MRF, & RBM
• Demo

Page 5:

Representation?

Page 6:

Typical Image/Video Representation Based on Domain Knowledge and Human Priors

Knowledge or feature extraction in image processing involves using algorithms to detect and isolate various desired edges or shapes.

Low-level: edge detection, corner detection, ridge detection, or more generally the Scale-Invariant Feature Transform (SIFT)

Curvature: shape information, blob detection

Hough transform: lines, circles/ellipses, arbitrary shapes (Generalized Hough Transform)

Page 7:

…Many Related Works on Representation

Template matching (medical imaging): flexible methods for 2D, 3D, or 3D+time edge extraction, road detection, MRI, fMRI

Color and texture representations: histograms, and various transformations for conducting frequency-domain analysis, e.g., wavelets

Motion: motion detection, e.g., optical flow, global or area based

Page 8:

Key Design Goals for Representation

Design features x that are invariant and selective.
• Good invariance
  – The same object should have the same features
• Good selectivity (disentanglement)
  – Different objects should exhibit different features for telling them apart

Once x has been designed, find label y for x and then learn p(y|x).

Page 9:

Challenges

• Invariance is affected by noise
  – Environmental factors (e.g., lighting conditions, occlusion)
  – Equipment factors (e.g., different camera brands produce different colors and gamma corrections)
  – Aliasing (e.g., cars have different models, hence different features)
• Selectivity requires good similarity functions
• Labeled data is tough to acquire
  – Learning robust models requires big data

Page 10:

Remedy #1: Learn ϕ from Data, p(x|ϕ) ≈ p*(x)

• Instead of designing features by hand, learn features ϕ from the data
• Data: not just the original data, but with variants added
  – E.g., adding scaled, rotated, cropped, mirrored, and gamma-adjusted images
• Instead of requiring invariant features as input to a model, let the model cope with invariance
• Then learn features ϕ for predicting p(x|ϕ) accurately (p(x|ϕ) ≈ p*(x)) in an unsupervised way, from data that already covers the variant conditions

Page 11:

Remedy #2: Deep Model

• Learn representations in a hierarchical way
• [T. Serre, T. Poggio; MIT 2005]

Page 12:

Lecture Outline

• Data Posteriors vs. Human Priors
• Learn p(x) from Big Data
  – Use NN to construct an autoencoder
  – Sparse coding
  – Dynamic partial
• Graphical Models
  – CNN, MRF, & RBM
• Demo

Page 13:

Multiple-Layer Networks: Neural Network (NN) Model

An elementary neuron with R inputs is shown below. Each input is weighted with an appropriate weight w. The sum of the weighted inputs and the bias forms the input to the transfer function f. Neurons can use any differentiable transfer function f to generate their output.
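As a minimal sketch (plain NumPy; the names and numbers are illustrative, not from the slides), the neuron just described is a weighted sum plus bias passed through f:

```python
import numpy as np

def logsig(n):
    """Log-sigmoid transfer function: differentiable, outputs in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-n))

def neuron(x, w, b, f=logsig):
    """Elementary neuron: weight each of the R inputs, add the bias,
    and pass the result through the transfer function f."""
    return f(np.dot(w, x) + b)

x = np.array([0.5, -1.2, 3.0])   # R = 3 inputs
w = np.array([0.4, 0.1, -0.2])   # one weight per input
b = 0.1                          # bias
print(neuron(x, w, b))           # a single scalar output
```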

Page 14:

NN Model: Transfer Functions (Activation Functions)

Multilayer networks often use the log-sigmoid transfer function logsig. The function logsig generates outputs between 0 and 1 as the neuron's net input goes from negative to positive infinity.

Page 15:

NN Model: Feedforward Network

A single-layer network of S logsig neurons having R inputs is shown below, in full detail on the left and with a layer diagram on the right.

Page 16:

Example: Four-Layer NN

(Figure: input layer, hidden layer #1, hidden layer #2, output layer, producing output y.)

Page 17:

NN Model: Learning Algorithm

The following slides describe the learning process of a multi-layer neural network employing the backpropagation algorithm. To illustrate this process, the three-layer neural network with two inputs and one output shown in the picture below is used:

Page 18:

Learning Algorithm: Backpropagation

Each neuron is composed of two units. The first unit adds the products of the weight coefficients and the input signals. The second unit realizes a nonlinear function, called the neuron transfer (activation) function. Signal e is the adder's output signal, and y = f(e) is the output signal of the nonlinear element; y is also the output signal of the neuron.

Page 19:

Feed Forward

The pictures below illustrate how the signal feeds forward through the network. Symbols w(xm)n represent the weights of the connections between network input xm and neuron n in the input layer. Symbols yn represent the output signal of neuron n.

Page 20:

Feed Forward

Page 21:

Feed Forward

Page 22:

Feed Forward

Propagation of signals through the hidden layer. Symbols wmn represent the weights of the connections between the output of neuron m and the input of neuron n in the next layer.

Page 23:

Feed Forward

Page 24:

Learning Algorithm: Forward Pass

Propagation of signals through the output layer.

Page 25:

Learning Algorithm: Backpropagation

To teach the neural network we need a training data set. The training data set consists of input signals (x1 and x2) assigned with corresponding targets (desired outputs) z. Network training is an iterative process: in each iteration the weight coefficients of the nodes are modified using new data from the training data set, with the modification calculated by the algorithm described below. Each teaching step starts with forcing both input signals from the training set; after this stage we can determine the output signal values for each neuron in each network layer.

Page 26:

Learning Algorithm: Backpropagation

In the next step, the output signal of the network, y, is compared with the desired output value (the target z) found in the training data set. The difference is called the error signal δ of the output-layer neuron.

Page 27:

Learning Algorithm: Backpropagation

The idea is to propagate the error signal δ (computed in a single teaching step) back to all neurons whose output signals were inputs to the neuron in question.

Page 28:

Learning Algorithm: Backpropagation

Page 29:

Learning Algorithm: Backpropagation

The weight coefficients wmn used to propagate the errors back are equal to those used when computing the output value; only the direction of data flow is changed (signals are propagated from outputs to inputs, one layer after the other). This technique is used for all network layers. If the propagated errors come from several neurons, they are added. The illustration is below:

Page 30:

Learning Algorithm: Backpropagation

When the error signal for each neuron has been computed, the weight coefficients of each neuron's input node may be modified. In the formulas below, df(e)/de represents the derivative of the activation function of the neuron whose weights are being modified.

Page 31:

Learning Algorithm: Backpropagation

Page 32:

Learning Algorithm: Backpropagation

Page 33:

Sigmoid function f(e) and its derivative f'(e)

f(e) = \frac{1}{1 + e^{-\beta e}}, where β is the parameter for the slope.

Hence

f'(e) = \frac{df(e)}{de} = \frac{\beta e^{-\beta e}}{(1 + e^{-\beta e})^2} = \beta \, f(e)\,\bigl(1 - f(e)\bigr)

For simplicity, take the slope parameter β = 1, so that

f'(e) = f(e)\,\bigl(1 - f(e)\bigr)

http://link.springer.com/chapter/10.1007%2F3-540-59497-3_175#page-1
http://mathworld.wolfram.com/SigmoidFunction.html
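A quick numerical check of this identity (a minimal sketch; β = 1 and the finite-difference step are arbitrary choices):

```python
import numpy as np

def f(e, beta=1.0):
    """Sigmoid with slope parameter beta."""
    return 1.0 / (1.0 + np.exp(-beta * e))

e = np.linspace(-5.0, 5.0, 11)
analytic = f(e) * (1.0 - f(e))               # f'(e) = f(e)(1 - f(e)) for beta = 1
h = 1e-6                                     # central finite difference to compare
numeric = (f(e + h) - f(e - h)) / (2 * h)
print(np.max(np.abs(analytic - numeric)))    # ~1e-11: the identity holds
```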

Page 34:

Autoencoder NN for Unsupervised Compression

Page 35:

h_{w,b}(x) ≈ x

Page 36:

Parameter Learning

• 10×10 images with 100 pixels
• R^100 possible configurations
• H hidden units
  – H = 100?
  – H = 50? (PCA)
• Too computationally intensive to learn w
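Tying the last two slides together, a minimal sketch of such an autoencoder (100 inputs for a 10×10 image, H = 50 sigmoid hidden units) trained by gradient descent so that h_{w,b}(x) ≈ x; all sizes and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, H = 100, 50                          # 10x10 image -> 100 pixels; H hidden units
W1, b1 = rng.normal(0, 0.1, (H, n)), np.zeros(H)
W2, b2 = rng.normal(0, 0.1, (n, H)), np.zeros(n)

def sigmoid(e):
    return 1.0 / (1.0 + np.exp(-e))

def step(x, eta=0.5):
    """One gradient step on the reconstruction error ||h_{w,b}(x) - x||^2."""
    global W1, b1, W2, b2
    a = sigmoid(W1 @ x + b1)            # encode: hidden representation
    xr = sigmoid(W2 @ a + b2)           # decode: reconstruction h_{w,b}(x)
    d2 = (xr - x) * xr * (1 - xr)       # output delta (squared error, sigmoid)
    d1 = (W2.T @ d2) * a * (1 - a)      # backpropagated hidden delta
    W2 -= eta * np.outer(d2, a); b2 -= eta * d2
    W1 -= eta * np.outer(d1, x); b1 -= eta * d1
    return np.sum((xr - x) ** 2)

x = rng.random(n)                       # one unlabeled "image"
for _ in range(500):
    err = step(x)                       # reconstruction error decreases
```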

Page 37:

Learning Algorithm

• Suppose ϕ (or h) is a set of hidden variables
• Model image x with k independent hidden features ϕi plus additive noise v:

x = \sum_{i=1}^{k} a_i \phi_i + v(x)

• The goal is to find a set of ϕ such that the posterior P(x|ϕ) is as close as possible to P*(x), i.e., to minimize the KL divergence between the two

Page 38:

…Learning Algorithm

• Minimize the KL divergence between the two distributions:

D(P^*(x) \,\|\, P(x|\phi)) = \int P^*(x) \log \frac{P^*(x)}{P(x|\phi)} \, dx

• Since P*(x) is constant across choices of ϕ, minimizing the KL divergence is equivalent to maximizing the log-likelihood P(x|ϕ):

\phi^* = \arg\max_{\phi} \log P(x|\phi)

\phi^*, a^* = \arg\min_{\phi,a} \sum_{j=1}^{m} \Big\| x^{(j)} - \sum_{i=1}^{k} a_i^{(j)} \phi_i \Big\|^2 + \lambda \sum_{i=1}^{k} S(a_i^{(j)})

(Revisit this later.)

Page 39:

Lecture Outline

• Data Posteriors vs. Human Priors
• Learn p(x) from Big Data
  – Use NN to construct an autoencoder
  – Sparse coding
  – Dynamic partial
• Graphical Models
  – CNN, MRF, & RBM
• Demo

Page 40:

General Priors in Real-World Data [Y. Bengio et al., 2014]

• A hierarchical organization of factors
• Smoothness
  – x ≈ y → f(x) ≈ f(y)
  – The nearest-neighbor assumption
• Local manifolds
  – Clustered
  – Low degrees of freedom
  – E.g., PCA
• Distributed representations
  – Feature reuse, and abstract & invariant representations
  – Dynamic and partial

Page 41:

Smoothness: Nearest Neighbor Model

• Every learning model is a variant of the nearest neighbor model
• Similar objects should reside in the neighborhood of a feature subspace

Page 42:

Local Low-Dimensional Manifolds

(Figure: a data manifold approximated by local linear patches.)

K. Yu and A. Ng, Tutorial: Feature Learning for Image Classification, Part 3: Image Classification using Sparse Coding: Advanced Topics, ECCV 2010.

Page 43:

Smooth, Local, Sparse

(Figure: a data manifold covered by basis vectors; each datum can be represented by its neighboring anchors as a local linear, sparse combination.)

K. Yu and A. Ng, Tutorial: Feature Learning for Image Classification, Part 3: Image Classification using Sparse Coding: Advanced Topics, ECCV 2010.

Page 44:

Sparse Coding [Olshausen & Field, 1996]

• Find a representation of the data, unsupervised
  – Traditionally PCA (too contrived; why?)
• Find over-complete bases in an efficient way
• x ≈ aϕ, where x is in R^n and ϕ is in R^m, m > n
• The coefficients a cannot be uniquely determined
• Thus, impose sparsity on a
• k-sparsity

Page 45:

Sparse Coding

(Figure: a data vector x of size N×1 is approximated as Φa, where Φ is a fixed dictionary of size N×M and a is an M×1 coefficient vector with K non-zero entries.)

Page 46:

What is Sparse Coding

\min_{a,\phi} \sum_{i=1}^{m} \Big\| x_i - \sum_{j=1}^{k} a_{i,j} \phi_j \Big\|^2 + \lambda \sum_{i=1}^{m} \sum_{j=1}^{k} |a_{i,j}|

Sparse coding (Olshausen & Field, 1996) was originally developed to explain early visual processing in the brain (edge detection in V1).

Training: given a set of random patches x, learn a dictionary of bases [Φ1, Φ2, …].

Coding: for a data vector x, solve LASSO to find the sparse coefficient vector a.

Page 47:

Sparse Coding: Training Time

Input: images x1, x2, …, xm (each in R^d)
Learn: a dictionary of bases ϕ1, ϕ2, …, ϕk (also in R^d)

\min_{a,\phi} \sum_{i=1}^{m} \Big\| x_i - \sum_{j=1}^{k} a_{i,j} \phi_j \Big\|^2 + \lambda \sum_{i=1}^{m} \sum_{j=1}^{k} |a_{i,j}|

Alternating optimization (a code sketch follows):
1. Fix the dictionary ϕ1, ϕ2, …, ϕk and optimize a (a standard LASSO problem).
2. Fix the activations a and optimize the dictionary ϕ1, ϕ2, …, ϕk (a convex QP problem).
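A minimal sketch of this alternating scheme. It assumes scikit-learn's Lasso for step 1 and ordinary least squares (with renormalized bases) for step 2; the dimensions, λ, and iteration count are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
d, k, m = 64, 128, 500                   # patch dim, # bases (overcomplete), # patches
X = rng.normal(size=(m, d))              # rows are image patches x_i
Phi = rng.normal(size=(k, d))            # rows are the bases phi_j
Phi /= np.linalg.norm(Phi, axis=1, keepdims=True)
lam = 0.1

for _ in range(10):
    # Step 1: dictionary fixed, solve the LASSO for the sparse codes a.
    lasso = Lasso(alpha=lam / (2 * d), fit_intercept=False, max_iter=2000)
    lasso.fit(Phi.T, X.T)                # one LASSO per patch, solved jointly
    A = lasso.coef_                      # codes, shape (m, k)
    # Step 2: codes fixed, update the dictionary by least squares
    # (the convex QP), then renormalize each basis to unit length.
    Phi = np.linalg.lstsq(A, X, rcond=None)[0]
    Phi /= np.linalg.norm(Phi, axis=1, keepdims=True) + 1e-12
```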

Page 48:

Sparse Coding: Testing Time

Input: an unseen image patch x_i (in R^d) and the previously learned ϕ's
Output: the representation [a_{i,1}, a_{i,2}, …, a_{i,k}] of image patch x_i

\min_{a} \Big\| x_i - \sum_{j=1}^{k} a_{i,j} \phi_j \Big\|^2 + \lambda \sum_{j=1}^{k} |a_{i,j}|

x_i ≈ 0.8 * [basis] + 0.3 * [basis] + 0.5 * [basis]

Represent x_i as: a_i = [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, …]

Page 49:

Justifications & Examples

• Probabilistic interpretation
• Human visual cortex
  – Not enforcing orthogonal bases like PCA
  – Over-completeness preserves more features
    • Scales, orientations

Page 50:

Revisit Autoencoder's Probabilistic Interpretation

• Suppose ϕ (or h) is a set of hidden variables
• Model image x with k independent hidden features ϕi plus additive noise v:

x = \sum_{i=1}^{k} a_i \phi_i + v(x)

• The goal is to find a set of ϕ such that the posterior P(x|ϕ) is as close as possible to P*(x), i.e., to minimize the KL divergence between the two

Page 51:

…Probabilistic Interpretation

• Minimize the KL divergence between the two distributions:

D(P^*(x) \,\|\, P(x|\phi)) = \int P^*(x) \log \frac{P^*(x)}{P(x|\phi)} \, dx

• Since P*(x) is constant across choices of ϕ, maximize the log-likelihood P(x|ϕ):

\phi^* = \arg\max_{\phi} \log P(x|\phi)

Page 52:

…Probabilistic Interpretation

• We need the two terms P(x|a, ϕ) and P(a) because

P(x|\phi) = \int P(x|a,\phi)\, P(a) \, da

• Assume the white noise v is Gaussian with variance σ²:

P(x|a,\phi) = \frac{1}{Z} \exp\!\Big(-\frac{\big(x - \sum_{i=1}^{k} a_i \phi_i\big)^2}{2\sigma^2}\Big)

• To determine P(x|ϕ) we also need the prior P(a); assume independence of the source features:

P(a) = \prod_{i=1}^{k} p(a_i)

Page 53:

…Probabilistic Interpretation

• Add the sparsity assumption: every image is a product of only a few features, so we would like the probability distribution of a_i to be peaked at zero with high kurtosis; S(a_i) controls the shape:

P(a_i) = \frac{1}{Z} \exp(-\beta S(a_i))

P(a) = \prod_{i=1}^{k} p(a_i), \qquad P(x|\phi) = \int P(x|a,\phi)\, P(a) \, da

Page 54:

…Probabilistic Interpretation

• Over all input data, the problem \phi^* = \arg\max_{\phi} \log P(x|\phi) reduces to:

\max \sum_{j=1}^{m} \log \int P(x|a,\phi)\, P(a) \, da

= \max \sum_{j=1}^{m} \log \int \exp\!\Big(-\frac{\big(x - \sum_i a_i \phi_i\big)^2}{2\sigma^2}\Big) \prod_i \exp(-\beta S(a_i)) \, da

= \max \sum_{j=1}^{m} \log \int \exp\!\Big(-\big(x - \sum_i a_i \phi_i\big)^2 - \beta \sum_i S(a_i)\Big) \, da

(approximating the integral by the maximum of its integrand)

\Rightarrow \min \sum_{j=1}^{m} \Big\| x^{(j)} - \sum_{i=1}^{k} a_i^{(j)} \phi_i \Big\|^2 + \lambda \sum_{i=1}^{k} S(a_i^{(j)})

Page 55:

…Probabilistic Interpretation

• Maximizing the log likelihood is equivalent to minimizing an energy function:

\phi^* = \arg\max_{\phi} \log P(x|\phi)

\phi^*, a^* = \arg\min_{\phi,a} \sum_{j=1}^{m} \Big\| x^{(j)} - \sum_{i=1}^{k} a_i^{(j)} \phi_i \Big\|^2 + \lambda \sum_{i=1}^{k} S(a_i^{(j)})

• The choices of S(·), an L1 or log penalty, correspond to the use of the Laplacian and the Cauchy prior, respectively:

P(a_i) \propto \exp(-\beta |a_i|) \quad (Laplacian, L1) \qquad P(a_i) \propto \frac{\beta}{1 + a_i^2} \quad (Cauchy, log)

Page 56:

Justifications & Examples

• Probabilistic interpretation
• Human visual cortex
  – Not enforcing orthogonal bases like PCA
  – Over-completeness preserves more features
    • Scales, orientations

Page 57:

Feature Invariance

• The human visual system works remarkably well
• A "mental" model (T. Serre, T. Poggio; MIT 2005)
  – Ventral visual pathway
  – Deep learning

Page 58:

Visual Pathway [Hubel & Wiesel, 1968]

Primary Visual Cortex (V1)

Page 59:

Extrastriate cortex

Page 60:

Extrastriate cortex

Page 61:

Feedforward Path of the Ventral Stream

• Invariance (overcomplete)
  – V1: starting with scale/position/orientation invariance over a restricted range
  – Then invariance to viewpoints and other transformations
• Multi-layer, multi-area (deep)
  – V2 and V3 (shape): improve the complexity of the optimal stimulus
• Feedforward
  – First 150 milliseconds of perception
  – No color information (in V4)
  – Without feedback

Page 62:

Six Steps of HMAX [T. Serre, T. Poggio; MIT 2005]

Page 63:

Multi-layer Visual Pathway

• Edge detection, multi-scale, multi-direction (on/off, simple)
  – Using multi-scale, multi-direction Gabor filters
• Edge pooling (max, invariance)
  – Keep "strong" features
• Unsupervised clustering (or)
  – Clustering edges into patches

Page 64:

V1-Like Bases

Page 65:

Multi-layer Visual Pathway

• Part detection (on/off, simple)
  – Find matching patches in photos
• Part pooling (max, invariance)
  – Identify useful patches/parts
• Supervised learning
  – Object ← parts

Page 66:

Edges and Parts

Page 67:

Six Steps of HMAX [T. Serre, T. Poggio; MIT 2005]

• Edge detection, multi-scale/direction (on/off, simple)
  – Using multi-scale, multi-orientation Gabor filters
• Edge pooling (max, invariance)
  – Keep "strong" features
• Unsupervised clustering (or)
  – Clustering edges into patches
• Part detection (on/off, simple)
  – Find matching patches in photos
• Part pooling (max, invariance)
  – Identify useful patches/parts
• Supervised learning
  – Object ← parts

Page 68:

Revisit Challenges of Representation Learning

• Invariance is affected by noise
  – Environmental factors (e.g., lighting conditions, occlusion)
  – Equipment factors (e.g., different camera brands produce different colors and gamma corrections)
  – Aliasing (e.g., cars have different models, hence different features)
• Labeled data is tough to acquire
  – Robust models require big data
• Selectivity requires good similarity functions

Page 69:

Lecture Outline

• Data Posteriors vs. Human Priors
• Learn p(x) from Big Data
  – Use NN to construct an autoencoder
  – Sparse coding
  – Dynamic partial
• Graphical Models
  – CNN, MRF, & RBM
• Demo

Page 70:

Example of Sparse Models

f(x) = ⟨w, x⟩, where w = [0, 0.2, 0, 0.1, 0, 0]

• Because the 2nd and 4th elements of w are non-zero, these are the two selected features in x
• Globally-aligned sparse representation: the same dimensions are selected for every x

x1 [ | | | | | | ]  →  [ 0 | 0 | 0 0 ]
x2 [ | | | | | | ]  →  [ 0 | 0 | 0 0 ]
x3 [ | | | | | | ]  →  [ 0 | 0 | 0 0 ]
xm [ | | | | | | ]  →  [ 0 | 0 | 0 0 ]

Page 71:

Example of Sparse Activations

• Different x's have different dimensions activated
• Locally-shared sparse representation: similar x's tend to have similar non-zero dimensions, but not all

a1 [ 0 | | | 0 … 0 ]   (for x1)
a2 [ | | | 0 0 … 0 ]   (for x2)
a3 [ | 0 | | 0 … 0 ]   (for x3)
am [ 0 0 0 | | … 0 ]   (for xm)

Page 72:

Example of Sparse Activations

• Preserving manifold structure (i.e., clusters, manifolds)

a1 [ | | | 0 0 … 0 ]   (for x1)
a2 [ 0 | | | 0 … 0 ]   (for x2)
a3 [ 0 0 | | | … 0 ]   (for x3)
am [ 0 0 0 0 | … 0 ]   (for xm)

Page 73:

Similarity Theories

• Objects are similar in all respects (Richardson, 1928)
• Objects are similar in some respects (Tversky, 1977)
• Similarity is a process of determining respects, rather than using predefined respects (Goldstone, 1994)

Page 74:

Similarity Theories

• Objects are similar in all or some respects
• Minkowski function
  – D = (Σ_{i=1..M} (p_i − q_i)^n)^{1/n}
• Weighted Minkowski function
  – D = (Σ_{i=1..M} w_i (p_i − q_i)^n)^{1/n}
• The same w is imposed on all pairs of objects p and q (a code sketch of these functions follows the DPF slide below)

[ 0 | 0 | 0 0 ]
[ 0 | 0 | 0 0 ]
[ 0 | 0 | 0 0 ]
[ 0 | 0 | 0 0 ]

Page 75:

DPF: Dynamic Partial Function [B. Li, E. Chang, et al., MM Systems 2013]

• Similarity is a process of determining respects, rather than using predefined respects (Goldstone, 1994)

a1 [ 0 | | | 0 … 0 ]      a1 [ | | | 0 0 … 0 ]
a2 [ | | | 0 0 … 0 ]      a2 [ 0 | | | 0 … 0 ]
a3 [ | 0 | | 0 … 0 ]      a3 [ 0 0 | | | … 0 ]
am [ 0 0 0 | | … 0 ]      am [ 0 0 0 0 | … 0 ]
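A sketch of the distance functions from the last two slides. The Minkowski and weighted Minkowski forms follow the formulas above; the dynamic-partial variant encodes the idea of choosing the respects per pair, here by aggregating only the m smallest per-dimension differences, which is an assumption about DPF's exact form rather than a formula given on the slides:

```python
import numpy as np

def minkowski(p, q, n=2):
    """D = (sum_{i=1..M} |p_i - q_i|^n)^(1/n)."""
    return (np.abs(p - q) ** n).sum() ** (1.0 / n)

def weighted_minkowski(p, q, w, n=2):
    """Same, with a fixed weight w_i per dimension, shared by all pairs."""
    return (w * np.abs(p - q) ** n).sum() ** (1.0 / n)

def dynamic_partial(p, q, m, n=2):
    """DPF-style distance: aggregate only the m smallest per-dimension
    differences, so the 'respects' compared are chosen per pair.
    (The m-smallest rule is an assumption, not a formula from the slides.)"""
    delta = np.sort(np.abs(p - q))[:m]
    return (delta ** n).sum() ** (1.0 / n)

p, q = np.random.rand(144), np.random.rand(144)
w = np.ones(144) / 144.0
print(minkowski(p, q), weighted_minkowski(p, q, w), dynamic_partial(p, q, m=100))
```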

Page 76:

(Figure: four panels, one per transformation of an image (GIF re-encoding, scaling up/down, cropping, and rotation), each plotting average distance against feature number; for every transformation, only a small subset of the features shows large distances.)

Page 77:

DPF: Dynamic Partial Function [B. Li, E. Chang, et al., MM Systems 2013]

• Which place is similar to Kyoto?
• Partial
• Dynamic
• Dynamic Partial Function

Page 78:

Precision/Recall

Page 79:

Partial, Dynamic Low-Dimensional Manifolds

(Figure: a data manifold approximated by local linear patches.)

K. Yu and A. Ng, Tutorial: Feature Learning for Image Classification, Part 3: Image Classification using Sparse Coding: Advanced Topics, ECCV 2010.

Page 80:

Part #1 Summary

• Overcomplete representation
• Sparse weighting vector a for x
• Autoencoders & sparse coding
  – Equivalent models
  – One with an implicit and one with an explicit f(x)

Page 81:

Autoencoders

– also involve activation and reconstruction
– but have an explicit f(x), e.g., a sigmoid function
– do not necessarily enforce sparsity on a
– but if sparsity is put on a, results often improve [e.g., sparse RBM, Lee et al., NIPS 08]

encoding: a = f(x)     decoding: x′ = g(a)

Page 82:

Sparse Coding

\min_{a,\phi} \sum_{i=1}^{m} \Big\| x_i - \sum_{j=1}^{k} a_{i,j} \phi_j \Big\|^2 + \lambda \sum_{i=1}^{m} \sum_{j=1}^{k} |a_{i,j}|

– a is sparse
– a often has higher dimension than x
– the activation a = f(x) is a nonlinear, implicit function of x
– the reconstruction x′ = g(a) is linear and explicit

encoding: a = f(x)     decoding: x′ = g(a)

Page 83:

Hierarchical Sparse Coding

Sparse Coding → Pooling → Sparse Coding → Pooling

Learning from unlabeled data

Yu, Lin, & Lafferty, CVPR 11; Matthew D. Zeiler, Graham W. Taylor, and Rob Fergus, ICCV 11

Page 84:

DEEP MODELS: CNN, MRF & RBM

Page 85:

Recap: NN

• Other network architectures: how the different neurons are connected to each other

(Figure: Layer 1 → Layer 2 → Layer 3 → Layer 4. In a traditional NN, neurons in a layer are fully connected to all neurons in the next layer.)

Page 86:

CNN: NN Considers Sparse Coding

Page 87:

The Replicated Feature Approach (Hinton: the dominant approach for neural networks)

• Use many different copies of the same feature detector at different positions.
  – Could also replicate across scale and orientation (tricky and expensive)
  – Replication greatly reduces the number of free parameters to be learned.
• Use several different feature types, each with its own map of replicated detectors.
  – Allows each patch of the image to be represented in several ways → overcomplete

The red connections all have the same weight.

Page 88:

CNN Architecture: Convolutional Layers

Spatially-local correlation:
– Spatial information is encoded in the network
– Sparse connectivity

(Figure: a partial convolutional layer connecting Layer 1 to Layer 2; each Layer 2 unit sees only a local window of Layer 1.)
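A minimal sketch of the computation such a layer performs: every output unit sees only a local window (sparse connectivity), and all units share one kernel (the weights are replicated). The Sobel-like kernel is illustrative:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution: each output unit depends only on a local patch
    of the input, and the same kernel weights are reused at every position."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.random.rand(28, 28)
edge_kernel = np.array([[1., 0., -1.],
                        [2., 0., -2.],
                        [1., 0., -1.]])  # a Sobel-like vertical-edge detector
fmap = conv2d(image, edge_kernel)        # 26 x 26 feature map
```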

Page 89:


Page 90:


Page 91:


Page 92:


Page 93:

Pooling the Outputs of Replicated Feature Detectors

Get a small amount of translational invariance at each level by averaging four neighboring replicated detectors to give a single output to the next level.
– This reduces the number of inputs to the next layer of feature extraction, thus allowing us to have many more different feature maps.
– Taking the maximum of the four (like HMAX) works slightly better (G. Hinton).
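Both pooling variants just described, sketched on non-overlapping 2×2 neighborhoods (sizes illustrative):

```python
import numpy as np

def pool2x2(fmap, mode="max"):
    """Pool each non-overlapping 2x2 block of neighboring detector outputs
    into a single value: the average, or the max as in HMAX."""
    H, W = fmap.shape
    blocks = fmap[:H - H % 2, :W - W % 2].reshape(H // 2, 2, W // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

fmap = np.random.rand(26, 26)
print(pool2x2(fmap, "mean").shape, pool2x2(fmap, "max").shape)  # (13, 13) twice
```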

Page 94:

Convolutional Networks [LeCun 97]

• Convolution (feature detection)
• Sub-sampling (multi-scale)
• Perform C & S iteratively to form a deep-learning network
• Learn the weights from data
• Location information (where an object is) is lost

Page 95:

The 82 Errors Made by LeNet5

Notice that most of the errors are cases that people find quite easy.

The human error rate is probably 20 to 30 errors, but nobody has had the patience to measure it.

Hinton NIPS 2013

Page 96:

Ciresan's Brute Force Approach

• LeNet uses knowledge about the invariances to design:
  – the local connectivity
  – the weight-sharing
  – the pooling
• It achieves about 80 errors
  – This can be reduced to about 40 errors by using many different transformations of the input and other tricks (Ranzato 2008)
• Ciresan et al. (2010) inject knowledge of invariances by creating a huge amount of carefully designed extra training data:
  – For each training image, they produce many new training examples by applying many different transformations.
  – They can then train a large, deep, dumb net on a GPU without much overfitting.
• This improves the result to 35 errors

Hinton NIPS 2013

Page 97:

The Errors Made by the Ciresan et al. Net

The top printed digit is the right answer. The bottom two printed digits are the network's best two guesses. The right answer is almost always in the top two guesses. With model averaging they can now get about 25 errors.

Hinton NIPS 2013

Page 98:

From Hand-Written Digits to 3-D Objects

• Recognizing real objects in color photographs downloaded from the web is much more complicated than recognizing hand-written digits:
  – A hundred times as many classes (1,000 vs. 10)
  – A hundred times as many pixels (256 × 256 color vs. 28 × 28 gray)
  – Two-dimensional images of three-dimensional scenes
  – Cluttered scenes requiring segmentation
  – Multiple objects in each image
• Will the same type of CNN work?

Hinton NIPS 2013

Page 99:

The ILSVRC-2012 Competition on ImageNet

• The dataset has 1.2 million high-resolution training images.
• The classification task:
  – Get the "correct" class in your top 5 bets. There are 1,000 classes.
• The localization task:
  – For each bet, put a box around the object. Your box must have at least 50% overlap with the correct box.

Hinton NIPS 2013

Page 100:

Examples

Hinton NIPS 2013

Page 101:

Error Rates on the ILSVRC-2012 Competition

Team                                            Classification   Classification & Localization
University of Tokyo                             26.1%            53.6%
Oxford University Computer Vision Group         26.9%            50.0%
INRIA (French national research institute
  in CS) + XRCE (Xerox Research Centre Europe)  27.0%
University of Amsterdam                         29.5%
University of Toronto (Alex Krizhevsky)         16.4%            34.1%

Hinton NIPS 2013

Page 102:

A Neural Network for ImageNet

• Alex Krizhevsky (NIPS 2012) developed a very deep convolutional neural net of the type pioneered by Yann LeCun. Its architecture was:
  – 7 hidden layers, not counting some max-pooling layers
  – The early layers were convolutional
  – The last two layers were globally connected
• The activation functions were:
  – Rectified linear units in every hidden layer, f(x) = max(0, x). These train much faster and are more expressive than logistic units.
  – Competitive normalization to suppress hidden activities when nearby units have stronger activities. This helps with variations in intensity.

Hinton NIPS 2013
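A small illustration (not from the slides) of the claim about rectified linear units: their gradient stays at 1 for positive inputs, while the logistic unit's gradient vanishes for large |x|, which slows training:

```python
import numpy as np

def relu(x):
    """Rectified linear unit: f(x) = max(0, x)."""
    return np.maximum(0.0, x)

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-3.0, 2.0, 10.0])
relu_grad = (x > 0).astype(float)                # 1 for any positive input
logistic_grad = logistic(x) * (1 - logistic(x))  # shrinks toward 0 for large |x|
print(relu(x), relu_grad, logistic_grad)
```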

Page 103:

Tricks  that  significantly    improve  generaliza=on  

•  Bagging  Train  on  random  224x224  patches  from  the  256x256  images  to  get  more  data.  Also  use  ler-­‐right  reflec=ons  of  the  images.  At  test  =me,  combine  the  opinions  from  ten  different  patches:  The  four  224x224  corner  patches  plus  the  central  224x224  patch  plus  the  reflec=ons  of  those  five  patches.    

•  Dropout  (Sparsifica=on)  Use  “dropout”  to  regularize  the  weights  in  the  globally  connected  layers  (which  contain  most  of  the  parameters).  Dropout  means  that  half  of  the  hidden  units  in  a  layer  are  randomly  removed    for  each  training  example.    This  stops  hidden  units  from  relying  too  much  on  other  hidden  units.  

1/27/15   Ed  Chang  @  BigDat  2015  

Hinton  NIPS  2013
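The ten-patch trick can be sketched in a few lines of NumPy; the function name and layout below are hypothetical, but the patch arithmetic follows the slide (four corners, the centre, and their left-right reflections):

```python
import numpy as np

def ten_crops(img, crop=224):
    """Return the 10 test-time patches for one 256x256xC image."""
    h, w = img.shape[:2]
    offsets = [(0, 0), (0, w - crop), (h - crop, 0), (h - crop, w - crop),
               ((h - crop) // 2, (w - crop) // 2)]    # 4 corners + centre
    patches = [img[y:y + crop, x:x + crop] for y, x in offsets]
    patches += [p[:, ::-1] for p in patches]          # left-right reflections
    return np.stack(patches)                          # shape: (10, 224, 224, C)

# At test time, run all 10 patches through the net and average the
# predicted class probabilities.
```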

Page 104: Dropout: an efficient way to average many large neural nets (http://arxiv.org/abs/1207.0580)

•  Consider a neural net with one hidden layer.

•  Each time we present a training example, we randomly omit each hidden unit with probability 0.5.

•  So we are randomly sampling from 2^H different architectures (H = number of hidden units). All architectures share weights (a sketch follows below).

Hinton, NIPS 2013
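A minimal sketch of the sampling step, assuming plain NumPy; the test-time scaling rule follows the dropout paper cited above:

```python
import numpy as np

rng = np.random.default_rng(0)

def hidden_layer_dropout(h, p_drop=0.5, train=True):
    # Training: omit each hidden unit with probability 0.5, so each
    # example is effectively trained on a different sub-architecture.
    if train:
        keep = rng.random(h.shape) >= p_drop
        return h * keep
    # Test: keep every unit but scale by (1 - p_drop), so the expected
    # input to the next layer matches what it saw during training.
    return h * (1.0 - p_drop)
```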

Page 105: Dropout as a form of model averaging (bagging)

•  We sample from 2^H models, so only a few of the models ever get trained, and each gets only one training example.
   –  This is as extreme as bagging can get.

•  The sharing of the weights means that every model is very strongly regularized.
   –  It's a much better regularizer than L2 or L1 penalties that pull the weights towards zero.

Hinton, NIPS 2013


Page 107: DEEP MODELS: CNN, MRF & RBM

Page 108: [Figure slide from Russ Salakhutdinov's KDD 2014 tutorial; image not recovered.]

Page 109: Directed graphs: Bayesian networks

General factorization:

$$p(\mathbf{x}) = \prod_{k=1}^{K} p(x_k \mid \mathrm{pa}_k)$$

where $\mathrm{pa}_k$ denotes the parents of $x_k$. A toy example follows below.

Russ Salakhutdinov, KDD 2014 tutorial
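As a toy illustration of the factorization (all numbers hypothetical), the joint distribution of a three-node binary chain x1 → x2 → x3 is just the product of one prior and two conditional tables:

```python
# Conditional probability tables for binary x1 -> x2 -> x3 (made-up numbers).
p_x1 = {0: 0.6, 1: 0.4}
p_x2_given_x1 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_x3_given_x2 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}

def joint(x1, x2, x3):
    # p(x) = prod_k p(x_k | pa_k)
    return p_x1[x1] * p_x2_given_x1[x1][x2] * p_x3_given_x2[x2][x3]

# The factorized joint sums to 1 over all 2^3 configurations.
print(sum(joint(a, b, c)
          for a in (0, 1) for b in (0, 1) for c in (0, 1)))  # -> 1.0
```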

Page 110: [Figure slide from Russ Salakhutdinov's KDD 2014 tutorial; image not recovered.]

Page 111: "Explaining away"

•  Inferring causes in directed graphs has one subtlety.

•  Illustration: the colour of a pixel in an image. The image colour depends on two independent causes, the surface colour and the lighting colour. Once the image colour is observed, the two causes become dependent: learning that the lighting is bright "explains away" some of the evidence that the surface itself is light-coloured (see the numeric sketch below).

C. Bishop, ECCV tutorial
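A small numeric sketch of explaining away, with made-up probabilities: two independent binary causes (bright lighting, light-coloured surface) and one observed effect (a bright pixel). Observing that the lighting is bright lowers the posterior probability that the surface is light:

```python
from itertools import product

p_light, p_surface = 0.5, 0.5          # independent priors over the causes

def p_bright(light, surface):
    # Likelihood that the pixel looks bright, given the two causes.
    return {(0, 0): 0.05, (0, 1): 0.6, (1, 0): 0.6, (1, 1): 0.9}[(light, surface)]

def posterior_surface(light_observed=None):
    """P(surface = 1 | pixel is bright [, lighting observation])."""
    num = den = 0.0
    for l, s in product((0, 1), repeat=2):
        if light_observed is not None and l != light_observed:
            continue
        w = ((p_light if l else 1 - p_light)
             * (p_surface if s else 1 - p_surface) * p_bright(l, s))
        den += w
        num += w * s
    return num / den

print(posterior_surface())    # ~0.70 given only the bright pixel
print(posterior_surface(1))   # ~0.60 once bright lighting is also observed
```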

Page 112: Shortcomings of back-propagation

•  It requires labeled training data.
   –  Almost all data is unlabeled.

•  The learning time does not scale well.
   –  It is very slow in networks with multiple hidden layers.
   –  Backward pass: the error signal dE/dy shrinks as it is propagated back, so it diminishes as the number of layers increases (illustrated below).

•  It can get stuck in poor local optima.
   –  These are often quite good, but for deep nets they are far from optimal.
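The diminishing backward signal is easy to see numerically: each logistic layer multiplies the gradient by at most 0.25, so the signal reaching the early layers shrinks geometrically (a rough illustration that ignores the weights):

```python
grad = 1.0
for layer in range(10):
    grad *= 0.25        # logistic derivative is at most sigma'(0) = 0.25
print(grad)             # 0.25**10 ~ 9.5e-7: early layers barely learn
```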

Page 113: MRF & RBM: Directed → Undirected Graphs

Page 114: Markov Random Field (MRF) components

•  A set of sites or pixels, P = {1, ..., m}: each pixel is a site.
•  Each pixel's neighbourhood, N = {N_p | p ∈ P}.
•  A set of random variables (a random field), one for each pixel: X = {X_p | p ∈ P}, denoting the label at each pixel. Each random variable takes a value x_p from the set of labels L = {l_1, ..., l_k}.
•  A joint event {X_1 = x_1, ..., X_m = x_m}, or a configuration, abbreviated X = x.
•  The joint probability of such a configuration: p(X = x), or p(x).
•  There are many possible configurations: k^m of them.

From slides by S. Seitz, University of Washington

Page 115: Markov Random Fields: the Hammersley-Clifford theorem

•  The joint distribution p(x) is a product of non-negative functions over the cliques (neighbourhoods) of the graph:

$$p(\mathbf{x}) = \frac{1}{Z} \prod_{c \in \mathcal{C}} \psi_c(\mathbf{x}_c)$$

•  where the $\psi_c(\mathbf{x}_c)$ are the clique potentials and $Z = \sum_{\mathbf{x}} \prod_{c \in \mathcal{C}} \psi_c(\mathbf{x}_c)$ is a normalization constant. A brute-force sketch follows below.
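A brute-force sketch of the theorem for a toy chain of three binary pixels with pairwise cliques; the Ising-style potential below is hypothetical. For real images the sum over k^m configurations in Z is intractable, which is why the later slides turn to MCMC:

```python
import math
from itertools import product

def psi(a, b, coupling=2.0):
    # Clique potential favouring neighbouring pixels with the same label.
    return math.exp(coupling if a == b else 0.0)

configs = list(product((0, 1), repeat=3))           # all k^m = 2^3 labelings
unnorm = {x: psi(x[0], x[1]) * psi(x[1], x[2]) for x in configs}
Z = sum(unnorm.values())                            # normalization constant
p = {x: v / Z for x, v in unnorm.items()}           # p(x) = (1/Z) prod psi_c
print(max(p, key=p.get))                            # smooth labelings win
```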

Pages 116-117: [Figure slides from Russ Salakhutdinov's KDD 2014 tutorial; images not recovered.]

Page 118: Equilibrium interpretation

$$\frac{\partial L(\theta)}{\partial \theta_{ij}} = E_{P_{\mathrm{data}}}[x_i x_j] - E_{P_{\theta}}[x_i x_j]$$

•  The first term is the expected value of the product of states at thermal equilibrium when the training data is clamped on the visible units.

•  The second term is the expected value of the product of states at thermal equilibrium when nothing is clamped (a toy enumeration follows below).
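For a toy Boltzmann machine the model expectation $E_{P_\theta}[x_i x_j]$ can be computed exactly by enumerating all 2^m states, as in the sketch below (random, made-up weights); the enumeration becomes hopeless for realistic m, which motivates the MCMC approximation on page 124:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
m = 5
theta = np.triu(rng.normal(0.0, 0.5, (m, m)), 1)
theta = theta + theta.T                        # symmetric weights, no self-loops

states = np.array(list(product((0, 1), repeat=m)), dtype=float)  # 2^m states
energy = -0.5 * np.einsum('si,ij,sj->s', states, theta, states)
p = np.exp(-energy)
p /= p.sum()                                   # Boltzmann distribution P_theta
second_moment = (states.T * p) @ states        # E_Ptheta[x_i x_j] for all i, j
print(second_moment.round(3))
```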

Pages 119-123: [Figure slides from Russ Salakhutdinov's KDD 2014 tutorial; images not recovered.]

Page 124: Model learning (similar to MRF)

$$\frac{\partial L(\theta)}{\partial \theta_{ij}} = E_{P_{\mathrm{data}}}[v_i h_j] - E_{P_{\theta}}[v_i h_j]$$

•  The first (data-dependent) term is simple to compute.

•  The second (model) term is expensive to compute, with an exponential number of configurations (over all possible images).

•  Use MCMC to approximate it (a sketch follows below).
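The standard cheap stand-in for running MCMC to equilibrium is Hinton's contrastive divergence. The sketch below is a one-step (CD-1) update for a small binary RBM, with biases omitted for brevity and all sizes hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(W, v0, lr=0.1):
    """One CD-1 step: approximate E_Pdata[v h] - E_Ptheta[v h] with a
    single Gibbs step instead of a long MCMC chain."""
    h0_prob = sigmoid(v0 @ W)                         # data clamped on v
    h0 = (rng.random(h0_prob.shape) < h0_prob) * 1.0
    v1_prob = sigmoid(h0 @ W.T)                       # one reconstruction step
    h1_prob = sigmoid(v1_prob @ W)
    pos = v0.T @ h0_prob                              # data-dependent term
    neg = v1_prob.T @ h1_prob                         # model-term estimate
    return W + lr * (pos - neg) / v0.shape[0]

W = rng.normal(0.0, 0.01, (6, 3))                     # 6 visible, 3 hidden units
batch = (rng.random((4, 6)) < 0.5) * 1.0              # toy binary training batch
W = cd1_update(W, batch)
```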

Pages 125-128: [Figure slides from Russ Salakhutdinov's KDD 2014 tutorial; images not recovered.]

Page 129: Latest ImageNet Competition Update [figure slide; results not recovered.]

Page 130: Key References

•  Deep Learning video lectures: http://videolectures.net/Top/Computer_Science/Machine_Learning/Deep_Learning/
•  A Data-Driven Study on Image Feature Extraction and Fusion, Zhiyu Wang, Fangtao Li, Edward Y. Chang, and Shiqiang Yang, Google Technical Report, April 2012.
•  Foundations of Large-Scale Multimedia Information Management and Retrieval, E. Y. Chang, Springer, 2011.
•  Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations, Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Y. Ng, in Proceedings of the Twenty-Sixth International Conference on Machine Learning, 2009.
•  Robust Object Recognition with Cortex-like Mechanisms, T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, and T. Poggio, IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3):411-426, 2007.
•  Object Recognition from Local Scale-Invariant Features, D. G. Lowe, in IEEE International Conference on Computer Vision (ICCV), 1999.

Page 131: ...Key References

•  A Tutorial on Energy-Based Learning, Yann LeCun et al., in Predicting Structured Data, MIT Press, 2006.
•  Dropout: A Simple Way to Prevent Neural Networks from Overfitting, N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Journal of Machine Learning Research, 2014.
•  A Fast Learning Algorithm for Deep Belief Nets, G. Hinton, S. Osindero, and Y. Teh, Neural Computation, 2006.
•  Representation Learning Tutorial, Yoshua Bengio, ICML 2012.
•  Representation Learning: A Review and New Perspectives, Y. Bengio, A. Courville, and P. Vincent, April 2014.
•  Convolutional Networks for Images, Speech, and Time Series, Y. LeCun and Y. Bengio, in The Handbook of Brain Theory and Neural Networks, 3361, 310, 1995.
•  Sparse Coding with an Overcomplete Basis Set: A Strategy Employed by V1?, B. Olshausen and D. Field, Vision Research, 37(23), pp. 3311-3325, 1997.
•  Deep Learning Tutorial, R. Salakhutdinov, KDD 2014.

Page 132: APPENDIX

[Pages 133-135: appendix figure slides; images not recovered.]
