SPARSE REPRESENTATIONS — DICTIONARY LEARNING

DICTIONARY LEARNING AND SPARSE CODING
SPARSE REPRESENTATION — ATSI
SPARSE CODING
▸ Let x ∈ ℝ^N be a signal
▸ Let Φ ∈ ℝ^(N×M) be a dictionary
▸ Sparse coding: min_α (1/2)∥x − Φα∥₂² + λℛ(α)
‣ where ℛ is a measure of sparsity (∥α∥₀, ∥α∥₁, etc.)
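The ℓ₁ case of this problem (the lasso) can be solved by proximal gradient descent (ISTA: a gradient step on the quadratic term, then soft-thresholding). A minimal numerical sketch with a random dictionary; all sizes, names, and parameter values here are illustrative, not from the slides:

```python
import numpy as np

def ista_sparse_code(x, Phi, lam=0.1, n_iter=200):
    """Solve min_a 0.5*||x - Phi @ a||_2^2 + lam*||a||_1 by ISTA."""
    L = np.linalg.norm(Phi, 2) ** 2                 # Lipschitz constant of the smooth part
    a = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ a - x)                # gradient of the quadratic term
        u = a - grad / L                            # gradient step
        a = np.sign(u) * np.maximum(np.abs(u) - lam / L, 0.0)  # soft-threshold (prox of l1)
    return a

rng = np.random.default_rng(0)
Phi = rng.standard_normal((32, 64))
Phi /= np.linalg.norm(Phi, axis=0)                  # unit-norm atoms
a_true = np.zeros(64)
a_true[[3, 17, 40]] = [1.5, -2.0, 1.0]
x = Phi @ a_true                                    # a 3-sparse synthetic signal
a_hat = ista_sparse_code(x, Phi, lam=0.05)
print("nonzeros:", np.flatnonzero(np.abs(a_hat) > 0.1))
```

With a small λ and a noiseless signal, the recovered code is sparse and concentrates on the atoms actually used to build x.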
WHICH DICTIONARY?
▸ Usual dictionaries:
▸ Time–frequency
▸ Wavelets
▸ *-lets
▸ Union of dictionaries
▸ Can we learn the dictionary?
DICTIONARY LEARNING
▸ Let X = (x₁, x₂, …, x_M) be a set of M signals in ℝ^N
▸ We seek a dictionary Φ ∈ ℝ^(N×K), K > N, such that each signal x_m admits a sparse representation
▸ x_m = Φα_m, with α_m ∈ ℝ^K, and α_m is s-sparse
▸ General formulation:
min_{Φ, α₁, …, α_M} Σ_m ∥x_m − Φα_m∥² s.t. ∥α_m∥₀ ≤ s ∀m
‣ With A = (α₁, …, α_M):
min_{Φ, A} ∥X − ΦA∥² s.t. ∥α_m∥₀ ≤ s ∀m
K-SVD
▸ [Elad and Aharon, 2006]
▸ Idea: alternate minimization between A and Φ
▸ Initialize Φ
▸ Sparse coding, for each x_m: α_m = argmin_α (1/2)∥x_m − Φα∥² s.t. ∥α∥₀ ≤ s
▸ Dictionary update: min_Φ (1/2)∥X − ΦA∥²
‣ Can be done column by column using an SVD
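A compact numerical sketch of this alternation: greedy s-sparse coding (orthogonal matching pursuit), then a column-by-column rank-1 SVD update of each atom and its coefficients. This is a simplified illustration of the K-SVD idea, not the reference implementation of [Elad and Aharon, 2006]; all sizes are arbitrary:

```python
import numpy as np

def omp(x, Phi, s):
    """Greedy s-sparse coding (orthogonal matching pursuit)."""
    r, support, coef = x.copy(), [], np.zeros(0)
    for _ in range(s):
        support.append(int(np.argmax(np.abs(Phi.T @ r))))   # most correlated atom
        coef, *_ = np.linalg.lstsq(Phi[:, support], x, rcond=None)
        r = x - Phi[:, support] @ coef                      # orthogonal residual
    a = np.zeros(Phi.shape[1])
    a[support] = coef
    return a

def ksvd_step(X, Phi, s):
    """One K-SVD iteration: sparse coding, then column-by-column SVD updates."""
    A = np.column_stack([omp(X[:, m], Phi, s) for m in range(X.shape[1])])
    for k in range(Phi.shape[1]):
        users = np.flatnonzero(A[k, :])                     # signals that use atom k
        if users.size == 0:
            continue
        # residual with atom k removed, restricted to the signals that use it
        E = X[:, users] - Phi @ A[:, users] + np.outer(Phi[:, k], A[k, users])
        U, S, Vt = np.linalg.svd(E, full_matrices=False)
        Phi[:, k] = U[:, 0]                                 # best rank-1 fit: new atom
        A[k, users] = S[0] * Vt[0]                          # and its coefficients
    return Phi, A

# synthetic data generated from a ground-truth dictionary
rng = np.random.default_rng(1)
Phi0 = rng.standard_normal((16, 24)); Phi0 /= np.linalg.norm(Phi0, axis=0)
A0 = np.zeros((24, 50))
for m in range(50):
    A0[rng.choice(24, 3, replace=False), m] = rng.standard_normal(3)
X = Phi0 @ A0
Phi = rng.standard_normal((16, 24)); Phi /= np.linalg.norm(Phi, axis=0)
errs = []
for _ in range(5):
    Phi, A = ksvd_step(X, Phi, s=3)
    errs.append(np.linalg.norm(X - Phi @ A))
print("reconstruction error per iteration:", np.round(errs, 3))
```

The SVD update is why each new atom has unit norm automatically: the best rank-1 approximation of E is σ₁u₁v₁ᵀ, so the atom is u₁ and the coefficients absorb σ₁.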
DICTIONARY LEARNING: ALTERNATING MINIMIZATION
▸ Idea: alternate minimization between A and Φ
▸ Initialize Φ
▸ Sparse coding, for each x_m: α_m = argmin_α (1/2)∥x_m − Φα∥² + λ∥α∥₁
▸ Dictionary update: Φ ← Φ + (1/∥A∥²)(X − ΦA)Aᵀ
▸ Can be implemented "online"
▸ Online dictionary learning [Mairal, Bach, Ponce, Sapiro 2009]
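The dictionary update above is a gradient step on ∥X − ΦA∥² with step size 1/∥A∥² (one over the Lipschitz constant of the gradient), so with the codes A held fixed the fit cannot worsen. A minimal numerical check of that step (atom renormalization, used in practice, is omitted here to keep the monotonicity argument exact):

```python
import numpy as np

def dict_gradient_step(X, Phi, A):
    """One step of Phi <- Phi + (1/||A||^2) (X - Phi A) A^T for min ||X - Phi A||^2."""
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-12)   # 1 / squared spectral norm of A
    return Phi + step * (X - Phi @ A) @ A.T

rng = np.random.default_rng(2)
X = rng.standard_normal((10, 30))
A = rng.standard_normal((15, 30))                      # codes held fixed here
Phi = rng.standard_normal((10, 15))
e0 = np.linalg.norm(X - Phi @ A)
for _ in range(20):
    Phi = dict_gradient_step(X, Phi, A)
print(f"error: {e0:.3f} -> {np.linalg.norm(X - Phi @ A):.3f}")
```

In the online setting of [Mairal et al. 2009], steps of this kind are applied as signals stream in, with sufficient statistics accumulated instead of the full X and A.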
EXAMPLES
▸ K-SVD [Elad and Aharon, 2006]
▸ Dictionary learned on a noisy version of the boat image
[Figure: dictionary trained on a noisy version of the image "boat"; reproduced from Julien Mairal's "Sparse Coding and Dictionary Learning" slides]
EXAMPLES
▸ Online dictionary learning [Mairal, Bach, Ponce, Sapiro 2009]
[Figures: inpainting a 12-Mpixel photograph; reproduced from Julien Mairal's "Sparse Coding and Dictionary Learning" slides]
CONVOLUTIONAL SPARSE CODING
CONVOLUTIONAL SPARSE CODING
▸ This model assumes that the signal x ∈ ℝ^N can be decomposed as
x[t] = Σ_k (d_k * z_k)[t]
where
‣ the d_k are "local filters"
‣ the z_k are the corresponding sparse representations
CONVOLUTIONAL SPARSE CODING
▸ For each x_m[t], assume the CSC model: x_m[t] = Σ_k (d_k * z_k^m)[t]
‣ Learning the dictionary {d_k}_k:
min_{d_k, z_k^m} Σ_m (1/2)∥x_m − Σ_k d_k * z_k^m∥² + λ Σ_k ∥z_k^m∥₁
‣ [Bristow, Eriksson & Lucey 2013] [Heide, Heidrich & Wetzstein 2015] [Papyan, Romano, Sulam & Elad 2017]
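For fixed filters, the coding subproblem is again an ℓ₁ problem and can be handled by ISTA, with the gradient computed by correlation (the adjoint of convolution). A small sketch for one signal; filter lengths, λ, and the synthetic data are illustrative, and odd-length filters are used so that "same"-mode convolution has a clean adjoint:

```python
import numpy as np

def csc_ista(x, D, lam=0.1, n_iter=300):
    """ISTA for min_z 0.5*||x - sum_k d_k * z_k||^2 + lam * sum_k ||z_k||_1.

    D: (K, L) filters with L odd; returns Z of shape (K, N), one activation per filter.
    """
    K, L = D.shape
    N = x.size
    Z = np.zeros((K, N))
    # upper bound on the Lipschitz constant via the filters' frequency responses
    Lip = K * (np.abs(np.fft.rfft(D, n=2 * N, axis=1)) ** 2).max() + 1e-12
    for _ in range(n_iter):
        r = sum(np.convolve(Z[k], D[k], mode="same") for k in range(K)) - x
        for k in range(K):
            grad = np.convolve(r, D[k][::-1], mode="same")   # adjoint = correlation
            u = Z[k] - grad / Lip
            Z[k] = np.sign(u) * np.maximum(np.abs(u) - lam / Lip, 0.0)
    return Z

# synthetic signal: a few shifted copies of 3 short filters
rng = np.random.default_rng(3)
D = rng.standard_normal((3, 5))
Z_true = np.zeros((3, 100))
Z_true[0, 10], Z_true[1, 50], Z_true[2, 80] = 1.0, -1.5, 2.0
x = sum(np.convolve(Z_true[k], D[k], mode="same") for k in range(3))
Z = csc_ista(x, D, lam=0.05)
rec = sum(np.convolve(Z[k], D[k], mode="same") for k in range(3))
print(f"relative reconstruction error: {np.linalg.norm(x - rec) / np.linalg.norm(x):.3f}")
```

Learning the filters {d_k} would then alternate this coding step with an update of D, as in the references above; practical implementations work in the Fourier domain for speed.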
MATRIX FACTORIZATION
MATRIX FACTORIZATION
▸ Data are often available in matrix form: features × samples
MATRIX FACTORIZATION
▸ Example: time–frequency representation (frequencies × time)
MATRIX FACTORIZATION
▸ Let X ∈ ℝ^(M×N). We aim to factorize X ≃ WH, with W ∈ ℝ^(M×K) and H ∈ ℝ^(K×N)
[Schematic: Data X ≃ Dictionary W × Activation H]
MATRIX FACTORIZATION
▸ Same factorization X ≃ WH, with W ∈ ℝ^(M×K) and H ∈ ℝ^(K×N), where the activation matrix H is sparse (most entries are zero)
[Schematic: Data X ≃ Dictionary W × sparse Activation H]
MATRIX FACTORIZATION: EXAMPLES
▸ PCA: X = UDUᵀ ∈ ℝ^(N×N), U being the eigenvectors (U is orthogonal) and D the eigenvalues
▸ SVD: X = UDVᵀ ∈ ℝ^(M×N), D being the singular values, U and V orthogonal
▸ ICA
▸ etc.
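Both factorizations are available directly in NumPy; a quick numerical check of the two identities (sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))

# SVD: X = U diag(s) V^T, with orthonormal U, V
U, s, Vt = np.linalg.svd(X, full_matrices=False)
assert np.allclose(X, U @ np.diag(s) @ Vt)
assert np.allclose(U.T @ U, np.eye(4)) and np.allclose(Vt @ Vt.T, np.eye(4))

# PCA: eigendecomposition of a symmetric matrix (e.g. a covariance): C = U diag(lam) U^T
C = X @ X.T / X.shape[1]
lam, Uc = np.linalg.eigh(C)
assert np.allclose(C, Uc @ np.diag(lam) @ Uc.T)
print("both identities hold")
```

Note that the eigendecomposition X = UDUᵀ applies to symmetric matrices (in PCA, the covariance matrix), while the SVD applies to any rectangular X.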
NON-NEGATIVE MATRIX FACTORIZATION
▸ X ∈ ℝ₊^(M×N): a matrix with non-negative entries
▸ X = WH, with W ∈ ℝ₊^(M×K) and H ∈ ℝ₊^(K×N)
▸ W is a matrix of non-negative features (example: spectra)
▸ H is a matrix of non-negative coefficients
▸ The features can only add to each other (no subtraction)
▸ This facilitates interpretability
NMF EXAMPLES
▸ Example from [Lee & Seung 1999]
▸ 49 images (among 2429) from MIT's CBCL face dataset
NMF EXAMPLES
▸ Example from [Lee & Seung 1999]
▸ PCA decomposition
▸ Red values = negative pixels
[Figure: PCA dictionary with K = 25; red pixels indicate negative values]
NMF EXAMPLES
▸ Example from [Lee & Seung 1999]
▸ NMF decomposition
[Figure: NMF dictionary with K = 25; experiment reproduced from (Lee and Seung, 1999)]
NMF EXAMPLE
▸ From [Smaragdis 2013]
▸ Spectral unmixing: all factors are positive-valued (X ≈ W·H), so the resulting reconstruction is additive
[Figure: NMF for audio spectral unmixing (Smaragdis and Brown, 2003); spectrogram of an input music passage (frequency vs. time) decomposed into 4 components; reproduced from (Smaragdis, 2013)]
NMF EXAMPLES
▸ Piano toy example from [Févotte 2009], four notes (MIDI numbers: 61, 65, 68, 72)
▸ Demo available at: https://www.irit.fr/~Cedric.Fevotte/extras/machine_audition/
[Figure: three representations of the data]
NMF EXAMPLES
▸ Piano example from [Févotte 2009]: IS-NMF on the power spectrogram with K = 8
[Figure: dictionary W, coefficients H, and reconstructed components for K = 1, …, 8]
▸ Pitch estimates: 65.0, 68.0, 61.0, 72.0, 0, 0, 0, 0 (true values: 61, 65, 68, 72)
NMF DECOMPOSITIONS
▸ Choose a measure of fit between X and WH:
W, H = argmin_{W,H ≥ 0} D(X | WH)
‣ D(X | WH) = Σ_{n,k} d(X[n,k] | (WH)[n,k]), where d is a scalar cost function
‣ Examples of cost functions:
‣ Euclidean: D(X | WH) = ∥X − WH∥² [Paatero & Tapper, 1994] [Lee & Seung, 2001]
‣ Kullback–Leibler divergence: d(x | y) = x log(x/y) − x + y [Lee & Seung, 1999] [Finesso & Spreij, 2006]
‣ Itakura–Saito divergence: d(x | y) = x/y − log(x/y) − 1 [Févotte, Bertin & Durrieu, 2009]
‣ Optimization by multiplicative updates (preserves non-negativity)
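For the Euclidean cost, the multiplicative updates of Lee & Seung keep W and H non-negative by construction, since each update multiplies the current value by a ratio of non-negative terms. A minimal sketch on synthetic, exactly factorizable data (sizes and values are illustrative):

```python
import numpy as np

def nmf_euclidean(X, K, n_iter=500, eps=1e-9, seed=0):
    """Multiplicative updates for min_{W,H >= 0} ||X - WH||_F^2 (Lee & Seung, 2001)."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    W = rng.random((M, K)) + eps
    H = rng.random((K, N)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # ratio of non-negative terms: H stays >= 0
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # likewise for W
    return W, H

# exactly factorizable non-negative data of rank 4
rng = np.random.default_rng(1)
X = rng.random((20, 4)) @ rng.random((4, 30))
W, H = nmf_euclidean(X, K=4)
print(f"relative error: {np.linalg.norm(X - W @ H) / np.linalg.norm(X):.4f}")
```

The KL and Itakura–Saito costs admit analogous multiplicative rules, with the ratio replaced by the appropriate gradient-split terms.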
CONCLUSION
▸ The dictionary can be learned from the data
▸ Several models can be chosen (sparse coding, convolutional sparse coding, NMF)
▸ One can "unroll" an algorithm to obtain a neural network