SPARSE REPRESENTATIONS — DICTIONARY LEARNING

DICTIONARY LEARNING AND SPARSE CODING
SPARSE REPRESENTATION — ATSI
SPARSE CODING
▸ Let x ∈ ℝ^N be a signal
▸ Let Φ ∈ ℝ^(N×M) be a dictionary
▸ Sparse coding: min_α (1/2)∥x − Φα∥₂² + λℛ(α)
‣ where ℛ is a measure of sparsity (∥α∥₀, ∥α∥₁, etc.)
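The ℓ₁ case of this problem (the lasso) can be solved by proximal gradient descent (ISTA: a gradient step on the quadratic term, then soft-thresholding). A minimal numerical sketch with a random dictionary; all sizes, names, and parameter values here are illustrative, not from the slides:

```python
import numpy as np

def ista_sparse_code(x, Phi, lam=0.1, n_iter=200):
    """Solve min_a 0.5*||x - Phi @ a||_2^2 + lam*||a||_1 by ISTA."""
    L = np.linalg.norm(Phi, 2) ** 2                 # Lipschitz constant of the smooth part
    a = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ a - x)                # gradient of the quadratic term
        u = a - grad / L                            # gradient step
        a = np.sign(u) * np.maximum(np.abs(u) - lam / L, 0.0)  # soft-threshold (prox of l1)
    return a

rng = np.random.default_rng(0)
Phi = rng.standard_normal((32, 64))
Phi /= np.linalg.norm(Phi, axis=0)                  # unit-norm atoms
a_true = np.zeros(64)
a_true[[3, 17, 40]] = [1.5, -2.0, 1.0]
x = Phi @ a_true                                    # a 3-sparse synthetic signal
a_hat = ista_sparse_code(x, Phi, lam=0.05)
print("nonzeros:", np.flatnonzero(np.abs(a_hat) > 0.1))
```

With a small λ and a noiseless signal, the recovered code is sparse and concentrates on the atoms actually used to build x.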
WHICH DICTIONARY?
▸ Usual dictionaries:
▸ Time–frequency
▸ Wavelets
▸ *-lets
▸ Union of dictionaries
▸ Can we learn the dictionary?
DICTIONARY LEARNING
▸ Let X = (x₁, x₂, …, x_M) be a set of M signals in ℝ^N
▸ We seek a dictionary Φ ∈ ℝ^(N×K), K > N, such that each signal x_m admits a sparse representation
▸ x_m = Φα_m, with α_m ∈ ℝ^K, and α_m is s-sparse
▸ General formulation:
min_{Φ, α₁, …, α_M} Σ_m ∥x_m − Φα_m∥² s.t. ∥α_m∥₀ ≤ s ∀m
‣ With A = (α₁, …, α_M):
min_{Φ, A} ∥X − ΦA∥² s.t. ∥α_m∥₀ ≤ s ∀m
K-SVD
▸ [Elad and Aharon, 2006]
▸ Idea: alternate minimization between A and Φ
▸ Initialize Φ
▸ Sparse coding, for each x_m: α_m = argmin_α (1/2)∥x_m − Φα∥² s.t. ∥α∥₀ ≤ s
▸ Dictionary update: min_Φ (1/2)∥X − ΦA∥²
‣ Can be done column by column using an SVD
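A compact numerical sketch of this alternation: greedy s-sparse coding (orthogonal matching pursuit), then a column-by-column rank-1 SVD update of each atom and its coefficients. This is a simplified illustration of the K-SVD idea, not the reference implementation of [Elad and Aharon, 2006]; all sizes are arbitrary:

```python
import numpy as np

def omp(x, Phi, s):
    """Greedy s-sparse coding (orthogonal matching pursuit)."""
    r, support, coef = x.copy(), [], np.zeros(0)
    for _ in range(s):
        support.append(int(np.argmax(np.abs(Phi.T @ r))))   # most correlated atom
        coef, *_ = np.linalg.lstsq(Phi[:, support], x, rcond=None)
        r = x - Phi[:, support] @ coef                      # orthogonal residual
    a = np.zeros(Phi.shape[1])
    a[support] = coef
    return a

def ksvd_step(X, Phi, s):
    """One K-SVD iteration: sparse coding, then column-by-column SVD updates."""
    A = np.column_stack([omp(X[:, m], Phi, s) for m in range(X.shape[1])])
    for k in range(Phi.shape[1]):
        users = np.flatnonzero(A[k, :])                     # signals that use atom k
        if users.size == 0:
            continue
        # residual with atom k removed, restricted to the signals that use it
        E = X[:, users] - Phi @ A[:, users] + np.outer(Phi[:, k], A[k, users])
        U, S, Vt = np.linalg.svd(E, full_matrices=False)
        Phi[:, k] = U[:, 0]                                 # best rank-1 fit: new atom
        A[k, users] = S[0] * Vt[0]                          # and its coefficients
    return Phi, A

# synthetic data generated from a ground-truth dictionary
rng = np.random.default_rng(1)
Phi0 = rng.standard_normal((16, 24)); Phi0 /= np.linalg.norm(Phi0, axis=0)
A0 = np.zeros((24, 50))
for m in range(50):
    A0[rng.choice(24, 3, replace=False), m] = rng.standard_normal(3)
X = Phi0 @ A0
Phi = rng.standard_normal((16, 24)); Phi /= np.linalg.norm(Phi, axis=0)
errs = []
for _ in range(5):
    Phi, A = ksvd_step(X, Phi, s=3)
    errs.append(np.linalg.norm(X - Phi @ A))
print("reconstruction error per iteration:", np.round(errs, 3))
```

The SVD update is why each new atom has unit norm automatically: the best rank-1 approximation of E is σ₁u₁v₁ᵀ, so the atom is u₁ and the coefficients absorb σ₁.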
DICTIONARY LEARNING: ALTERNATING MINIMIZATION
▸ Idea: alternate minimization between A and Φ
▸ Initialize Φ
▸ Sparse coding, for each x_m: α_m = argmin_α (1/2)∥x_m − Φα∥² + λ∥α∥₁
▸ Dictionary update: Φ ← Φ + (1/∥A∥²)(X − ΦA)Aᵀ
▸ Can be implemented "online"
▸ Online dictionary learning [Mairal, Bach, Ponce, Sapiro 2009]
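The dictionary update above is a gradient step on ∥X − ΦA∥² with step size 1/∥A∥² (one over the Lipschitz constant of the gradient), so with the codes A held fixed the fit cannot worsen. A minimal numerical check of that step (atom renormalization, used in practice, is omitted here to keep the monotonicity argument exact):

```python
import numpy as np

def dict_gradient_step(X, Phi, A):
    """One step of Phi <- Phi + (1/||A||^2) (X - Phi A) A^T for min ||X - Phi A||^2."""
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-12)   # 1 / squared spectral norm of A
    return Phi + step * (X - Phi @ A) @ A.T

rng = np.random.default_rng(2)
X = rng.standard_normal((10, 30))
A = rng.standard_normal((15, 30))                      # codes held fixed here
Phi = rng.standard_normal((10, 15))
e0 = np.linalg.norm(X - Phi @ A)
for _ in range(20):
    Phi = dict_gradient_step(X, Phi, A)
print(f"error: {e0:.3f} -> {np.linalg.norm(X - Phi @ A):.3f}")
```

In the online setting of [Mairal et al. 2009], steps of this kind are applied as signals stream in, with sufficient statistics accumulated instead of the full X and A.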
EXAMPLES
▸ K-SVD [Elad and Aharon, 2006]
▸ Dictionary learned on a noisy version of the boat image
[Figure: dictionary trained on a noisy version of the image "boat"; reproduced from Julien Mairal's "Sparse Coding and Dictionary Learning" slides]
EXAMPLES
▸ Online dictionary learning [Mairal, Bach, Ponce, Sapiro 2009]
[Figures: inpainting a 12-Mpixel photograph; reproduced from Julien Mairal's "Sparse Coding and Dictionary Learning" slides]
CONVOLUTIONAL SPARSE CODING
CONVOLUTIONAL SPARSE CODING
▸ This model assumes that the signal x ∈ ℝ^N can be decomposed as
x[t] = Σ_k (d_k * z_k)[t]
where
‣ the d_k are "local filters"
‣ the z_k are the corresponding sparse representations
CONVOLUTIONAL SPARSE CODING
▸ For each x_m[t], assume the CSC model: x_m[t] = Σ_k (d_k * z_k^m)[t]
‣ Learning the dictionary {d_k}_k:
min_{d_k, z_k^m} Σ_m (1/2)∥x_m − Σ_k d_k * z_k^m∥² + λ Σ_k ∥z_k^m∥₁
‣ [Bristow, Eriksson & Lucey 2013] [Heide, Heidrich & Wetzstein 2015] [Papyan, Romano, Sulam & Elad 2017]
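For fixed filters, the coding subproblem is again an ℓ₁ problem and can be handled by ISTA, with the gradient computed by correlation (the adjoint of convolution). A small sketch for one signal; filter lengths, λ, and the synthetic data are illustrative, and odd-length filters are used so that "same"-mode convolution has a clean adjoint:

```python
import numpy as np

def csc_ista(x, D, lam=0.1, n_iter=300):
    """ISTA for min_z 0.5*||x - sum_k d_k * z_k||^2 + lam * sum_k ||z_k||_1.

    D: (K, L) filters with L odd; returns Z of shape (K, N), one activation per filter.
    """
    K, L = D.shape
    N = x.size
    Z = np.zeros((K, N))
    # upper bound on the Lipschitz constant via the filters' frequency responses
    Lip = K * (np.abs(np.fft.rfft(D, n=2 * N, axis=1)) ** 2).max() + 1e-12
    for _ in range(n_iter):
        r = sum(np.convolve(Z[k], D[k], mode="same") for k in range(K)) - x
        for k in range(K):
            grad = np.convolve(r, D[k][::-1], mode="same")   # adjoint = correlation
            u = Z[k] - grad / Lip
            Z[k] = np.sign(u) * np.maximum(np.abs(u) - lam / Lip, 0.0)
    return Z

# synthetic signal: a few shifted copies of 3 short filters
rng = np.random.default_rng(3)
D = rng.standard_normal((3, 5))
Z_true = np.zeros((3, 100))
Z_true[0, 10], Z_true[1, 50], Z_true[2, 80] = 1.0, -1.5, 2.0
x = sum(np.convolve(Z_true[k], D[k], mode="same") for k in range(3))
Z = csc_ista(x, D, lam=0.05)
rec = sum(np.convolve(Z[k], D[k], mode="same") for k in range(3))
print(f"relative reconstruction error: {np.linalg.norm(x - rec) / np.linalg.norm(x):.3f}")
```

Learning the filters {d_k} would then alternate this coding step with an update of D, as in the references above; practical implementations work in the Fourier domain for speed.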
MATRIX FACTORIZATION
MATRIX FACTORIZATION
▸ Data are often available in matrix form: features × samples
MATRIX FACTORIZATION
▸ Example: time–frequency representation (frequencies × time)
MATRIX FACTORIZATION
▸ Let X ∈ ℝ^(M×N). We aim to factorize X ≃ WH, with W ∈ ℝ^(M×K) and H ∈ ℝ^(K×N)
[Schematic: Data X ≃ Dictionary W × Activation H]
MATRIX FACTORIZATION
▸ Same factorization X ≃ WH, with W ∈ ℝ^(M×K) and H ∈ ℝ^(K×N), where the activation matrix H is sparse (most entries are zero)
[Schematic: Data X ≃ Dictionary W × sparse Activation H]
MATRIX FACTORIZATION: EXAMPLES
▸ PCA: X = UDUᵀ ∈ ℝ^(N×N), U being the eigenvectors (U is orthogonal) and D the eigenvalues
▸ SVD: X = UDVᵀ ∈ ℝ^(M×N), D being the singular values, U and V orthogonal
▸ ICA
▸ etc.
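Both factorizations are available directly in NumPy; a quick numerical check of the two identities (sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))

# SVD: X = U diag(s) V^T, with orthonormal U, V
U, s, Vt = np.linalg.svd(X, full_matrices=False)
assert np.allclose(X, U @ np.diag(s) @ Vt)
assert np.allclose(U.T @ U, np.eye(4)) and np.allclose(Vt @ Vt.T, np.eye(4))

# PCA: eigendecomposition of a symmetric matrix (e.g. a covariance): C = U diag(lam) U^T
C = X @ X.T / X.shape[1]
lam, Uc = np.linalg.eigh(C)
assert np.allclose(C, Uc @ np.diag(lam) @ Uc.T)
print("both identities hold")
```

Note that the eigendecomposition X = UDUᵀ applies to symmetric matrices (in PCA, the covariance matrix), while the SVD applies to any rectangular X.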
NON-NEGATIVE MATRIX FACTORIZATION
▸ X ∈ ℝ₊^(M×N): a matrix with non-negative entries
▸ X = WH, with W ∈ ℝ₊^(M×K) and H ∈ ℝ₊^(K×N)
▸ W is a matrix of non-negative features (example: spectra)
▸ H is a matrix of non-negative coefficients
▸ The features can only add to each other (no subtraction)
▸ This facilitates interpretability
NMF EXAMPLES
▸ Example from [Lee & Seung 1999]
▸ 49 images (among 2429) from MIT's CBCL face dataset
NMF EXAMPLES
▸ Example from [Lee & Seung 1999]
▸ PCA decomposition
▸ Red values = negative pixels
[Figure: PCA dictionary with K = 25; red pixels indicate negative values]
NMF EXAMPLES
▸ Example from [Lee & Seung 1999]
▸ NMF decomposition
[Figure: NMF dictionary with K = 25; experiment reproduced from (Lee and Seung, 1999)]
NMF EXAMPLE
▸ From [Smaragdis 2013]
▸ Spectral unmixing: all factors are positive-valued (X ≈ W·H), so the resulting reconstruction is additive
[Figure: NMF for audio spectral unmixing (Smaragdis and Brown, 2003); spectrogram of an input music passage (frequency vs. time) decomposed into 4 components; reproduced from (Smaragdis, 2013)]
NMF EXAMPLES
▸ Piano toy example from [Févotte 2009], four notes (MIDI numbers: 61, 65, 68, 72)
▸ Demo available at: https://www.irit.fr/~Cedric.Fevotte/extras/machine_audition/
[Figure: three representations of the data]
NMF EXAMPLES
▸ Piano example from [Févotte 2009]: IS-NMF on the power spectrogram with K = 8
[Figure: dictionary W, coefficients H, and reconstructed components for K = 1, …, 8]
▸ Pitch estimates: 65.0, 68.0, 61.0, 72.0, 0, 0, 0, 0 (true values: 61, 65, 68, 72)
NMF DECOMPOSITIONS
▸ Choose a measure of fit between X and WH:
W, H = argmin_{W,H ≥ 0} D(X | WH)
‣ D(X | WH) = Σ_{n,k} d(X[n,k] | (WH)[n,k]), where d is a scalar cost function
‣ Examples of cost functions:
‣ Euclidean: D(X | WH) = ∥X − WH∥² [Paatero & Tapper, 1994] [Lee & Seung, 2001]
‣ Kullback–Leibler divergence: d(x | y) = x log(x/y) − x + y [Lee & Seung, 1999] [Finesso & Spreij, 2006]
‣ Itakura–Saito divergence: d(x | y) = x/y − log(x/y) − 1 [Févotte, Bertin & Durrieu, 2009]
‣ Optimization by multiplicative updates (preserves non-negativity)
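For the Euclidean cost, the multiplicative updates of Lee & Seung keep W and H non-negative by construction, since each update multiplies the current value by a ratio of non-negative terms. A minimal sketch on synthetic, exactly factorizable data (sizes and values are illustrative):

```python
import numpy as np

def nmf_euclidean(X, K, n_iter=500, eps=1e-9, seed=0):
    """Multiplicative updates for min_{W,H >= 0} ||X - WH||_F^2 (Lee & Seung, 2001)."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    W = rng.random((M, K)) + eps
    H = rng.random((K, N)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # ratio of non-negative terms: H stays >= 0
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # likewise for W
    return W, H

# exactly factorizable non-negative data of rank 4
rng = np.random.default_rng(1)
X = rng.random((20, 4)) @ rng.random((4, 30))
W, H = nmf_euclidean(X, K=4)
print(f"relative error: {np.linalg.norm(X - W @ H) / np.linalg.norm(X):.4f}")
```

The KL and Itakura–Saito costs admit analogous multiplicative rules, with the ratio replaced by the appropriate gradient-split terms.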
CONCLUSION
▸ The dictionary can be learned from the data
▸ Several models can be chosen (sparse coding, convolutional sparse coding, NMF)
▸ One can "unroll" an algorithm to obtain a neural network