Statistical Performance of Convex Tensor Decomposition

Ryota Tomioka, 2012/01/26 @ Kyoto University, Perspectives in Informatics 4B. Collaborators: Taiji Suzuki, Kohei Hayashi, Hisashi Kashima. Slides available: http://www.ibis.t.u-tokyo.ac.jp/ryotat/tensor12kyoto.pdf

Page 1:

Statistical Performance of Convex Tensor Decomposition

Ryota Tomioka 

2012/01/26 @ Kyoto University 

Perspectives in Informatics 4B 

Collaborators: Taiji Suzuki, Kohei Hayashi, Hisashi Kashima

Slides available: http://www.ibis.t.u-tokyo.ac.jp/ryotat/tensor12kyoto.pdf

Page 2:

Netflix challenge (2006-2009)

•  $1,000,000 prize

•  Goal: Improve the performance of a video recommendation system (predict who likes which movies)

•  Example: Likes “Star Wars” and “E.T.”, doesn’t like “Minority Report”. Does he like “Blade Runner”?

Page 3:

Matrix completion view

+1 +1 ‐1 ? ?

+1 ? ? +1 ?

? +1 ‐1 ? +1

+1 ? ? ? +1

. . . . .

. . . . .

User A User B User C User D

Goal: fill the missing entries!

Page 4:

Matrix completion

•  Impossible without an assumption. (Missing entries can be arbitrary): the problem is ill-posed

•  Most common assumption: Low-rank decomposition

$Y \approx U V^\top$, where $Y$ is the Users × Movies matrix, $U$ holds the users’ features, and $V$ holds the movies’ features.

Page 5:

Matrix completion

•  Most common assumption: Low-rank decomposition (rank r)

$Y \approx U V^\top$ (Users × Movies ≈ users’ features × movies’ features)

$y_{ij} = u_i^\top v_j$ (dot product in r-dim space), where $u_i$ is the feature vector of user $i$ and $v_j$ is the feature vector of movie $j$.

Page 6:

Geometric Intuition

r-dimensional space (r: the rank of the decomposition)

[Figure: user vector $u_a$ (from $U$) and movie vectors (from $V^\top$) in the r-dimensional space; labels: “Movies he likes”, “Movies he doesn’t like”]

Page 7:

Geometric Intuition

r-dimensional space (r: the rank of the decomposition)

[Figure: user vectors $u_a$ and $u_b$ (from $U$) and movie vectors (from $V^\top$) in the r-dimensional space]

Page 8:

Tensor data completion

•  Tensor = Multi-dimensional array

•  Beyond 2D

[Figure: Users × Movies array extended by a third axis] Movie preference + time / context / action

Page 9:

Tensor data completion

•  Tensor = Multi-dimensional array

•  Beyond 2D

[Figure: tensor with axes labeled Sensors and Location] Climate monitoring: temperature, humidity, rainfall

Page 10:

Tensor data completion

•  Tensor = Multi-dimensional array

•  Beyond 2D

[Figure: tensor with axes labeled Sensors and Time] Neuroscience (brain imaging)

Page 11:

Rest of this talk

•  Computing low-rank matrix decomposition

•  Generalizing from matrix to tensor

•  Analyzing the performance

– Statistical learning theory

Page 12:

Computing low-rank matrix decomposition

Page 13:

Computing low-rank decomposition

•  If all entries are observed (no missing entries)

– Given Y, compute singular value decomposition (SVD)

$Y \approx U \Sigma V^\top$, with $Y$: $m \times n$, $U$: $m \times r$, $\Sigma$: $r \times r$, $V^\top$: $r \times n$

where U, V: Orthogonal ($U^\top U = I$, $V^\top V = I$)

$\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_r)$, $\sigma_j$: jth largest singular value

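The fully observed case above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the helper name `truncated_svd`, the matrix sizes, and the random test matrix are assumptions made for the example, not part of the talk.

```python
import numpy as np

def truncated_svd(Y, r):
    """Return the leading rank-r factors (U, s, Vt) of Y."""
    # np.linalg.svd returns singular values sorted in descending order.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U[:, :r], s[:r], Vt[:r, :]

rng = np.random.default_rng(0)
m, n, r = 6, 5, 2
# Build an exactly rank-r matrix (hypothetical data for illustration).
Y = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

U, s, Vt = truncated_svd(Y, r)
Y_hat = U @ np.diag(s) @ Vt  # U * Sigma * V^T
print("reconstruction matches Y:", np.allclose(Y, Y_hat))
```

Because `Y` is built to have rank exactly `r`, the rank-r factors reproduce it up to floating-point error; for a general matrix they give the best rank-r approximation in the Frobenius norm.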
Page 14:

Tolerating missings

Optimization problem

$\mathrm{minimize}_{U, V} \sum_{(i,j) \in \Omega} (y_{ij} - u_i^\top v_j)^2$

$\Omega$: Set of observed index pairs

$U$ ($m \times r$): Users’ features

$V$ ($n \times r$): Movies’ features

Page 15:

Tolerating missings

Optimization problem

$\mathrm{minimize}_{U, V} \sum_{(i,j) \in \Omega} (y_{ij} - u_i^\top v_j)^2$

Non-convex!

[Figure: objective plotted over (u, v), with the points (1,1) and (-1,-1) marked]

$U$ ($m \times r$): Users’ features

$V$ ($n \times r$): Movies’ features
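A tiny numerical check makes the non-convexity concrete. The sketch below assumes a single observed entry $y = 1$ with scalar features, which is one reading of the plot with the marked points (1,1) and (-1,-1); the function name `masked_loss` and the data are hypothetical.

```python
import numpy as np

def masked_loss(U, V, Y, mask):
    """Sum of squared residuals over the observed entries only."""
    residual = (Y - U @ V.T) * mask  # unobserved entries contribute zero
    return float(np.sum(residual ** 2))

# Scalar case (m = n = r = 1) with one observed entry y = 1.
Y = np.array([[1.0]])
mask = np.array([[True]])

p = np.array([[1.0]])    # (u, v) = (1, 1)
q = np.array([[-1.0]])   # (u, v) = (-1, -1)
mid = 0.5 * (p + q)      # midpoint (u, v) = (0, 0)

loss_p = masked_loss(p, p, Y, mask)
loss_q = masked_loss(q, q, Y, mask)
loss_mid = masked_loss(mid, mid, Y, mask)
print(loss_p, loss_q, loss_mid)
```

Both end points attain zero loss while their midpoint does not, so the objective violates the defining inequality of a convex function.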

Page 16:

Tolerating missings

Optimization problem

$\mathrm{minimize}_{W} \sum_{(i,j) \in \Omega} (y_{ij} - w_{ij})^2$, subject to $\mathrm{rank}(W) \leq r$

Still non-convex!

Rank constraint is NP hard

Page 17:

Convex relaxation of rank

Schatten p-norm (to the pth power)

$\|W\|_{S_p}^p := \sum_{j=1}^{r} \sigma_j^p(W)$

$\sigma_j(W)$: jth largest singular value

$\|W\|_{S_p}^p \to \mathrm{rank}(W)$ as $p \to 0$

[Figure: plots of $|x|^{0.01}$, $|x|^{0.5}$, $|x|$, and $x^2$]

p=1 is the tightest convex relaxation (also known as trace norm / nuclear norm)
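The definition above can be checked numerically. The following is a minimal sketch; the function name `schatten_p_pow` and the tolerance used to discard numerically zero singular values are assumptions of this example.

```python
import numpy as np

def schatten_p_pow(W, p, tol=1e-10):
    """Schatten p-norm of W raised to the pth power: sum_j sigma_j(W)^p."""
    s = np.linalg.svd(W, compute_uv=False)
    # Drop numerically zero singular values; otherwise round-off values
    # such as 1e-16 would each contribute almost 1 when p is close to 0.
    s = s[s > tol * s.max()]
    return float(np.sum(s ** p))

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 6))  # rank 2

nuclear = schatten_p_pow(W, 1.0)     # p = 1: trace / nuclear norm
near_rank = schatten_p_pow(W, 1e-3)  # p close to 0: approaches rank(W)
print(nuclear, near_rank)
```

For p = 1 the value agrees with `np.linalg.norm(W, 'nuc')`, and for small p it approaches the rank of the matrix, here 2.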

Page 18:

Tolerating missings

Optimization problem: Convex relaxation

$\mathrm{minimize}_{W} \sum_{(i,j) \in \Omega} (y_{ij} - w_{ij})^2$, subject to $\|W\|_{S_1} \leq \tau$

Schatten 1-norm (nuclear norm, trace norm): $\|W\|_{S_1} = \sum_{j=1}^{r} \sigma_j(W)$ = linear sum of singular values

$\sigma_j(W)$: jth largest singular value

Cf. Lasso (L1 norm) for variable selection = linear sum of abs. coefficients

Page 19:

Take home messages

•  Rank constrained minimization is hard to solve (non-convex and NP hard)

•  Can be relaxed into a tractable convex problem using Schatten 1-norm.

Page 20:

How about tensors? 

 ‐ How to define tensor rank? 

 ‐ How related to matrix rank?

Page 21:

Rank of a tensor

Definition. Let $\mathcal{X} \in \mathbb{R}^{n_1 \times \cdots \times n_K}$ (Kth order tensor). The rank of $\mathcal{X}$ is the smallest number R such that the given tensor $\mathcal{X}$ is written as

$\mathcal{X} = \sum_{r=1}^{R} \mathcal{A}_r$

where each $\mathcal{A}_r$ is rank one (can be written as an outer product of K vectors).

•  Called CP (CANDECOMP/PARAFAC) decomposition

•  Bad news: NP hard to compute the rank R even for a fully observed $\mathcal{X}$.

Page 22:

Bad news 2: Tensor rank is not closed

$\mathcal{X} = a_1 \circ b_1 \circ c_2 + a_1 \circ b_2 \circ c_1 + a_2 \circ b_1 \circ c_1$ ($\mathcal{X}$ is rank 3)

$\mathcal{Y} = \alpha (a_1 + \frac{1}{\alpha} a_2) \circ (b_1 + \frac{1}{\alpha} b_2) \circ (c_1 + \frac{1}{\alpha} c_2) - \alpha\, a_1 \circ b_1 \circ c_1$ ($\mathcal{Y}$ is rank 2)

$\|\mathcal{X} - \mathcal{Y}\|_F \to 0$ ($\alpha \to \infty$)

Kolda & Bader 2009

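The convergence claimed on this slide can be observed numerically. The sketch below uses random vectors of length 4 as a hypothetical instance; the helper names `outer3` and `Y_of` are introduced only for this example, and the code checks the shrinking distance rather than the ranks themselves.

```python
import numpy as np

def outer3(a, b, c):
    """Outer product of three vectors, giving a 3rd-order tensor."""
    return np.einsum('i,j,k->ijk', a, b, c)

rng = np.random.default_rng(0)
a1, a2, b1, b2, c1, c2 = (rng.standard_normal(4) for _ in range(6))

# Sum of three rank-one terms, as on the slide.
X = outer3(a1, b1, c2) + outer3(a1, b2, c1) + outer3(a2, b1, c1)

def Y_of(alpha):
    """Difference of two rank-one tensors (rank at most 2)."""
    return (alpha * outer3(a1 + a2 / alpha, b1 + b2 / alpha, c1 + c2 / alpha)
            - alpha * outer3(a1, b1, c1))

alphas = (1e1, 1e2, 1e3)
errors = [float(np.linalg.norm(X - Y_of(alpha))) for alpha in alphas]
for alpha, err in zip(alphas, errors):
    print(alpha, err)
```

Expanding the product shows that the leftover terms carry factors of 1/alpha and 1/alpha^2, so the Frobenius distance shrinks roughly in proportion to 1/alpha.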
Page 23:

Tucker decomposition [Tucker 66]

$\mathcal{X}_{ijk} = \sum_{a=1}^{r_1} \sum_{b=1}^{r_2} \sum_{c=1}^{r_3} C_{abc} U^{(1)}_{ia} U^{(2)}_{jb} U^{(3)}_{kc}$

[Figure: $n_1 \times n_2 \times n_3$ tensor = Core ($r_1 \times r_2 \times r_3$) $\times_1 \times_2 \times_3$ Factors ($n_1 \times r_1$, $n_2 \times r_2$, $n_3 \times r_3$)]

•  Also known as higher-order SVD [De Lathauwer+00]

•  Rank $(r_1, r_2, r_3)$ can be computed in polynomial time using unfolding operations.

Page 24:

Mode-k unfoldings (matricization)

Page 25:

Computing Tucker rank

•  For each k=1,…,K

– Compute the mode-k unfolding $X_{(k)}$

– Compute the (matrix) rank of $X_{(k)}$: $r_k = \mathrm{rank}(X_{(k)})$

Tensor $\mathcal{X}$ is low-rank in the kth mode ⇔ (Unfolding / Folding) Matrix $X_{(k)}$ is low-rank (as a matrix)
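The two steps above translate directly into NumPy. This is a sketch under assumptions: the names `unfold` and `tucker_rank` are chosen for this example, and the column ordering of the unfolding follows NumPy's default reshape, which may differ from the ordering used in the literature but does not affect the rank.

```python
import numpy as np

def unfold(X, k):
    """Mode-k unfolding: an n_k x (product of the other dims) matrix."""
    return np.moveaxis(X, k, 0).reshape(X.shape[k], -1)

def tucker_rank(X):
    """Tuple (r_1, ..., r_K) with r_k = rank of the mode-k unfolding."""
    return tuple(int(np.linalg.matrix_rank(unfold(X, k)))
                 for k in range(X.ndim))

# Hypothetical test tensor built from a random core and random factors.
rng = np.random.default_rng(0)
C = rng.standard_normal((2, 3, 2))   # core, r = (2, 3, 2)
U1 = rng.standard_normal((5, 2))     # n1 x r1
U2 = rng.standard_normal((6, 3))     # n2 x r2
U3 = rng.standard_normal((4, 2))     # n3 x r3
X = np.einsum('abc,ia,jb,kc->ijk', C, U1, U2, U3)

ranks = tucker_rank(X)
print(ranks)
```

Each rank is an ordinary matrix rank, which is why the Tucker rank can be computed in polynomial time.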

Page 26:

Computing Tucker rank

•  For each k=1,…,K

–  Compute the mode-k unfolding $X_{(k)}$

–  Compute the (matrix) rank of $X_{(k)}$: $r_k = \mathrm{rank}(X_{(k)})$

•  Difference between Tensor rank and Tucker rank

–  Tensor rank is a single number R (may not be easy to compute)

–  Tucker rank is defined for each mode (easy to compute)

•  CP decomp is a special case of Tucker decomp with $R = r_1 = r_2 = \cdots = r_K$ and diagonal core $C$ (of size $R \times R \times R$ in the pictured 3rd-order case)

Page 27:

Basic idea

•  We know how to do matrix completion with Schatten 1-norm (tractable convex optimization)

•  We know how to compute Tucker rank (=the rank of the mode-k unfolding)

Schatten 1-norm + Unfolding = Convex, tractable tensor completion

Page 28:

Overlapped Schatten 1-norm for Tensors

$|||\mathcal{W}|||_{S_1} := \frac{1}{K} \sum_{k=1}^{K} \|W_{(k)}\|_{S_1}$

$\|W_{(k)}\|_{S_1}$: Schatten 1-norm of the mode-k unfolding

Measures the overall low-rank-ness (not just a single mode)
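The overlapped norm is straightforward to evaluate once the unfolding is available. The following is an illustrative sketch; the function names are assumptions of this example, and `np.linalg.norm(..., 'nuc')` is used for the Schatten 1-norm of each unfolding.

```python
import numpy as np

def unfold(W, k):
    """Mode-k unfolding of a tensor as an n_k x (rest) matrix."""
    return np.moveaxis(W, k, 0).reshape(W.shape[k], -1)

def overlapped_schatten1(W):
    """Average over modes of the nuclear norm of the mode-k unfolding."""
    K = W.ndim
    return float(sum(np.linalg.norm(unfold(W, k), 'nuc')
                     for k in range(K)) / K)

rng = np.random.default_rng(0)
a, b, c = rng.standard_normal(4), rng.standard_normal(3), rng.standard_normal(2)
W1 = np.einsum('i,j,k->ijk', a, b, c)  # rank-one tensor
value = overlapped_schatten1(W1)
expected = np.linalg.norm(a) * np.linalg.norm(b) * np.linalg.norm(c)
print(value, expected)
```

For a rank-one tensor every unfolding is a rank-one matrix with the same single singular value, so the overlapped norm equals the product of the vector norms. For a matrix (K = 2) the two unfoldings are the matrix and its transpose, so the definition reduces to the ordinary nuclear norm.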

Page 29:

Convex Tensor Estimation

Matrix: Estimation of low-rank matrix (hard) → Convex relaxation → Schatten 1-norm minimization (tractable) [Fazel, Hindi, Boyd 01]

Tensor: Estimation of low-rank tensor (hard) → Convex relaxation → Overlapped Schatten 1-norm minimization [Liu+09, Signoretto+10, Tomioka+10, Gandy+11]

Generalize: Matrix → Tensor

Page 30:

Empirical performance

[Figure: error $\|\mathcal{X} - \mathcal{W}^*\|_F$ (log scale) vs. fraction of observed elements $M/(n_1 n_2 n_3)$; size=50x50x20, rank=7x8x9; curves: Convex, a nonconvex baseline, optimization tolerance; (No noise)]

Tensor completion result [Tomioka et al. 2010]

Phase transition!! Can we predict this theoretically?

Page 31:

Analyzing the performance of convex tensor decomposition

Page 32:

Problem setting

Observation model: $y_i = \langle \mathcal{X}_i, \mathcal{W}^* \rangle + \epsilon_i$ $(i = 1, \ldots, M)$

$\mathcal{W}^*$: true tensor of rank $(r_1, \ldots, r_K)$; $\epsilon_i$: Gaussian noise

Example (tensor completion): $\mathcal{X}_1$ = [Figure: $n_1 \times n_2 \times n_3$ tensor]

Page 33:

Problem setting

Observation model: $y_i = \langle \mathcal{X}_i, \mathcal{W}^* \rangle + \epsilon_i$ $(i = 1, \ldots, M)$

$\mathcal{W}^*$: true tensor of rank $(r_1, \ldots, r_K)$; $\epsilon_i$: Gaussian noise

Example (tensor completion): $\mathcal{X}_2$ = [Figure: $n_1 \times n_2 \times n_3$ tensor]

Page 34:

Problem setting

Observation model: $y_i = \langle \mathcal{X}_i, \mathcal{W}^* \rangle + \epsilon_i$ $(i = 1, \ldots, M)$

$\mathcal{W}^*$: true tensor of rank $(r_1, \ldots, r_K)$; $\epsilon_i$: Gaussian noise

Example (tensor completion): $\mathcal{X}_3$ = [Figure: $n_1 \times n_2 \times n_3$ tensor]

Page 35:

Problem setting

Observation model: $y_i = \langle \mathcal{X}_i, \mathcal{W}^* \rangle + \epsilon_i$ $(i = 1, \ldots, M)$

$\mathcal{W}^*$: true tensor of rank $(r_1, \ldots, r_K)$; $\epsilon_i$: Gaussian noise

Example (tensor completion): $\mathcal{X}_4$ = [Figure: $n_1 \times n_2 \times n_3$ tensor]

and so on…

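Page 36:

Problem setting

Observation model: $y_i = \langle \mathcal{X}_i, \mathcal{W}^* \rangle + \epsilon_i$ $(i = 1, \ldots, M)$

$\mathcal{W}^*$: true tensor of rank $(r_1, \ldots, r_K)$; $\epsilon_i$: Gaussian noise $N(0, \sigma^2)$

Optimization

$\hat{\mathcal{W}} = \mathrm{argmin}_{\mathcal{W} \in \mathbb{R}^{n_1 \times \cdots \times n_K}} \left( \frac{1}{2M} \|y - \mathfrak{X}(\mathcal{W})\|_2^2 + \lambda_M |||\mathcal{W}|||_{S_1} \right)$

Empirical error: $\frac{1}{2M} \|y - \mathfrak{X}(\mathcal{W})\|_2^2$; Regularization: $|||\mathcal{W}|||_{S_1}$; Reg. const.: $\lambda_M$

Observation model: $\mathfrak{X}(\mathcal{W}) = (\langle \mathcal{X}_1, \mathcal{W} \rangle, \ldots, \langle \mathcal{X}_M, \mathcal{W} \rangle)^\top$, $\mathfrak{X}: \mathbb{R}^N \to \mathbb{R}^M$ $(N = \prod_{k=1}^{K} n_k)$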
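To make the pieces of the objective concrete, the sketch below evaluates the empirical error plus the regularization term for a small synthetic problem. Everything specific here is an assumption for illustration: the function names, the tensor size, the value of `lam`, and the Gaussian measurement tensors. It evaluates the objective only and does not implement a solver.

```python
import numpy as np

def unfold(W, k):
    """Mode-k unfolding of a tensor as an n_k x (rest) matrix."""
    return np.moveaxis(W, k, 0).reshape(W.shape[k], -1)

def overlapped_schatten1(W):
    """Average over modes of the nuclear norm of the mode-k unfolding."""
    return float(sum(np.linalg.norm(unfold(W, k), 'nuc')
                     for k in range(W.ndim)) / W.ndim)

def objective(W, Xs, y, lam):
    """(1 / 2M) * ||y - X(W)||_2^2 + lam * overlapped Schatten 1-norm."""
    M = y.shape[0]
    pred = np.tensordot(Xs, W, axes=W.ndim)  # entries <X_i, W>, shape (M,)
    return float(0.5 / M * np.sum((y - pred) ** 2)
                 + lam * overlapped_schatten1(W))

rng = np.random.default_rng(0)
shape, M, sigma, lam = (4, 3, 2), 30, 0.1, 0.1
W_true = np.einsum('i,j,k->ijk', *(rng.standard_normal(n) for n in shape))
Xs = rng.standard_normal((M,) + shape)  # M measurement tensors
y = np.tensordot(Xs, W_true, axes=3) + sigma * rng.standard_normal(M)

obj_true = objective(W_true, Xs, y, lam)
obj_zero = objective(np.zeros(shape), Xs, y, lam)
print(obj_true, obj_zero)
```

With noise-free observations the empirical error vanishes at the true tensor, leaving only the regularization term.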

Page 37:

Analysis objective

•  We would like to show something like

$\frac{|||\hat{\mathcal{W}} - \mathcal{W}^*|||_F^2}{N} \leq O_p\left( \frac{c(n, r)}{M} \right)$

Left-hand side: Mean squared error ($\hat{\mathcal{W}}$: Estimated tensor, $\mathcal{W}^*$: True low-rank tensor)

$n = (n_1, \ldots, n_K)$: The size

$r = (r_1, \ldots, r_K)$: The rank

$M$: Number of samples

Page 38:

Theorem: random Gauss design

Assume elements of $\mathcal{X}_i$ are drawn iid from standard normal distribution. Moreover

$\frac{\#\mathrm{samples}\ (M)}{\#\mathrm{variables}\ (N)} \geq c_1 \|n^{-1}\|_{1/2} \|r\|_{1/2}$

$\|n^{-1}\|_{1/2} := \left( \frac{1}{K} \sum_{k=1}^{K} \sqrt{1/n_k} \right)^2$, $\|r\|_{1/2} := \left( \frac{1}{K} \sum_{k=1}^{K} \sqrt{r_k} \right)^2$

Normalized rank: $\|n^{-1}\|_{1/2} \|r\|_{1/2}$ (equal to $r/n$ when $n_k = n$ and $r_k = r$ for all $k$)

Page 39:

Theorem: random Gauss design

Assume elements of $\mathcal{X}_i$ are drawn iid from standard normal distribution. Moreover

$\frac{\#\mathrm{samples}\ (M)}{\#\mathrm{variables}\ (N)} \geq c_1 \|n^{-1}\|_{1/2} \|r\|_{1/2}$

$\|n^{-1}\|_{1/2} := \left( \frac{1}{K} \sum_{k=1}^{K} \sqrt{1/n_k} \right)^2$, $\|r\|_{1/2} := \left( \frac{1}{K} \sum_{k=1}^{K} \sqrt{r_k} \right)^2$

Normalized rank: $\|n^{-1}\|_{1/2} \|r\|_{1/2}$ (equal to $r/n$ when $n_k = n$ and $r_k = r$ for all $k$)

$\frac{|||\hat{\mathcal{W}} - \mathcal{W}^*|||_F^2}{N} \leq O_p\left( \frac{\sigma^2 \|n^{-1}\|_{1/2} \|r\|_{1/2}}{M} \right)$

Convergence!
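The normalized rank in the sample-size condition is easy to compute from the two definitions. The sketch below is illustrative; the function names `half_norm` and `normalized_rank` are assumptions of this example, and the sizes and ranks are the ones quoted on the results slides (50x50x20 with rank 7x8x9 or 40x9x7).

```python
import numpy as np

def half_norm(v):
    """((1/K) * sum_k sqrt(v_k))^2, the quantity written ||v||_{1/2}."""
    v = np.asarray(v, dtype=float)
    return float(np.mean(np.sqrt(v)) ** 2)

def normalized_rank(n, r):
    """||n^{-1}||_{1/2} * ||r||_{1/2} for sizes n and Tucker ranks r."""
    inv_n = 1.0 / np.asarray(n, dtype=float)
    return half_norm(inv_n) * half_norm(r)

nr_a = normalized_rank((50, 50, 20), (7, 8, 9))
nr_b = normalized_rank((50, 50, 20), (40, 9, 7))
nr_cube = normalized_rank((10, 10, 10), (3, 3, 3))  # reduces to r / n
print(nr_a, nr_b, nr_cube)
```

When all modes have the same size n and the same rank r, the value reduces to r / n, which is why the quantity can be read as a normalized rank.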

Page 40:

Proof idea

What we want to derive: $\frac{|||\hat{\mathcal{W}} - \mathcal{W}^*|||_F^2}{N} \leq O_p\left( \frac{c(n, r)}{M} \right)$

($\hat{\mathcal{W}}$: Estimated tensor, $\mathcal{W}^*$: True low-rank tensor)

Since $\hat{\mathcal{W}}$ minimizes the objective, $\mathrm{Obj}(\hat{\mathcal{W}}) \leq \mathrm{Obj}(\mathcal{W}^*)$

It is not so hard to see:

$\frac{1}{2M} \|\mathfrak{X}(\hat{\mathcal{W}} - \mathcal{W}^*)\|_2^2 \leq \langle \mathfrak{X}^*(\epsilon)/M, \hat{\mathcal{W}} - \mathcal{W}^* \rangle + \lambda_M |||\hat{\mathcal{W}} - \mathcal{W}^*|||_{S_1}$, where $\mathfrak{X}^*(\epsilon) = \sum_{i=1}^{M} \epsilon_i \mathcal{X}_i$

Page 41:

Proof outline (1/3)

$\frac{1}{2M} \|\mathfrak{X}(\hat{\mathcal{W}} - \mathcal{W}^*)\|_2^2 \leq \langle \mathfrak{X}^*(\epsilon)/M, \hat{\mathcal{W}} - \mathcal{W}^* \rangle + \lambda_M |||\hat{\mathcal{W}} - \mathcal{W}^*|||_{S_1}$

($\hat{\mathcal{W}}$: Estimated tensor, $\mathcal{W}^*$: True low-rank tensor)

Inequality 1: upper-bound the dot product

$\langle \mathfrak{X}^*(\epsilon)/M, \hat{\mathcal{W}} - \mathcal{W}^* \rangle \leq O_p\left( \sqrt{\frac{\sigma^2 N \|n^{-1}\|_{1/2}}{M}}\ |||\hat{\mathcal{W}} - \mathcal{W}^*|||_{S_1} \right)$

(optimization duality / random matrix theory)

Page 42:

Proof outline (1/3)

Inequality 1: upper-bound the dot product

$\langle \mathfrak{X}^*(\epsilon)/M, \hat{\mathcal{W}} - \mathcal{W}^* \rangle \leq O_p\left( \sqrt{\frac{\sigma^2 N \|n^{-1}\|_{1/2}}{M}}\ |||\hat{\mathcal{W}} - \mathcal{W}^*|||_{S_1} \right)$

($\hat{\mathcal{W}}$: Estimated tensor, $\mathcal{W}^*$: True low-rank tensor)

$\frac{1}{2M} \|\mathfrak{X}(\hat{\mathcal{W}} - \mathcal{W}^*)\|_2^2 \leq \left( \sqrt{\frac{\sigma^2 N \|n^{-1}\|_{1/2}}{M}} + \lambda_M \right) |||\hat{\mathcal{W}} - \mathcal{W}^*|||_{S_1}$

Optimal reg. const: $\lambda_M$ of order $O_p\left( \sqrt{\frac{\sigma^2 N \|n^{-1}\|_{1/2}}{M}} \right)$

Trade-off between [...] and [...]

Page 43:

Proof outline (2/3)

$\frac{1}{2M} \|\mathfrak{X}(\hat{\mathcal{W}} - \mathcal{W}^*)\|_2^2 \leq \sqrt{\frac{\sigma^2 N \|n^{-1}\|_{1/2}}{M}}\ |||\hat{\mathcal{W}} - \mathcal{W}^*|||_{S_1}$ (constants omitted)

($\hat{\mathcal{W}}$: Estimated tensor, $\mathcal{W}^*$: True low-rank tensor)

Inequality 2: relate the Schatten 1-norm with the Frobenius norm

$|||\hat{\mathcal{W}} - \mathcal{W}^*|||_{S_1} \leq \sqrt{\|r\|_{1/2}}\ |||\hat{\mathcal{W}} - \mathcal{W}^*|||_F$

(relation between L1- and L2-norm)

Page 44:

Proof outline (2/3)

Inequality 2: relate the Schatten 1-norm with the Frobenius norm

$|||\hat{\mathcal{W}} - \mathcal{W}^*|||_{S_1} \leq \sqrt{\|r\|_{1/2}}\ |||\hat{\mathcal{W}} - \mathcal{W}^*|||_F$

(relation between L1- and L2-norm)

($\hat{\mathcal{W}}$: Estimated tensor, $\mathcal{W}^*$: True low-rank tensor)

$\frac{1}{2M} \|\mathfrak{X}(\hat{\mathcal{W}} - \mathcal{W}^*)\|_2^2 \leq \sqrt{\frac{\sigma^2 N \|n^{-1}\|_{1/2} \|r\|_{1/2}}{M}}\ |||\hat{\mathcal{W}} - \mathcal{W}^*|||_F$

Page 45:

Proof outline (3/3)

$\frac{1}{2M} \|\mathfrak{X}(\hat{\mathcal{W}} - \mathcal{W}^*)\|_2^2 \leq \sqrt{\frac{\sigma^2 N \|n^{-1}\|_{1/2} \|r\|_{1/2}}{M}}\ |||\hat{\mathcal{W}} - \mathcal{W}^*|||_F$

($\hat{\mathcal{W}}$: Estimated tensor, $\mathcal{W}^*$: True low-rank tensor)

Inequality 3: lower-bound the left hand-side

$\kappa\, |||\hat{\mathcal{W}} - \mathcal{W}^*|||_F^2 \leq \frac{1}{M} \|\mathfrak{X}(\hat{\mathcal{W}} - \mathcal{W}^*)\|_2^2$ (with $\kappa$ a positive constant)

If $\frac{\#\mathrm{samples}\ (M)}{\#\mathrm{variables}\ (N)} \geq c_1 \|n^{-1}\|_{1/2} \|r\|_{1/2}$

(Gordon-Slepian Theorem in Gaussian process theory)

Page 46:

Proof outline (3/3)

Inequality 3: lower-bound the left hand-side

$\kappa\, |||\hat{\mathcal{W}} - \mathcal{W}^*|||_F^2 \leq \frac{1}{M} \|\mathfrak{X}(\hat{\mathcal{W}} - \mathcal{W}^*)\|_2^2$ (with $\kappa$ a positive constant)

If $\frac{\#\mathrm{samples}\ (M)}{\#\mathrm{variables}\ (N)} \geq c_1 \|n^{-1}\|_{1/2} \|r\|_{1/2}$

(Gordon-Slepian Theorem in Gaussian process theory)

($\hat{\mathcal{W}}$: Estimated tensor, $\mathcal{W}^*$: True low-rank tensor)

$\kappa\, |||\hat{\mathcal{W}} - \mathcal{W}^*|||_F^2 \leq \sqrt{\frac{\sigma^2 N \|n^{-1}\|_{1/2} \|r\|_{1/2}}{M}}\ |||\hat{\mathcal{W}} - \mathcal{W}^*|||_F$
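The last step from the three inequalities to the theorem can be spelled out as follows. This is a sketch with all constants suppressed (written with $\lesssim$), using $\Delta$ as shorthand for the difference between the estimated and the true tensor; the shorthand is introduced here for brevity and is not the slides' notation.

```latex
\begin{align*}
|||\Delta|||_F^2
  &\lesssim \sqrt{\frac{\sigma^2 N \|n^{-1}\|_{1/2}\|r\|_{1/2}}{M}}\;|||\Delta|||_F
  && \text{(inequalities 1--3 combined)} \\
|||\Delta|||_F
  &\lesssim \sqrt{\frac{\sigma^2 N \|n^{-1}\|_{1/2}\|r\|_{1/2}}{M}}
  && \text{(divide by } |||\Delta|||_F \text{)} \\
\frac{|||\Delta|||_F^2}{N}
  &\lesssim \frac{\sigma^2 \|n^{-1}\|_{1/2}\|r\|_{1/2}}{M}
  && \text{(square and divide by } N \text{)}
\end{align*}
```

The final line has the form of the bound in the theorem, with the constants absorbed into the $O_p(\cdot)$ notation.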

Page 47:

Back to the theorem statement

Assume elements of $\mathcal{X}_i$ are drawn iid from standard normal distribution. Moreover

$\frac{\#\mathrm{samples}\ (M)}{\#\mathrm{variables}\ (N)} \geq c_1 \|n^{-1}\|_{1/2} \|r\|_{1/2}$

$\|n^{-1}\|_{1/2} := \left( \frac{1}{K} \sum_{k=1}^{K} \sqrt{1/n_k} \right)^2$, $\|r\|_{1/2} := \left( \frac{1}{K} \sum_{k=1}^{K} \sqrt{r_k} \right)^2$

Normalized rank: $\|n^{-1}\|_{1/2} \|r\|_{1/2}$ (equal to $r/n$ when $n_k = n$ and $r_k = r$ for all $k$)

$\frac{|||\hat{\mathcal{W}} - \mathcal{W}^*|||_F^2}{N} \leq O_p\left( \frac{\sigma^2 \|n^{-1}\|_{1/2} \|r\|_{1/2}}{M} \right)$

Convergence!

Notice:

•  Sample-size condition independent of noise $\sigma^2$.

•  Bound RHS proportional to $\sigma^2$. ⇒ Threshold behavior in the limit $\sigma^2 \to 0$

Page 48:

Tensor completion results

[Figure: estimation error (log scale) vs. fraction of observed elements, #samples (M) / #variables (N); size = 50x50x20, true rank 7x8x9 or 40x9x7; curves: Convex [7 8 9], Convex [40 9 7], optimization tolerance; threshold 0.01 marked; No observation noise]

[Figure: fraction M/N at error<=0.01 vs. normalized rank $\|n^{-1}\|_{1/2} \|r\|_{1/2}$; legend: size=[50 50 20], size=[100 100 50]; points for rank=[7,8,9] and rank=[40,9,7] marked]

Page 49:

Including 4th order tensors

[Figure: fraction at error<=0.01 vs. normalized rank $\|n^{-1}\|_{1/2} \|r\|_{1/2}$; legend: size=[50 50 20], size=[100 100 50], size=[50 50 20 10], size=[100 100 20 10]]

Gap between necessity & sufficiency?

Page 50:

Matrix / tensor completion

[Figure: Matrix completion (K = 2); fraction M/N at error<=0.01 vs. normalized rank $\|n^{-1}\|_{1/2} \|r\|_{1/2}$; legend: size=[50 20], size=[100 40], size=[250 200]]

[Figure: Tensor completion (K ≧ 3); fraction M/N at error<=0.01 vs. normalized rank $\|n^{-1}\|_{1/2} \|r\|_{1/2}$; legend: size=[50 50 20], size=[100 100 50], size=[50 50 20 10], size=[100 100 20 10]]

Tensor completion easier than matrix completion!?

Page 51:

Conclusion

•  Many real world problems can be cast into the form of tensor data analysis.

•  Convex optimization is a useful tool also for the analysis of higher order tensors.

•  Proposed a convex tensor decomposition algorithm with performance guarantee

•  Normalized rank predicts empirical scaling behavior well

Page 52:

Issues

•  Why is matrix completion more difficult than tensor completion?

•  How big is the gap between necessity and sufficiency?

•  Random Gaussian design ≠ tensor completion

⇒ Incoherence (Candes & Recht 09)

⇒ Spikiness (Negahban et al 10)

•  When only some modes are low-rank

–  Schatten 1-norm is too strong ⇒ Mixture approach

–  E.g. Modes 1 and 4 are low rank but the rest are not (combinatorial problem)

Page 53:

Thank you!