Matrix Factorization
Transcript of Matrix Factorization
Recovering latent factors in a matrix

[Figure: an n rows × m columns matrix V, with entries v11 … vij … vnm.]
Recovering latent factors in a matrix

[Figure: V (n × m) ≈ W · H, where W (n × K) has rows (x1, y1), (x2, y2), …, (xn, yn) and H (K × m) has rows (a1 a2 … am) and (b1 b2 … bm).]
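Numerically, this recovery can be sketched with a truncated SVD, one standard way to compute a rank-K factorization (the slide itself does not fix an algorithm; sizes below are illustrative):

```python
import numpy as np

np.random.seed(0)
n, m, K = 20, 30, 3

# Build a matrix V that is exactly rank K, so a rank-K factorization can recover it.
W_true = np.random.randn(n, K)
H_true = np.random.randn(K, m)
V = W_true @ H_true

# Recover latent factors via truncated SVD: V ~ W @ H with W (n x K), H (K x m).
U, s, Vt = np.linalg.svd(V, full_matrices=False)
W = U[:, :K] * s[:K]          # n x K latent factors, one row per row of V
H = Vt[:K, :]                 # K x m latent factors, one column per column of V

print(np.allclose(W @ H, V))  # True: reconstruction is (numerically) exact
```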
What is this for?

[Figure: the same factorization, V (n × m) ≈ W (n × K) · H (K × m).]
MF for collaborative filtering
What is collaborative filtering?
Recovering latent factors in a matrix

[Figure: an n users × m movies matrix V; V[i,j] = user i’s rating of movie j.]
Recovering latent factors in a matrix

[Figure: the n users × m movies ratings matrix V ≈ W · H, with a K-dimensional latent-factor row per user in W and a latent-factor column per movie in H; V[i,j] = user i’s rating of movie j.]
MF for image modeling
MF for images

[Figure: a 1,000 images × 10,000 pixels matrix (1000 × 10,000 = 10,000,000 entries) factored with K = 2; V[i,j] = pixel j in image i, and the 2 prototypes recovered are the principal components PC1 and PC2.]
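A toy version of this picture, with numpy standing in for the real image matrix (the sizes, mixing coefficients, and noise level below are made up; the slide’s example is 1,000 × 10,000):

```python
import numpy as np

np.random.seed(1)
n_images, n_pixels = 100, 64   # toy stand-ins for 1,000 x 10,000

# Two "prototype" images; every image is a mixture of them plus a little noise.
proto = np.random.rand(2, n_pixels)
coeffs = np.random.rand(n_images, 2)
V = coeffs @ proto + 0.01 * np.random.randn(n_images, n_pixels)

# Rank-2 factorization (via SVD): the two right singular vectors play the
# role of PC1 and PC2, and each image gets a 2-D coordinate.
U, s, Vt = np.linalg.svd(V, full_matrices=False)
W = U[:, :2] * s[:2]   # n_images x 2: coordinates of each image
H = Vt[:2, :]          # 2 x n_pixels: the two prototype directions

err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(err < 0.05)      # True: two factors explain almost all of the data
```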
MF for modeling text
• The Neatest Little Guide to Stock Market Investing
• Investing For Dummies, 4th Edition
• The Little Book of Common Sense Investing: The Only Way to Guarantee Your Fair Share of Stock Market Returns
• The Little Book of Value Investing
• Value Investing: From Graham to Buffett and Beyond
• Rich Dad’s Guide to Investing: What the Rich Invest in, That the Poor and the Middle Class Do Not!
• Investing in Real Estate, 5th Edition
• Stock Investing For Dummies
• Rich Dad’s Advisors: The ABC’s of Real Estate Investing: The Secrets of Finding Hidden Profits Most Investors Miss
https://technowiki.wordpress.com/2011/08/27/latent-semantic-analysis-lsa-tutorial/
TFIDF counts would be better
Recovering latent factors in a matrix

[Figure: an n documents × m terms doc-term matrix V ≈ W · H; V[i,j] = TFIDF score of term j in doc i.]
[Figure: the books plotted in the 2-D latent space, labeled with titles such as “Investing for real estate”, “Rich Dad’s Advisor’s: The ABCs of Real Estate Investment”, “The little book of common sense investing: …”, and “Neatest Little Guide to Stock Market Investing”.]
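A miniature LSA run along these lines (the vocabulary, counts, and the particular TF-IDF variant are all illustrative, not from the slides):

```python
import numpy as np

# Toy doc-term counts: docs 0-1 are about stocks, docs 2-3 about real estate.
terms = ["stock", "market", "value", "estate", "property", "rent"]
counts = np.array([
    [3, 2, 1, 0, 0, 0],   # stock-market doc
    [2, 3, 0, 0, 0, 0],   # stock-market doc
    [0, 0, 0, 3, 2, 1],   # real-estate doc
    [0, 0, 1, 2, 3, 2],   # real-estate doc
], dtype=float)

# One common TF-IDF weighting (the slide just says "TFIDF").
tf = counts / counts.sum(axis=1, keepdims=True)
idf = np.log(counts.shape[0] / (counts > 0).sum(axis=0))
V = tf * idf

# LSA = truncated SVD of the TF-IDF matrix: V ~ W @ H with K = 2 latent "topics".
U, s, Vt = np.linalg.svd(V, full_matrices=False)
W = U[:, :2] * s[:2]   # doc coordinates in the 2-D topic space

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Docs about the same subject end up closer together in the latent space.
print(cos(W[0], W[1]) > cos(W[0], W[2]))  # True
```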
MF is like clustering

k-means as MF

[Figure: the original data set X (n examples × m features) ≈ Z · M, where Z (n × r) holds 0/1 indicators assigning each example to one of r clusters and M (r × m) holds the cluster means.]
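The indicator-matrix view can be checked directly: given cluster assignments Z, the least-squares M in X ≈ Z M is exactly the matrix of cluster means (the toy data below is made up):

```python
import numpy as np

np.random.seed(2)

# Toy data: two well-separated clusters in 2-D, ten points each.
X = np.vstack([np.random.randn(10, 2) + [5, 5],
               np.random.randn(10, 2) - [5, 5]])

# Cluster assignments as a 0/1 indicator matrix Z: row i has a single 1
# in the column of example i's cluster.
Z = np.zeros((20, 2))
Z[:10, 0] = 1
Z[10:, 1] = 1

# Given Z, the M minimizing ||X - Z M||_F^2 is the matrix of cluster means.
M = np.linalg.lstsq(Z, X, rcond=None)[0]
means = np.vstack([X[:10].mean(axis=0), X[10:].mean(axis=0)])

print(np.allclose(M, means))   # True: least-squares M = cluster means
```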
How do you do it?

[Figure: the factorization V (n × m) ≈ W (n × K) · H (K × m) again.]
talk pilfered from ….. KDD 2011
Recovering latent factors in a matrix

[Figure: the n users × m movies ratings matrix V ≈ W · H, now with the rank labeled r; V[i,j] = user i’s rating of movie j.]
Matrix factorization as SGD

[Slide shows the per-rating SGD update rules, with callouts marking the step size and asking “why does this work?”]
Matrix factorization as SGD - why does this work? Here’s the key claim:
Checking the claim

Think of SGD for logistic regression:
• LR loss = compare y and ŷ = dot(w,x)
• similar, but now we update both w (the user weights) and x (the movie weights)
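A minimal sketch of that per-rating SGD update, assuming squared loss (the learning rate, regularization, and synthetic data below are illustrative, not from the slides):

```python
import numpy as np

np.random.seed(3)
n_users, n_movies, K = 30, 40, 4

# Synthetic low-rank "ratings" so the factors are recoverable.
W_true = np.random.randn(n_users, K)
H_true = np.random.randn(K, n_movies)
ratings = [(i, j, float(W_true[i] @ H_true[:, j]))
           for i in range(n_users) for j in range(n_movies)]

W = 0.1 * np.random.randn(n_users, K)
H = 0.1 * np.random.randn(K, n_movies)
lr, lam = 0.02, 0.001   # step size and L2 penalty (illustrative values)

def rmse():
    return np.sqrt(np.mean([(v - W[i] @ H[:, j]) ** 2 for i, j, v in ratings]))

before = rmse()
for epoch in range(50):
    np.random.shuffle(ratings)
    for i, j, v in ratings:
        err = v - W[i] @ H[:, j]                       # compare v and v-hat = dot(W[i], H[:,j])
        grad_w = err * H[:, j] - lam * W[i]            # step for the user factors
        H[:, j] += lr * (err * W[i] - lam * H[:, j])   # step for the movie factors
        W[i] += lr * grad_w

print(rmse() < before)   # True: SGD drives the squared error down
```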
What loss functions are possible?

N1, N2: diagonal matrices, sort of like IDF factors for the users/movies

“generalized” KL-divergence
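For reference, the two losses the slide alludes to, written out in standard notation (these are the usual textbook definitions, not reconstructed from the slide images):

```latex
% Weighted squared (Frobenius) loss, with diagonal N1, N2 acting like IDF factors:
L_2(V, WH) = \| N_1 \, (V - WH) \, N_2 \|_F^2

% "Generalized" KL-divergence (reduces to ordinary KL when both sides sum to 1):
D(V \,\|\, WH) = \sum_{i,j} \Big( V_{ij} \log \frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \Big)
```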
ALS = alternating least squares
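A bare-bones ALS sketch: fix one factor, solve a least-squares problem for the other, and alternate (noise-free rank-K data and unregularized solves here; real ALS adds regularization and handles missing entries):

```python
import numpy as np

np.random.seed(4)
n, m, K = 25, 35, 3
V = np.random.randn(n, K) @ np.random.randn(K, m)   # exactly rank K

W = np.random.randn(n, K)
H = np.random.randn(K, m)

for it in range(20):
    # Fix W, solve least squares for H:  min_H ||V - W H||_F^2
    H = np.linalg.lstsq(W, V, rcond=None)[0]
    # Fix H, solve least squares for W:  min_W ||V - W H||_F^2
    W = np.linalg.lstsq(H.T, V.T, rcond=None)[0].T

print(np.allclose(W @ H, V))   # True: alternating solves recover the rank-K matrix
```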
Similar to McDonnell et al. with perceptron learning
Slow convergence…..
More detail….
• Randomly permute rows/cols of matrix
• Chop V, W, H into blocks of size d × d
  – m/d blocks in W, n/d blocks in H
• Group the data:
  – Pick a set of blocks with no overlapping rows or columns (a stratum)
  – Repeat until all blocks in V are covered
• Train the SGD:
  – Process strata in series
  – Process blocks within a stratum in parallel
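One standard way to build such strata is a diagonal scheme: stratum s pairs row-block i with column-block (i + s) mod d, so no two blocks in a stratum share rows or columns (d and the grid layout here are illustrative):

```python
import itertools

d = 4   # number of row-blocks and column-blocks (illustrative)

# Stratum s = the set of (row_block, col_block) pairs along a shifted diagonal.
strata = [[(i, (i + s) % d) for i in range(d)] for s in range(d)]

# Within a stratum, no two blocks share a row-block or a column-block,
# so parallel SGD workers can process them without conflicting updates.
for stratum in strata:
    rows = [r for r, c in stratum]
    cols = [c for r, c in stratum]
    assert len(set(rows)) == d and len(set(cols)) == d

# Across all d strata, every block of the d x d grid is covered exactly once.
covered = set(itertools.chain.from_iterable(strata))
print(len(covered) == d * d)   # True
```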
More detail…. (in the figure, Z denotes the matrix called V above)
More detail….
• Initialize W, H randomly
  – not at zero
• Choose a random ordering (random sort) of the points in a stratum in each “sub-epoch”
• Pick strata sequence by permuting rows and columns of M, and using M’[k,i] as column index of row i in subepoch k
• Use “bold driver” to set step size:
  – increase step size when loss decreases (in an epoch)
  – decrease step size when loss increases
• Implemented in Hadoop and R/Snowfall
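The bold-driver rule itself is tiny; a sketch with conventional adjustment factors (the 5% growth and 50% cut are common choices, not numbers taken from the slides):

```python
# "Bold driver" step-size rule: grow the step when the epoch's loss went
# down, cut it sharply when the loss went up.
def bold_driver(step, prev_loss, loss, grow=1.05, shrink=0.5):
    return step * grow if loss < prev_loss else step * shrink

step = 0.1
losses = [10.0, 8.0, 6.5, 7.2, 5.9]   # made-up per-epoch losses
for prev, cur in zip(losses, losses[1:]):
    step = bold_driver(step, prev, cur)
print(round(step, 6))   # 0.057881 (two growths, one cut, one growth)
```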
[Figure: an example strata-sequence matrix M.]
Wall Clock Time (8 nodes, 64 cores, R/snow)
Number of Epochs
Varying rank (100 epochs for all)
Hadoop scalability: Hadoop process setup time starts to dominate
Hadoop scalability