Matrix Factorization

Post on 05-Jan-2016


Transcript of Matrix Factorization

1

Matrix Factorization

2

Recovering latent factors in a matrix

[Figure: an n-row × m-column matrix V with entries v11 … vij … vnm]

3

Recovering latent factors in a matrix

[Figure: V ≈ W H, where W is n × K (rows (x1, y1) … (xn, yn)) and H is K × m (rows (a1 … am) and (b1 … bm)); K = 2 here]
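To make the picture concrete, here is a minimal NumPy sketch (the sizes n = 6, m = 5, K = 2 are illustrative, not from the slides) showing that the product of an n × K matrix and a K × m matrix is a rank-K matrix described by far fewer numbers than the full n × m grid:

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, K = 6, 5, 2            # illustrative sizes (not from the slides)
W = rng.normal(size=(n, K))  # the n x K factor
H = rng.normal(size=(K, m))  # the K x m factor
V = W @ H                    # an exactly rank-K n x m matrix

# V has n*m = 30 entries but is described by n*K + K*m = 22 numbers,
# and its rank can never exceed K.
print(np.linalg.matrix_rank(V))  # 2
```

The storage savings of n·K + K·m versus n·m grow quickly as n and m do, which is the whole point of recovering the latent factors.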

4

What is this for?

[Figure: the same V ≈ W H factorization, with W n × K and H K × m]

5

MF for collaborative filtering

What is collaborative filtering?


11

Recovering latent factors in a matrix

[Figure: the ratings matrix V, with n users as rows and m movies as columns; V[i,j] = user i's rating of movie j]

12

Recovering latent factors in a matrix

[Figure: the n users × m movies ratings matrix factored as V ≈ W H, with W n × K and H K × m; V[i,j] = user i's rating of movie j]


14

MF for image modeling

15

MF for images

[Figure: V is a 1000 images × 10,000 pixels matrix (1000 × 10,000 = 10,000,000 entries) factored as V ≈ W H; V[i,j] = pixel j in image i. With 2 prototypes, the rows of H are the principal components PC1 and PC2]
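A small sketch of the idea, shrunk from the slide's 1000 × 10,000 matrix to toy dimensions (50 "images" of 100 "pixels", generated from 2 planted prototypes; the sizes and the SVD-based fitting are my illustration, not the slide's code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the slide's 1000 x 10,000 matrix: 50 "images"
# of 100 "pixels", mixed from 2 planted prototype images.
protos = rng.normal(size=(2, 100))
coeffs = rng.normal(size=(50, 2))
V = coeffs @ protos + 0.01 * rng.normal(size=(50, 100))  # small noise

# Truncated SVD gives the best rank-2 factorization V ~ W H;
# the two rows of H play the role of the prototypes PC1 and PC2.
U, s, Vt = np.linalg.svd(V, full_matrices=False)
W = U[:, :2] * s[:2]   # 50 x 2: each image's coordinates
H = Vt[:2]             # 2 x 100: the two "eigen-image" prototypes

err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {err:.4f}")
```

Because the data really is (noisy) rank 2, two prototypes reconstruct it almost perfectly.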

17

MF for modeling text

• The Neatest Little Guide to Stock Market Investing
• Investing For Dummies, 4th Edition
• The Little Book of Common Sense Investing: The Only Way to Guarantee Your Fair Share of Stock Market Returns
• The Little Book of Value Investing
• Value Investing: From Graham to Buffett and Beyond
• Rich Dad’s Guide to Investing: What the Rich Invest in, That the Poor and the Middle Class Do Not!
• Investing in Real Estate, 5th Edition
• Stock Investing For Dummies
• Rich Dad’s Advisors: The ABC’s of Real Estate Investing: The Secrets of Finding Hidden Profits Most Investors Miss

https://technowiki.wordpress.com/2011/08/27/latent-semantic-analysis-lsa-tutorial/

TFIDF counts would be better

20

Recovering latent factors in a matrix

[Figure: an n documents × m terms doc-term matrix V factored as V ≈ W H; V[i,j] = TFIDF score of term j in doc i. In the latent space, "Investing for Real Estate" and "Rich Dad's Advisors: The ABCs of Real Estate Investment" land near each other, as do "The Little Book of Common Sense Investing" and "The Neatest Little Guide to Stock Market Investing"]
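The LSA setup can be sketched with plain NumPy; the documents below are shortened stand-ins for the book titles above, and raw counts are used rather than the TFIDF weights the slides recommend:

```python
import numpy as np

# Shortened stand-ins for the book titles above (illustrative).
docs = [
    "stock market investing guide",
    "investing for dummies",
    "value investing graham buffett",
    "real estate investing",
    "stock investing for dummies",
    "real estate investment secrets",
]
vocab = sorted({w for d in docs for w in d.split()})
col = {w: j for j, w in enumerate(vocab)}

# Raw term counts; the slides note TFIDF weights would be better.
V = np.zeros((len(docs), len(vocab)))
for i, d in enumerate(docs):
    for w in d.split():
        V[i, col[w]] += 1

# Rank-2 truncated SVD: every document and every term gets a
# 2-d latent-topic coordinate.
U, s, Vt = np.linalg.svd(V, full_matrices=False)
doc_vecs = U[:, :2] * s[:2]   # n docs x 2
term_vecs = Vt[:2].T          # m terms x 2

# Docs that share topic words end up near each other in latent space.
print(doc_vecs.shape, term_vecs.shape)
```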

24

MF is like clustering

k-means as MF

[Figure: the original data matrix X (n examples) factored as X ≈ Z M, where Z is an n × r matrix of 0/1 indicators for the r clusters and M holds the cluster means]
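A sketch of this view, assuming the hard assignments are already known (two planted clusters, r = 2; the data and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Two planted clusters of 2-d points (n = 10 examples, r = 2 clusters).
X = np.vstack([rng.normal(0.0, 0.1, size=(5, 2)),
               rng.normal(5.0, 0.1, size=(5, 2))])

# Z: n x r matrix of 0/1 cluster indicators, one 1 per row.
assign = np.array([0] * 5 + [1] * 5)
Z = np.eye(2)[assign]

# With Z fixed, the best M in min ||X - Z M||^2 holds the cluster means.
M = np.linalg.lstsq(Z, X, rcond=None)[0]

# Z @ M replaces each point by its cluster mean: the k-means
# reconstruction is exactly a constrained matrix factorization X ~ Z M.
err = np.linalg.norm(X - Z @ M)
print(f"reconstruction error: {err:.3f}")
```

The only difference from generic MF is the constraint that each row of Z is a one-hot indicator rather than arbitrary real weights.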

26

How do you do it?

[Figure: the V ≈ W H factorization again, with W n × K and H K × m]

27

talk pilfered from …..

KDD 2011


29

Recovering latent factors in a matrix

[Figure: the n users × m movies ratings matrix V factored as V ≈ W H, with W n × r and H r × m; V[i,j] = user i's rating of movie j]


34

Matrix factorization as SGD

[Figure: the SGD update equations, with the step size highlighted] Why does this work?

35

Matrix factorization as SGD - why does this work? Here’s the key claim: an SGD step on a single cell (i,j) reads and updates only row i of W and column j of H, so steps on cells that share no row or column do not interfere with each other.

36

Checking the claim

Think of SGD for logistic regression:
• the LR loss compares y and ŷ = dot(w,x)
• MF is similar, but now we update both w (the user’s weights) and x (the movie’s weights)
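Here is a sketch of SGD for the squared loss on observed cells; the sizes, step size, and L2 regularization constant are illustrative choices, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes; the slides don't fix these.
n_users, n_movies, K = 30, 20, 3

# Planted low-rank ratings, observed at roughly half the cells.
V = rng.normal(size=(n_users, K)) @ rng.normal(size=(K, n_movies))
obs = np.array([(i, j) for i in range(n_users) for j in range(n_movies)
                if rng.random() < 0.5])

W = 0.1 * rng.normal(size=(n_users, K))   # user factors
H = 0.1 * rng.normal(size=(K, n_movies))  # movie factors
lr, lam = 0.02, 0.01                      # step size, L2 penalty (assumed)

for epoch in range(200):
    rng.shuffle(obs)                      # visit observed cells in random order
    for i, j in obs:
        err = W[i] @ H[:, j] - V[i, j]    # error on one observed rating
        # The update for cell (i, j) touches only row i of W and
        # column j of H -- the locality the "key claim" relies on.
        grad_w = err * H[:, j] + lam * W[i]
        grad_h = err * W[i] + lam * H[:, j]
        W[i] -= lr * grad_w
        H[:, j] -= lr * grad_h

rmse = np.sqrt(np.mean([(W[i] @ H[:, j] - V[i, j]) ** 2 for i, j in obs]))
print(f"training RMSE: {rmse:.3f}")
```

As in logistic regression, each step compares one observed value to one prediction; the difference is that both sides of the dot product are learned.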

37

What loss functions are possible?

N1, N2 - diagonal matrices, sort of like IDF factors for the users/movies

“generalized” KL-divergence
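The “generalized” KL-divergence the slide names is usually written (following the NMF literature) as:

```latex
D(V \,\|\, WH) \;=\; \sum_{i,j} \left( V_{ij} \log \frac{V_{ij}}{(WH)_{ij}} \;-\; V_{ij} \;+\; (WH)_{ij} \right)
```

and one reading of the N1, N2 weighting is a weighted squared loss \( \| N_1 (V - WH) N_2 \|_F^2 \), with the diagonal entries of N1 and N2 acting like per-user / per-movie IDF factors. Both forms are my reconstruction of the lost equation slides, not text copied from them.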


40

ALS = alternating least squares
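A minimal ALS sketch: with one factor held fixed, solving for the other is a ridge-regularized least-squares problem in closed form (the sizes and ridge constant below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, K = 20, 15, 3
V = rng.normal(size=(n, K)) @ rng.normal(size=(K, m))  # rank-K target

H = rng.normal(size=(K, m))     # random initial item factors
lam = 1e-3                      # small ridge term (assumed constant)

for _ in range(20):
    # Fix H: each row of W solves an independent ridge regression,
    # W = V H^T (H H^T + lam I)^{-1}.
    W = np.linalg.solve(H @ H.T + lam * np.eye(K), H @ V.T).T
    # Fix W: each column of H solves an independent ridge regression,
    # H = (W^T W + lam I)^{-1} W^T V.
    H = np.linalg.solve(W.T @ W + lam * np.eye(K), W.T @ V)

err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative error: {err:.4f}")
```

Unlike SGD, each half-step is an exact minimization, so the loss decreases monotonically in each sweep.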



45

Similar to McDonnell et al with perceptron learning

46

Slow convergence…..


53

More detail….
• Randomly permute rows/cols of matrix
• Chop V, W, H into blocks of size d × d
  – m/d blocks in W, n/d blocks in H
• Group the data:
  – Pick a set of blocks with no overlapping rows or columns (a stratum)
  – Repeat until all blocks in V are covered
• Train the SGD:
  – Process strata in series
  – Process blocks within a stratum in parallel
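The stratum construction above can be sketched as a cyclic schedule over a grid of blocks; the 4 × 4 grid and the diagonal-shift rule are one simple way to realize "no overlapping rows or columns", while the actual system permutes more generally:

```python
n_blocks = 4  # rows and columns of V each chopped into 4 chunks

# Stratum s picks, for each row-chunk r, the block in column (r + s) mod 4.
# Within a stratum no two blocks share a row-chunk or a column-chunk,
# so independent SGD workers can train them in parallel.
strata = [[(r, (r + s) % n_blocks) for r in range(n_blocks)]
          for s in range(n_blocks)]

# Processing the strata in series covers every block of V exactly once.
covered = [block for stratum in strata for block in stratum]
print(len(covered), len(set(covered)))  # 16 16
```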

54

More detail…. (note: Z in these slides denotes the matrix we have been calling V)

55

More detail….
• Initialize W, H randomly
  – not at zero
• Choose a random ordering (random sort) of the points in a stratum in each “sub-epoch”
• Pick strata sequence by permuting rows and columns of M, and using M’[k,i] as column index of row i in subepoch k
• Use “bold driver” to set step size:
  – increase step size when loss decreases (in an epoch)
  – decrease step size when loss increases
• Implemented in Hadoop and R/Snowfall
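The “bold driver” rule can be sketched in a few lines; the growth and shrink factors 1.05 and 0.5 are typical choices, not values from the slides:

```python
def bold_driver(step, prev_loss, loss, up=1.05, down=0.5):
    """Grow the step slightly after an epoch where the loss fell;
    cut it sharply after an epoch where the loss rose.
    (1.05 and 0.5 are typical constants, not from the slides.)"""
    return step * up if loss < prev_loss else step * down

step = 0.1
step = bold_driver(step, prev_loss=10.0, loss=9.0)  # loss fell
step = bold_driver(step, prev_loss=9.0, loss=9.5)   # loss rose
print(round(step, 4))  # 0.0525
```

The asymmetry (small increases, large decreases) makes the schedule quick to back off when a step size overshoots.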

[Figure: an example strata-sequence matrix M]


57

Wall Clock Time (8 nodes, 64 cores, R/snow)


62

Number of Epochs


67

Varying rank (100 epochs for all)

68

Hadoop scalability: Hadoop process setup time starts to dominate

69

Hadoop scalability
