Matrix Factorization


Transcript of Matrix Factorization

Page 1: Matrix Factorization

1

Matrix Factorization

Page 2: Matrix Factorization

2

Recovering latent factors in a matrix

[Figure: an n × m matrix V, with m columns and n rows; entries v11 … vij … vnm]

Page 3: Matrix Factorization

3

Recovering latent factors in a matrix

[Figure: V ≈ W H, where W is n × K (rows x1 y1; x2 y2; …; xn yn) and H is K × m (rows a1 a2 … am and b1 b2 … bm)]

Page 4: Matrix Factorization

4

What is this for?

[Figure: the same factorization V ≈ W H, with W n × K and H K × m]

Page 5: Matrix Factorization

5

MF for collaborative filtering

Page 6: Matrix Factorization

What is collaborative filtering?

Page 7: Matrix Factorization

What is collaborative filtering?

Page 8: Matrix Factorization

What is collaborative filtering?

Page 9: Matrix Factorization

What is collaborative filtering?

Page 10: Matrix Factorization
Page 11: Matrix Factorization

11

Recovering latent factors in a matrix

[Figure: an n users × m movies matrix V with entries v11 … vij … vnm]

V[i,j] = user i’s rating of movie j

Page 12: Matrix Factorization

12

Recovering latent factors in a matrix

[Figure: V ≈ W H over n users and m movies, with W n × K and H K × m]

V[i,j] = user i’s rating of movie j
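As a concrete (if artificial) example of the factorization above, here is a rank-2 reconstruction of a tiny ratings matrix via truncated SVD; the data and the choice of SVD as the solver are illustrative only, since the rest of the deck uses SGD and ALS instead:

```python
import numpy as np

# Hypothetical 4 users x 3 movies rating matrix (all values made up).
V = np.array([[5.0, 4.0, 1.0],
              [4.0, 5.0, 1.0],
              [1.0, 1.0, 5.0],
              [1.0, 2.0, 4.0]])

# One way to get a rank-K factorization V ~ W H is a truncated SVD:
# W is n x K (K latent weights per user), H is K x m (per movie).
U, s, Vt = np.linalg.svd(V, full_matrices=False)
K = 2
W = U[:, :K] * s[:K]    # n x K
H = Vt[:K, :]           # K x m

V_hat = W @ H           # reconstruction from only 2 latent factors
print(np.round(V_hat, 1))
```

Even with K much smaller than the number of movies, the product W H tracks the original ratings closely when users' tastes really do vary along a few latent dimensions.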

Page 13: Matrix Factorization

13

Page 14: Matrix Factorization

14

MF for image modeling

Page 15: Matrix Factorization

15

Page 16: Matrix Factorization

MF for images

[Figure: V ≈ W H for a 1000 images × 10,000 pixels matrix (10,000,000 entries); with 2 prototypes, the factors act like principal components PC1 and PC2]

V[i,j] = pixel j in image i

Page 17: Matrix Factorization

17

MF for modeling text

Page 18: Matrix Factorization

• The Neatest Little Guide to Stock Market Investing
• Investing For Dummies, 4th Edition
• The Little Book of Common Sense Investing: The Only Way to Guarantee Your Fair Share of Stock Market Returns
• The Little Book of Value Investing
• Value Investing: From Graham to Buffett and Beyond
• Rich Dad’s Guide to Investing: What the Rich Invest in, That the Poor and the Middle Class Do Not!
• Investing in Real Estate, 5th Edition
• Stock Investing For Dummies
• Rich Dad’s Advisors: The ABC’s of Real Estate Investing: The Secrets of Finding Hidden Profits Most Investors Miss

https://technowiki.wordpress.com/2011/08/27/latent-semantic-analysis-lsa-tutorial/

Page 19: Matrix Factorization

TF-IDF counts would be better than raw term counts

Page 20: Matrix Factorization

20

Recovering latent factors in a matrix

[Figure: a doc-term matrix V ≈ W H, over n documents and m terms]

V[i,j] = TF-IDF score of term j in doc i
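To make the doc-term setup concrete, here is a minimal LSA-style sketch on made-up counts; the matrix, vocabulary size, and rank are all illustrative:

```python
import numpy as np

# Hypothetical 4 docs x 5 terms count matrix (vocabulary is made up).
counts = np.array([[2, 1, 0, 0, 0],
                   [1, 2, 0, 0, 1],
                   [0, 0, 3, 1, 0],
                   [0, 1, 2, 2, 0]], dtype=float)

# Simple TF-IDF weighting: tf * log(N / df).
N = counts.shape[0]
df = (counts > 0).sum(axis=0)
V = counts * np.log(N / df)

# Rank-2 factorization (LSA): rows of W place each doc in a 2-D
# latent "topic" space, where related docs land near each other.
U, s, Vt = np.linalg.svd(V, full_matrices=False)
W = U[:, :2] * s[:2]    # n docs x 2
H = Vt[:2, :]           # 2 x m terms
print(np.round(W, 2))
```

Plotting the rows of W is exactly what produces book-clustering pictures like the ones on the next slides.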

Page 21: Matrix Factorization


Page 22: Matrix Factorization

Investing for Real Estate

Rich Dad’s Advisors: The ABCs of Real Estate Investment

Page 23: Matrix Factorization

The Little Book of Common Sense Investing: …

Neatest Little Guide to Stock Market Investing

Page 24: Matrix Factorization

24

MF is like clustering

Page 25: Matrix Factorization

k-means as MF

[Figure: X ≈ Z M, where X is the original data set (n examples), Z is an n × r indicator matrix for the r clusters (rows like 0 1 and 1 0), and M holds the cluster means]
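The "k-means as MF" picture can be checked numerically; here is a minimal sketch with made-up points and fixed cluster assignments:

```python
import numpy as np

# Four made-up 2-D examples, forming two obvious clusters.
X = np.array([[0.0, 0.1], [0.2, 0.0],
              [5.0, 5.1], [4.9, 5.0]])

# k-means as MF: X ~ Z M, where Z is an n x r indicator matrix with
# exactly one 1 per row (the cluster assignment) and M the cluster means.
Z = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]])

# With assignments fixed, the least-squares M is the per-cluster mean:
M = np.linalg.pinv(Z) @ X
X_hat = Z @ M            # each point replaced by its cluster's mean
print(np.round(M, 2))
```

This is why MF "is like clustering": ordinary MF just relaxes the 0/1 constraint on Z, letting each example mix several prototypes.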

Page 26: Matrix Factorization

26

How do you do it?

[Figure: the same V ≈ W H picture, with W n × K and H K × m]

Page 27: Matrix Factorization

27

talk pilfered from …..

KDD 2011

Page 28: Matrix Factorization

28

Page 29: Matrix Factorization

29

Recovering latent factors in a matrix

[Figure: V ≈ W H over n users and m movies, with W n × r and H r × m]

V[i,j] = user i’s rating of movie j

Page 30: Matrix Factorization

30

Page 31: Matrix Factorization

31

Page 32: Matrix Factorization

32

Page 33: Matrix Factorization

34

Matrix factorization as SGD

[Figure: the SGD update rule, with the step size marked] Why does this work?

Page 34: Matrix Factorization

35

Matrix factorization as SGD - why does this work? Here’s the key claim:

Page 35: Matrix Factorization

36

Checking the claim

Think of SGD for logistic regression:
• the LR loss compares y and ŷ = dot(w, x)
• MF is similar, but now we update both w (the user’s weights) and x (the movie’s weights)
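That analogy can be sketched directly; below is a toy per-rating SGD loop for the squared loss, where the data, step size, and epoch count are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ratings as (user i, movie j, rating) triples; the data is made up.
ratings = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 1.0), (2, 2, 5.0)]
n, m, K = 3, 3, 2

# Initialize the factors randomly, not at zero (zero is a saddle point).
W = 0.1 * rng.standard_normal((n, K))   # n users x K
H = 0.1 * rng.standard_normal((K, m))   # K x m movies

eta = 0.05                               # step size
for epoch in range(500):
    for i, j, v in ratings:
        err = v - W[i] @ H[:, j]         # residual on this one rating
        # Squared-loss gradient step: update user i's weights and
        # movie j's weights together, as in LR but with two "sides".
        w_new = W[i] + eta * err * H[:, j]
        H[:, j] = H[:, j] + eta * err * W[i]
        W[i] = w_new

for i, j, v in ratings:
    print(f"V[{i},{j}] = {v:.1f}, predicted {W[i] @ H[:, j]:.2f}")
```

The key point from the slides survives in miniature: each observed rating touches only one row of W and one column of H, which is what later makes the parallel, blocked version possible.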

Page 36: Matrix Factorization

37

What loss functions are possible?

N1, N2: diagonal matrices, sort of like IDF weighting factors for the users/movies

“generalized” KL-divergence

Page 37: Matrix Factorization

38

What loss functions are possible?

Page 38: Matrix Factorization

39

What loss functions are possible?

Page 39: Matrix Factorization

40

ALS = alternating least squares
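As a rough illustration of ALS (assuming a fully observed matrix and plain squared loss, which simplifies the real, sparse recommender setting):

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up fully observed matrix; real ratings data would be sparse,
# with each least-squares solve restricted to observed entries.
V = rng.random((6, 5))
K = 2
W = rng.random((6, K))
H = rng.random((K, 5))

for _ in range(50):
    # With H fixed, the best W solves a least-squares problem per row...
    W = V @ np.linalg.pinv(H)
    # ...and with W fixed, the best H solves one per column.
    H = np.linalg.pinv(W) @ V

err = np.linalg.norm(V - W @ H)
print(f"rank-{K} residual after ALS: {err:.3f}")
```

Each half-step has a closed-form solution, which is the appeal of ALS over SGD: no step size to tune, at the cost of a linear solve per iteration.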

Page 40: Matrix Factorization

41

talk pilfered from …..

KDD 2011

Page 41: Matrix Factorization

42

Page 42: Matrix Factorization

43

Page 43: Matrix Factorization

44

Page 44: Matrix Factorization

45

Similar to McDonnell et al., with perceptron learning

Page 45: Matrix Factorization

46

Slow convergence…..

Page 46: Matrix Factorization

47

Page 47: Matrix Factorization

48

Page 48: Matrix Factorization

49

Page 49: Matrix Factorization

50

Page 50: Matrix Factorization

51

Page 51: Matrix Factorization

52

Page 52: Matrix Factorization

53

More detail…
• Randomly permute the rows/columns of the matrix
• Chop V, W, H into blocks of size d × d
– n/d blocks in W, m/d blocks in H
• Group the data:
– Pick a set of blocks with no overlapping rows or columns (a stratum)
– Repeat until all blocks in V are covered
• Train with SGD:
– Process strata in series
– Process blocks within a stratum in parallel
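The stratum construction above can be sketched in a few lines; the diagonal-shift rule below is one simple way to pick non-overlapping blocks (the grid size d is made up):

```python
# A stratum is a set of blocks sharing no row-blocks or column-blocks,
# so SGD can safely process all blocks of one stratum in parallel.
d = 4

def strata(d):
    """Yield d strata; stratum s uses blocks (b, (b + s) % d)."""
    for s in range(d):
        yield [(b, (b + s) % d) for b in range(d)]

covered = set()
for s, blocks in enumerate(strata(d)):
    rows = {r for r, _ in blocks}
    cols = {c for _, c in blocks}
    assert len(rows) == d and len(cols) == d   # no row/column overlap
    covered |= set(blocks)
    print(f"stratum {s}: {blocks}")

# Processing all d strata in series covers every block of V exactly once.
assert len(covered) == d * d
```

Blocks in a stratum touch disjoint rows of W and disjoint columns of H, so their SGD updates never conflict.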

Page 53: Matrix Factorization

54

More detail… (the paper’s Z is the matrix we have been calling V)

Page 54: Matrix Factorization

55

More detail…
• Initialize W, H randomly
– not at zero
• Choose a random ordering (random sort) of the points in a stratum in each “sub-epoch”
• Pick the strata sequence by permuting the rows and columns of M, using M’[k,i] as the column index of row i in sub-epoch k
• Use “bold driver” to set the step size:
– increase the step size when loss decreases (in an epoch)
– decrease the step size when loss increases
• Implemented in Hadoop and R/Snowfall

[Figure: the matrix M]
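The bold-driver rule in the bullets above amounts to a one-line heuristic; here is a sketch with illustrative grow/shrink multipliers (the actual constants are not given in the slides):

```python
# "Bold driver" step-size control: grow the step size while the
# per-epoch loss keeps falling; cut it sharply as soon as loss rises.
def bold_driver(step, prev_loss, loss, grow=1.05, shrink=0.5):
    return step * grow if loss < prev_loss else step * shrink

step = 0.1
epoch_losses = [10.0, 8.0, 6.5, 7.2, 5.9]   # made-up per-epoch losses
for prev, cur in zip(epoch_losses, epoch_losses[1:]):
    step = bold_driver(step, prev, cur)
    print(f"loss {prev:.1f} -> {cur:.1f}: step size now {step:.4f}")
```

The asymmetry (gentle growth, sharp cuts) is the whole trick: it probes for the largest stable step size without letting a divergence run away.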

Page 55: Matrix Factorization

56

Page 56: Matrix Factorization

57

Wall clock time (8 nodes, 64 cores, R/snow)

Page 57: Matrix Factorization

58

Page 58: Matrix Factorization

59

Page 59: Matrix Factorization

60

Page 60: Matrix Factorization

61

Page 61: Matrix Factorization

62

Number of Epochs

Page 62: Matrix Factorization

63

Page 63: Matrix Factorization

64

Page 64: Matrix Factorization

65

Page 65: Matrix Factorization

66

Page 66: Matrix Factorization

67

Varying rank (100 epochs for all)

Page 67: Matrix Factorization

68

Hadoop scalability: Hadoop process setup time starts to dominate

Page 68: Matrix Factorization

69

Hadoop scalability

Page 69: Matrix Factorization

70