Matrix Factorization
Transcript of Matrix Factorization
Recovering latent factors in a matrix

[Figure: an n rows × m columns matrix V, with entries v11 … vij … vnm.]
Recovering latent factors in a matrix

[Figure: V (n × m) ≈ W · H, where W (n × K) has rows (x1, y1), (x2, y2), …, (xn, yn) and H (K × m) has rows (a1 a2 … am) and (b1 b2 … bm).]
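Numerically, this recovery can be sketched with a truncated SVD, one standard way to compute a rank-K factorization (the slide itself does not fix an algorithm; sizes below are illustrative):

```python
import numpy as np

np.random.seed(0)
n, m, K = 20, 30, 3

# Build a matrix V that is exactly rank K, so a rank-K factorization can recover it.
W_true = np.random.randn(n, K)
H_true = np.random.randn(K, m)
V = W_true @ H_true

# Recover latent factors via truncated SVD: V ~ W @ H with W (n x K), H (K x m).
U, s, Vt = np.linalg.svd(V, full_matrices=False)
W = U[:, :K] * s[:K]          # n x K latent factors, one row per row of V
H = Vt[:K, :]                 # K x m latent factors, one column per column of V

print(np.allclose(W @ H, V))  # True: reconstruction is (numerically) exact
```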
What is this for?

[Figure: the same factorization, V (n × m) ≈ W (n × K) · H (K × m).]
MF for collaborative filtering
What is collaborative filtering?
Recovering latent factors in a matrix

[Figure: an n users × m movies matrix V; V[i,j] = user i’s rating of movie j.]
Recovering latent factors in a matrix

[Figure: the n users × m movies ratings matrix V ≈ W · H, with a K-dimensional latent-factor row per user in W and a latent-factor column per movie in H; V[i,j] = user i’s rating of movie j.]
MF for image modeling
MF for images

[Figure: a 1,000 images × 10,000 pixels matrix (1000 × 10,000 = 10,000,000 entries) factored with K = 2; V[i,j] = pixel j in image i, and the 2 prototypes recovered are the principal components PC1 and PC2.]
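A toy version of this picture, with numpy standing in for the real image matrix (the sizes, mixing coefficients, and noise level below are made up; the slide’s example is 1,000 × 10,000):

```python
import numpy as np

np.random.seed(1)
n_images, n_pixels = 100, 64   # toy stand-ins for 1,000 x 10,000

# Two "prototype" images; every image is a mixture of them plus a little noise.
proto = np.random.rand(2, n_pixels)
coeffs = np.random.rand(n_images, 2)
V = coeffs @ proto + 0.01 * np.random.randn(n_images, n_pixels)

# Rank-2 factorization (via SVD): the two right singular vectors play the
# role of PC1 and PC2, and each image gets a 2-D coordinate.
U, s, Vt = np.linalg.svd(V, full_matrices=False)
W = U[:, :2] * s[:2]   # n_images x 2: coordinates of each image
H = Vt[:2, :]          # 2 x n_pixels: the two prototype directions

err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(err < 0.05)      # True: two factors explain almost all of the data
```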
MF for modeling text
• The Neatest Little Guide to Stock Market Investing
• Investing For Dummies, 4th Edition
• The Little Book of Common Sense Investing: The Only Way to Guarantee Your Fair Share of Stock Market Returns
• The Little Book of Value Investing
• Value Investing: From Graham to Buffett and Beyond
• Rich Dad’s Guide to Investing: What the Rich Invest in, That the Poor and the Middle Class Do Not!
• Investing in Real Estate, 5th Edition
• Stock Investing For Dummies
• Rich Dad’s Advisors: The ABC’s of Real Estate Investing: The Secrets of Finding Hidden Profits Most Investors Miss
https://technowiki.wordpress.com/2011/08/27/latent-semantic-analysis-lsa-tutorial/
TFIDF counts would be better
Recovering latent factors in a matrix

[Figure: an n documents × m terms doc-term matrix V ≈ W · H; V[i,j] = TFIDF score of term j in doc i.]
[Figure: the books plotted in the 2-D latent space, labeled with titles such as “Investing for real estate”, “Rich Dad’s Advisor’s: The ABCs of Real Estate Investment”, “The little book of common sense investing: …”, and “Neatest Little Guide to Stock Market Investing”.]
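A miniature LSA run along these lines (the vocabulary, counts, and the particular TF-IDF variant are all illustrative, not from the slides):

```python
import numpy as np

# Toy doc-term counts: docs 0-1 are about stocks, docs 2-3 about real estate.
terms = ["stock", "market", "value", "estate", "property", "rent"]
counts = np.array([
    [3, 2, 1, 0, 0, 0],   # stock-market doc
    [2, 3, 0, 0, 0, 0],   # stock-market doc
    [0, 0, 0, 3, 2, 1],   # real-estate doc
    [0, 0, 1, 2, 3, 2],   # real-estate doc
], dtype=float)

# One common TF-IDF weighting (the slide just says "TFIDF").
tf = counts / counts.sum(axis=1, keepdims=True)
idf = np.log(counts.shape[0] / (counts > 0).sum(axis=0))
V = tf * idf

# LSA = truncated SVD of the TF-IDF matrix: V ~ W @ H with K = 2 latent "topics".
U, s, Vt = np.linalg.svd(V, full_matrices=False)
W = U[:, :2] * s[:2]   # doc coordinates in the 2-D topic space

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Docs about the same subject end up closer together in the latent space.
print(cos(W[0], W[1]) > cos(W[0], W[2]))  # True
```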
MF is like clustering

k-means as MF

[Figure: the original data set X (n examples × m features) ≈ Z · M, where Z (n × r) holds 0/1 indicators assigning each example to one of r clusters and M (r × m) holds the cluster means.]
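The indicator-matrix view can be checked directly: given cluster assignments Z, the least-squares M in X ≈ Z M is exactly the matrix of cluster means (the toy data below is made up):

```python
import numpy as np

np.random.seed(2)

# Toy data: two well-separated clusters in 2-D, ten points each.
X = np.vstack([np.random.randn(10, 2) + [5, 5],
               np.random.randn(10, 2) - [5, 5]])

# Cluster assignments as a 0/1 indicator matrix Z: row i has a single 1
# in the column of example i's cluster.
Z = np.zeros((20, 2))
Z[:10, 0] = 1
Z[10:, 1] = 1

# Given Z, the M minimizing ||X - Z M||_F^2 is the matrix of cluster means.
M = np.linalg.lstsq(Z, X, rcond=None)[0]
means = np.vstack([X[:10].mean(axis=0), X[10:].mean(axis=0)])

print(np.allclose(M, means))   # True: least-squares M = cluster means
```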
How do you do it?

[Figure: the factorization V (n × m) ≈ W (n × K) · H (K × m) again.]
talk pilfered from ….. KDD 2011
Recovering latent factors in a matrix

[Figure: the n users × m movies ratings matrix V ≈ W · H, now with the rank labeled r; V[i,j] = user i’s rating of movie j.]
Matrix factorization as SGD

[Slide shows the per-rating SGD update rules, with callouts marking the step size and asking “why does this work?”]
Matrix factorization as SGD - why does this work? Here’s the key claim:
Checking the claim

Think of SGD for logistic regression:
• LR loss = compare y and ŷ = dot(w,x)
• similar, but now we update both w (the user weights) and x (the movie weights)
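A minimal sketch of that per-rating SGD update, assuming squared loss (the learning rate, regularization, and synthetic data below are illustrative, not from the slides):

```python
import numpy as np

np.random.seed(3)
n_users, n_movies, K = 30, 40, 4

# Synthetic low-rank "ratings" so the factors are recoverable.
W_true = np.random.randn(n_users, K)
H_true = np.random.randn(K, n_movies)
ratings = [(i, j, float(W_true[i] @ H_true[:, j]))
           for i in range(n_users) for j in range(n_movies)]

W = 0.1 * np.random.randn(n_users, K)
H = 0.1 * np.random.randn(K, n_movies)
lr, lam = 0.02, 0.001   # step size and L2 penalty (illustrative values)

def rmse():
    return np.sqrt(np.mean([(v - W[i] @ H[:, j]) ** 2 for i, j, v in ratings]))

before = rmse()
for epoch in range(50):
    np.random.shuffle(ratings)
    for i, j, v in ratings:
        err = v - W[i] @ H[:, j]                       # compare v and v-hat = dot(W[i], H[:,j])
        grad_w = err * H[:, j] - lam * W[i]            # step for the user factors
        H[:, j] += lr * (err * W[i] - lam * H[:, j])   # step for the movie factors
        W[i] += lr * grad_w

print(rmse() < before)   # True: SGD drives the squared error down
```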
What loss functions are possible?

N1, N2: diagonal matrices, sort of like IDF factors for the users/movies

“generalized” KL-divergence
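For reference, the two losses the slide alludes to, written out in standard notation (these are the usual textbook definitions, not reconstructed from the slide images):

```latex
% Weighted squared (Frobenius) loss, with diagonal N1, N2 acting like IDF factors:
L_2(V, WH) = \| N_1 \, (V - WH) \, N_2 \|_F^2

% "Generalized" KL-divergence (reduces to ordinary KL when both sides sum to 1):
D(V \,\|\, WH) = \sum_{i,j} \Big( V_{ij} \log \frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \Big)
```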
ALS = alternating least squares
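A bare-bones ALS sketch: fix one factor, solve a least-squares problem for the other, and alternate (noise-free rank-K data and unregularized solves here; real ALS adds regularization and handles missing entries):

```python
import numpy as np

np.random.seed(4)
n, m, K = 25, 35, 3
V = np.random.randn(n, K) @ np.random.randn(K, m)   # exactly rank K

W = np.random.randn(n, K)
H = np.random.randn(K, m)

for it in range(20):
    # Fix W, solve least squares for H:  min_H ||V - W H||_F^2
    H = np.linalg.lstsq(W, V, rcond=None)[0]
    # Fix H, solve least squares for W:  min_W ||V - W H||_F^2
    W = np.linalg.lstsq(H.T, V.T, rcond=None)[0].T

print(np.allclose(W @ H, V))   # True: alternating solves recover the rank-K matrix
```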
Similar to McDonnell et al. with perceptron learning
Slow convergence…..
More detail….
• Randomly permute rows/cols of matrix
• Chop V, W, H into blocks of size d × d
  – m/d blocks in W, n/d blocks in H
• Group the data:
  – Pick a set of blocks with no overlapping rows or columns (a stratum)
  – Repeat until all blocks in V are covered
• Train the SGD:
  – Process strata in series
  – Process blocks within a stratum in parallel
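One standard way to build such strata is a diagonal scheme: stratum s pairs row-block i with column-block (i + s) mod d, so no two blocks in a stratum share rows or columns (d and the grid layout here are illustrative):

```python
import itertools

d = 4   # number of row-blocks and column-blocks (illustrative)

# Stratum s = the set of (row_block, col_block) pairs along a shifted diagonal.
strata = [[(i, (i + s) % d) for i in range(d)] for s in range(d)]

# Within a stratum, no two blocks share a row-block or a column-block,
# so parallel SGD workers can process them without conflicting updates.
for stratum in strata:
    rows = [r for r, c in stratum]
    cols = [c for r, c in stratum]
    assert len(set(rows)) == d and len(set(cols)) == d

# Across all d strata, every block of the d x d grid is covered exactly once.
covered = set(itertools.chain.from_iterable(strata))
print(len(covered) == d * d)   # True
```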
More detail…. (in the figure, Z denotes the matrix called V above)
More detail….
• Initialize W, H randomly
  – not at zero
• Choose a random ordering (random sort) of the points in a stratum in each “sub-epoch”
• Pick strata sequence by permuting rows and columns of M, and using M’[k,i] as column index of row i in subepoch k
• Use “bold driver” to set step size:
  – increase step size when loss decreases (in an epoch)
  – decrease step size when loss increases
• Implemented in Hadoop and R/Snowfall
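The bold-driver rule itself is tiny; a sketch with conventional adjustment factors (the 5% growth and 50% cut are common choices, not numbers taken from the slides):

```python
# "Bold driver" step-size rule: grow the step when the epoch's loss went
# down, cut it sharply when the loss went up.
def bold_driver(step, prev_loss, loss, grow=1.05, shrink=0.5):
    return step * grow if loss < prev_loss else step * shrink

step = 0.1
losses = [10.0, 8.0, 6.5, 7.2, 5.9]   # made-up per-epoch losses
for prev, cur in zip(losses, losses[1:]):
    step = bold_driver(step, prev, cur)
print(round(step, 6))   # 0.057881 (two growths, one cut, one growth)
```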
[Figure: an example strata-sequence matrix M.]
Wall Clock Time (8 nodes, 64 cores, R/snow)
Number of Epochs
Varying rank (100 epochs for all)
Hadoop scalability: Hadoop process setup time starts to dominate
Hadoop scalability