Advanced Databases (Banco de Dados Avançados) - Recommender Systems
Recommender Systems: Collaborative Filtering & Dimensionality Reduction
Mining of Massive Datasets
Jure Leskovec, Anand Rajaraman, Jeff Ullman
Stanford University
*Adapted by Gustavo Coutinho
Note to other teachers and users of these slides: We would be delighted if you found our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. If you make use of a significant portion of these slides in your own lecture, please include this message, or a link to our web site: http://www.mmds.org
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
Collaborative Filtering
Harnessing quality judgments of other users
Previously - Content-Based
Utility Matrix
Users have preferences for certain items, and these preferences must be teased out of the data. Let's represent them with a utility matrix!
Example: [utility matrix figure not preserved in this transcript]
Collaborative Filtering
Consider user x
Find set N of other users whose ratings are “similar” to x’s ratings
Estimate x’s ratings based on ratings of users in N
Collaborative Filtering
Different from Content-Based Filtering
We don't need to understand the content of a specific item!
Different users share their experiences
Finding “Similar” Users
Let r_x and r_y be the rating vectors of users x and y, respectively
Let's try the Jaccard similarity as a measure
r_x = [*, _, _, *, ***]   r_y = [*, _, **, **, _]
Finding “Similar” Users
Now r_x and r_y are treated as sets of the items each user has rated:
r_x = {1, 4, 5}   r_y = {1, 3, 4}
Problem: this ignores the value of the rating!
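As a small illustration of the set view above, the Jaccard similarity of the two example users can be computed as follows. This is only a sketch; the function name `jaccard` is chosen here for illustration and does not come from the slides.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b| (taken as 0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


r_x = {1, 4, 5}  # items rated by user x
r_y = {1, 3, 4}  # items rated by user y

sim_xy = jaccard(r_x, r_y)  # 2 shared items out of 4 distinct items
print(sim_xy)  # 0.5
```

The star counts never enter the computation, which is exactly the problem the slide points out.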
Finding “Similar” Users
How do we bring the rating values into the formula? Use the cosine similarity measure:
$$\text{similarity} = \cos(\theta) = \frac{r_x \cdot r_y}{\|r_x\| \cdot \|r_y\|}$$
Now r_x and r_y are treated as points (vectors), with 0 for missing ratings:
r_x = [1, 0, 0, 1, 3]   r_y = [1, 0, 2, 2, 0]
Problem: this treats missing ratings as “negative”!
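The cosine measure for the two example vectors can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `cosine` is an illustrative choice.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_u * norm_v)


r_x = [1, 0, 0, 1, 3]  # missing ratings encoded as 0
r_y = [1, 0, 2, 2, 0]

sim_xy = cosine(r_x, r_y)  # 3 / (sqrt(11) * 3)
print(round(sim_xy, 3))  # 0.302
```

Because a missing rating is stored as 0, it pulls the similarity down in the same way a very low rating would.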
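Finding “Similar” Users
How do we balance the missing values? Use the Pearson correlation coefficient.
$S_{xy}$ = items rated by both users x and y
$$\mathrm{sim}(x, y) = \frac{\sum_{s \in S_{xy}} (r_{xs} - \bar{r}_x)(r_{ys} - \bar{r}_y)}{\sqrt{\sum_{s \in S_{xy}} (r_{xs} - \bar{r}_x)^2} \, \sqrt{\sum_{s \in S_{xy}} (r_{ys} - \bar{r}_y)^2}}$$
$\bar{r}_x$ and $\bar{r}_y$ = average rating of x and y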
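A sketch of this coefficient in Python is shown below. It assumes ratings are stored as `{item: rating}` dictionaries and that each user's average is taken over all of that user's ratings, while the sums run only over the co-rated items; both are readings of the slide, not something it states explicitly. The example reuses the two users from the earlier slides, with one star encoded as 1.

```python
import math


def pearson(rx: dict, ry: dict) -> float:
    """Pearson-style similarity summed over the items rated by both users."""
    common = rx.keys() & ry.keys()  # S_xy
    if not common:
        return 0.0
    mean_x = sum(rx.values()) / len(rx)  # average over all of x's ratings
    mean_y = sum(ry.values()) / len(ry)
    num = sum((rx[s] - mean_x) * (ry[s] - mean_y) for s in common)
    den_x = math.sqrt(sum((rx[s] - mean_x) ** 2 for s in common))
    den_y = math.sqrt(sum((ry[s] - mean_y) ** 2 for s in common))
    if den_x == 0 or den_y == 0:
        return 0.0  # convention chosen here when a user has no variation
    return num / (den_x * den_y)


r_x = {1: 1, 4: 1, 5: 3}  # item -> rating for user x
r_y = {1: 1, 3: 2, 4: 2}  # item -> rating for user y

sim_xy = pearson(r_x, r_y)
print(round(sim_xy, 3))  # 0.316
```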
Similarity Metric
Let's consider the following utility matrix of users and ratings:
[utility matrix figure not preserved in this transcript]
Intuitively we want: sim(A, B) > sim(A, C)
Using Jaccard: 1/5 < 2/4
Using cosine: 0.386 > 0.322
Similarity Metric
Now we use the Pearson correlation, subtracting the (row) mean from each rating
Using Pearson: 0.092 > -0.559
Notice that cosine similarity is a correlation when the data is centered at 0
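The three comparisons can be reproduced with a short script. The utility matrix itself is not preserved in this transcript, so the ratings below are an assumption: they follow the three-user example from the Mining of Massive Datasets textbook, and the item names (`HP1`, `TW`, `SW1`, ...) are part of that assumption. With this matrix the Jaccard and Pearson values match the slide, while the cosine for (A, B) comes out near 0.380 rather than the 0.386 quoted above.

```python
import math

# ASSUMED utility matrix (user -> {item: rating}); not taken from the transcript.
ratings = {
    "A": {"HP1": 4, "TW": 5, "SW1": 1},
    "B": {"HP1": 5, "HP2": 5, "HP3": 4},
    "C": {"TW": 2, "SW1": 4, "SW2": 5},
}


def jaccard(x, y):
    return len(x.keys() & y.keys()) / len(x.keys() | y.keys())


def cosine(x, y):
    dot = sum(x[i] * y[i] for i in x.keys() & y.keys())
    nx = math.sqrt(sum(v * v for v in x.values()))
    ny = math.sqrt(sum(v * v for v in y.values()))
    return dot / (nx * ny) if nx and ny else 0.0


def centered(x):
    """Subtract the user's (row) mean from each known rating."""
    mean = sum(x.values()) / len(x)
    return {i: v - mean for i, v in x.items()}


def centered_cosine(x, y):
    return cosine(centered(x), centered(y))


a, b, c = ratings["A"], ratings["B"], ratings["C"]
for name, sim in [("jaccard", jaccard), ("cosine", cosine), ("pearson", centered_cosine)]:
    print(name, round(sim(a, b), 3), round(sim(a, c), 3))
```

Only the centered version gives a clear gap in the intended direction, because unrated items stay at 0 (the user's average) instead of counting as low ratings.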
Rating Predictions
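How can we go from similarity metrics to recommendations?
Let r_x be the vector of user x's ratings
Let N be the set of the k users most similar to x who have rated item i
Prediction for item i of user x, as a simple average:
$$r_{xi} = \frac{1}{k} \sum_{y \in N} r_{yi}$$
or as an average weighted by similarity:
$$r_{xi} = \frac{\sum_{y \in N} s_{xy} \cdot r_{yi}}{\sum_{y \in N} s_{xy}}$$
where $s_{xy} = \mathrm{sim}(x, y)$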
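Both prediction rules fit in one small function. The sketch below assumes the neighborhood N has already been selected and is passed as `(s_xy, r_yi)` pairs; the function name and the neighbor values are hypothetical, made up only to show the arithmetic.

```python
def predict_rating(neighbors, weighted=True):
    """Predict r_xi from (s_xy, r_yi) pairs for the users y in N."""
    if not neighbors:
        raise ValueError("need at least one neighbor who rated the item")
    if not weighted:
        # Simple average: (1/k) * sum of the neighbors' ratings
        return sum(r for _, r in neighbors) / len(neighbors)
    total_sim = sum(s for s, _ in neighbors)
    if total_sim == 0:
        raise ValueError("similarities sum to zero")
    # Weighted average: sum(s_xy * r_yi) / sum(s_xy)
    return sum(s * r for s, r in neighbors) / total_sim


# Hypothetical neighborhood of k = 3 users: (similarity to x, rating of item i)
neighbors = [(0.9, 5), (0.6, 4), (0.3, 2)]

simple = predict_rating(neighbors, weighted=False)  # (5 + 4 + 2) / 3
weighted = predict_rating(neighbors)                # 7.5 / 1.8
print(round(simple, 2), round(weighted, 2))  # 3.67 4.17
```

The weighted version leans toward the ratings of the most similar users, which is why it predicts a higher value here.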
Item-Item Collaborative Filtering
Until now we have used a user-user approach. What about item-item?
▪ For item i, find other similar items
▪ Estimate the rating for item i based on ratings for similar items
▪ Can use the same similarity metrics and prediction functions as in the user-user model
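Item-Item CF (|N|=2)
Rows are movies, columns are users. "-" = unknown rating; numbers are ratings between 1 and 5.

| Movie \ User | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | - | 3 | - | - | 5 | - | - | 5 | - | 4 | - |
| 2 | - | - | 5 | 4 | - | - | 4 | - | - | 2 | 1 | 3 |
| 3 | 2 | 4 | - | 1 | 2 | - | 3 | - | 4 | 3 | 5 | - |
| 4 | - | 2 | 4 | - | 5 | - | - | 4 | - | - | 2 | - |
| 5 | - | - | 4 | 3 | 4 | 2 | - | - | - | - | 2 | 5 |
| 6 | 1 | - | 3 | - | 3 | - | - | 2 | - | - | 4 | - |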
Item-Item CF (|N|=2)
Goal: estimate the rating of movie 1 by user 5 (the entry marked "?" in row 1 of the matrix).
Item-Item CF (|N|=2)
Similarity of each movie m to movie 1:

| Movie m | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| sim(1, m) | 1.00 | -0.18 | 0.41 | -0.10 | -0.31 | 0.59 |

Goal: estimate the rating of movie 1 by user 5.
Item-Item CF (|N|=2)
Neighbor selection: identify movies similar to movie 1 that were rated by user 5
Here we use the Pearson correlation as the similarity:
1) Subtract the mean rating m_i from each movie i
   m_1 = (1+3+5+5+4)/5 = 3.6
   row 1: [-2.6, 0, -0.6, 0, 0, 1.4, 0, 0, 1.4, 0, 0.4, 0]
2) Compute cosine similarities between rows
Item-Item CF (|N|=2)
Compute similarity weights: s_{1,3} = 0.41, s_{1,6} = 0.59
User 5 has rated movies 3, 4, 5 and 6; of these, movies 3 and 6 are the two most similar to movie 1 (sim(1, m) = 0.41 and 0.59), so they form the neighborhood N.
Item-Item CF (|N|=2)
Predict by taking the weighted average:
r_{1,5} = (0.41*2 + 0.59*3) / (0.41 + 0.59) = 2.6
The "?" for movie 1 and user 5 in the matrix is filled in with 2.6.
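The whole worked example (centering, cosine similarity, neighbor selection, weighted average) can be replayed in a short script. The cell positions in the matrix below are an assumption, since the transcript flattens the table; they were chosen so that they reproduce the row-1 vector and the sim(1, m) values given on the slides. Function names are illustrative.

```python
import math

# movie -> {user: rating}; absent users mean "unknown rating".
# ASSUMPTION: cell positions reconstructed to match the slides' numbers.
ratings = {
    1: {1: 1, 3: 3, 6: 5, 9: 5, 11: 4},
    2: {3: 5, 4: 4, 7: 4, 10: 2, 11: 1, 12: 3},
    3: {1: 2, 2: 4, 4: 1, 5: 2, 7: 3, 9: 4, 10: 3, 11: 5},
    4: {2: 2, 3: 4, 5: 5, 8: 4, 11: 2},
    5: {3: 4, 4: 3, 5: 4, 6: 2, 11: 2, 12: 5},
    6: {1: 1, 3: 3, 5: 3, 8: 2, 11: 4},
}


def centered(row):
    """Subtract the movie's mean rating from each known rating."""
    mean = sum(row.values()) / len(row)
    return {u: r - mean for u, r in row.items()}


def cosine(a, b):
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def predict(movie, user, k=2):
    """Weighted average over the k most similar movies that `user` has rated."""
    target = centered(ratings[movie])
    sims = {
        m: cosine(target, centered(row))
        for m, row in ratings.items()
        if m != movie and user in row
    }
    top = sorted(sims, key=sims.get, reverse=True)[:k]
    num = sum(sims[m] * ratings[m][user] for m in top)
    den = sum(sims[m] for m in top)
    return num / den, {m: sims[m] for m in top}


prediction, neighbors = predict(movie=1, user=5)
print(round(prediction, 1), {m: round(s, 2) for m, s in neighbors.items()})
# 2.6 {6: 0.59, 3: 0.41}
```

This sketch does not guard against neighborhoods whose similarities sum to zero or less; a real system would need a fallback for that case.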
CF: Common Practice
Define similarity $s_{ij}$ of items i and j
Select k nearest neighbors N(i; x)
▪ Items most similar to i that were rated by x
Estimate rating $r_{xi}$ as the weighted average:
$$r_{xi} = b_{xi} + \frac{\sum_{j \in N(i;x)} s_{ij} \cdot (r_{xj} - b_{xj})}{\sum_{j \in N(i;x)} s_{ij}}$$
where $b_{xi} = \mu + b_x + b_i$ is the baseline estimate for $r_{xi}$:
▪ $\mu$ = overall mean movie rating
▪ $b_x$ = rating deviation of user x = (avg. rating of user x) $- \mu$
▪ $b_i$ = rating deviation of movie i
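The baseline-corrected estimate can be sketched as below, assuming the usual baseline $b_{xi} = \mu + b_x + b_i$ and a neighborhood that has already been chosen. All numbers and names in the example are hypothetical and serve only to show how the terms combine.

```python
def baseline(mu, b_x, b_i):
    """Baseline estimate b_xi = mu + b_x + b_i."""
    return mu + b_x + b_i


def predict_with_baseline(b_xi, neighbors):
    """neighbors: (s_ij, r_xj, b_xj) triples for the items j in N(i; x)."""
    den = sum(s for s, _, _ in neighbors)
    if den == 0:
        return b_xi  # fall back to the baseline alone
    num = sum(s * (r - b) for s, r, b in neighbors)
    return b_xi + num / den


# Hypothetical values: overall mean 3.7, a slightly critical user,
# and a movie that is rated above average.
b_xi = baseline(mu=3.7, b_x=-0.2, b_i=0.5)  # 4.0

# (similarity to item i, x's rating of item j, baseline estimate for (x, j))
neighbors = [(0.41, 2, 3.1), (0.59, 3, 3.4)]

r_xi = predict_with_baseline(b_xi, neighbors)
print(round(r_xi, 3))  # 3.313
```

The user rated both neighboring items below their baselines, so the estimate for item i is pulled below its own baseline of 4.0.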
Item-Item vs. User-User
In practice, it has been observed that item-item often works better than user-user.
Why? Items are simpler; users have multiple tastes.
Avatar LOTR Matrix Pirates
Alice 1 0.8
Bob 0.5 0.3
Carol 0.9 1 0.8
David 1 0.4
Pros/Cons of Collaborative Filtering
Pros:
▪ Works for any kind of item: no feature selection needed
▪ Unexpected recommendations: a user may receive recommendations that differ from their own active searches
▪ Groups with similar ratings: users may connect with each other and form groups with similar interests
Pros/Cons of Collaborative Filtering
Cons:
▪ Cold start: need enough users in the system to find a match
▪ Sparsity: the user/ratings matrix is sparse, so it is hard to find users that have rated the same items
▪ First rater: cannot recommend an item that has not been previously rated (new items, esoteric items)
▪ Popularity bias: cannot recommend items to someone with unique taste; tends to recommend popular items
Hybrid Methods
Implement two or more different recommenders and combine predictions
▪ Perhaps using a linear model
Add content-based methods to collaborative filtering
▪ Item profiles for the new-item problem
▪ Demographics to deal with the new-user problem
Remarks & Practical Tips
- Evaluation
- Error metrics
- Complexity / Speed
Evaluation
[Figure: utility matrix of users (rows) by movies (columns) with the known ratings]
Evaluation
[Figure: the same users-by-movies matrix, with some known ratings withheld (shown as "?") to form the test data set]
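One way to score a recommender on such a test set is to compare its predictions for the withheld entries against the true ratings. The transcript lists "Error metrics" without naming one, so the choice of root-mean-square error (RMSE) here is an assumption. The true values are the five ratings that appear as "?" in the test-set figure; the predicted values are hypothetical.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and true ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


actual = [2, 2, 5, 1, 3]               # withheld ratings from the figure
predicted = [2.5, 1.5, 4.0, 2.0, 3.0]  # hypothetical model output

error = rmse(predicted, actual)
print(round(error, 3))  # 0.707
```

A lower value means the predictions sit closer to the withheld ratings; the rest of the matrix is used only for fitting.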
Collaborative Filtering: Complexity
Expensive step is finding the k most similar customers: O(|X|), where X is the set of customers
▪ Too expensive to do at runtime
▪ Could pre-compute
Naïve pre-computation takes time O(k·|X|)
We already know how to do this!
▪ Near-neighbor search in high dimensions (LSH)
▪ Clustering
▪ Dimensionality reduction
Tip: Add Data
Leverage all the data
▪ Don’t try to reduce data size in an effort to make fancy algorithms work
▪ Simple methods on large data do best
Add more data
▪ e.g., add IMDB data on genres
More data beats better algorithms
▪ http://anand.typepad.com/datawocky/2008/03/more-data-usual.html
Questions