Breakout: Recommender Platforms - Merkle
Recommender Platforms
Recommender platforms can outperform traditional popularity‐based methods by double digits, but they require customization to match any organization’s unique strategies.
Introduction
• Benefits
– Performance
– Differentiator
– Feature
– User Expectation
• Use Cases
– Live Web Recommendations
– Batch Email
– Any situation with multiple users and products
Introduction
• Recommendation Platform
– Produce items that are relevant to a user’s interests
• Some Strategies and Solutions
– Amazon (Purchase)
• Product discovery and comparison
– Pandora (Advertising)
• Engage and adapt
– Netflix (Subscription)
• Increase usage
– LinkedIn (Subscription and Advertising)
• Participate in the social graph
Major Parts of the Platform
Recommender Platform
DATA
ALGORITHM
PRODUCTION
TESTING
Data
Types of Data
• User Data
– Demographics: who you are
– Behavioral: what you do
– Temporal: when you act
– Geographic: where you are
– Social: how you connect
– Privacy?
• Item Data
– Descriptions and Attributes
– Category Relationships (Ontology)
Big and Sparse
• Big Data
– Millions of users
– Millions of products
• Sparse
– Users interact with only a tiny fraction of all possible user‐item combinations
• Netflix Prize Dataset
– 100 million ratings
– 500 thousand users
– 20 thousand movies
– 10 billion combinations
– 99% sparse!
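The 99% figure is simple arithmetic on the rounded counts: 500 thousand users × 20 thousand movies gives 10 billion possible pairs, and only 100 million of them (1%) carry a rating. A minimal Python sketch of that calculation, plus the usual way to cope with sparsity by storing only observed ratings (the dict‐of‐dicts layout and the three sample ratings are illustrative assumptions, not Netflix data):

```python
def sparsity(n_ratings: int, n_users: int, n_items: int) -> float:
    """Share of the user x item grid that holds no rating."""
    return 1.0 - n_ratings / (n_users * n_items)

# Rounded Netflix Prize figures from the slide.
print(f"{sparsity(100_000_000, 500_000, 20_000):.0%}")  # 99%

# Sparse storage: keep only observed ratings, keyed user -> item -> stars.
# The three ratings below are made up for illustration.
ratings = {}
for user, item, stars in [("u1", "m1", 5), ("u1", "m7", 3), ("u2", "m1", 4)]:
    ratings.setdefault(user, {})[item] = stars

# Only 3 cells are stored, not users x items.
print(sum(len(row) for row in ratings.values()))  # 3
```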
Data and Analytic Tools
• Custom Solutions
• Data Store
– Traditional Database (Oracle)
– Massively Parallel Data Warehouse (Netezza, Teradata)
– Unstructured, Non‐Relational (Hadoop, MongoDB)
• Algorithm Languages
– SQL
– Analytical (SAS, R, Weka)
– Programming (Java, Python, C++)
• Keep data and algorithm close
(Images: Christo, The Gates, New York, 2005; Christo, Surrounded Islands, Florida, 1983)
Algorithm / Solution
Common Solutions
• Popularity based
– Suggest the most popular items to all users
• User based Collaborative Filtering
– Find correlations between users, then suggest items related to similar users
• Item based Collaborative Filtering
– Find correlations between items, then suggest items similar to those related to the user
• Content‐based Filtering
– Create factors that describe the users and the items. Then find item‐user pairings with similar factor scores.
• Ensemble
– Optimally blend multiple methods into a single result set
– Use an overarching optimization technology that chooses the best method for each situation
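To make the content‐based bullet concrete, here is a minimal Python sketch that scores items by cosine similarity between a user's factor vector and each item's factor vector. The three factors (trail, road, cushioning) and all values are hypothetical; in practice the factors would be derived from item attributes and the user's history.

```python
import math
from typing import Dict, List, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two factor vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def content_based(user: Sequence[float], items: Dict[str, List[float]], k: int) -> List[str]:
    """Rank items by how closely their factor scores match the user's."""
    scores = {name: cosine(user, vec) for name, vec in items.items()}
    return sorted(scores, key=lambda name: -scores[name])[:k]

# Hypothetical factors: [trail, road, cushioning]
item_factors = {
    "trail_shoe": [0.9, 0.1, 0.3],
    "road_racer": [0.0, 1.0, 0.2],
    "cushioned_trainer": [0.1, 0.7, 0.9],
}
user_factors = [0.8, 0.2, 0.4]  # a user whose history leans toward trail running

print(content_based(user_factors, item_factors, k=2))  # ['trail_shoe', 'cushioned_trainer']
```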
Popularity Based (non‐personalized)
• Rank all items
– Revenue
– Clicks
– Yield
• Every user gets the same recommendations
American Top 40
1. Katy Perry ‐ The One That Got Away
2. Flo Rida ‐ Good Feeling
3. Bruno Mars ‐ It Will Rain
4. Rihanna ‐ We Found Love feat. Calvin Harris
5. Adele ‐ Set Fire To The Rain
6. Selena Gomez ‐ Love You Like A Love Song
7. Jessie J ‐ Domino
8. LMFAO ‐ Sexy And I Know It
9. David Guetta ‐ Without You feat. Usher
10. Gavin DeGraw ‐ Not Over You
11. J. Cole ‐ Work Out
12. David Guetta ‐ Turn Me On feat. Nicki Minaj
13. Gym Class Heroes ‐ Stereo Hearts feat…
14. Maroon 5 ‐ Moves Like Jagger feat…
15. Gym Class Heroes ‐ Self Back Home feat…
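A minimal Python sketch of popularity‐based ranking, assuming per‐item totals have already been aggregated (item names and numbers are hypothetical). Because the user is ignored, every user receives the same list, just like a Top 40 chart:

```python
from typing import Dict, List

# Hypothetical per-item totals; in practice these come from the data store.
item_stats: Dict[str, Dict[str, float]] = {
    "shoeA": {"clicks": 120, "revenue": 900.0},
    "shoeB": {"clicks": 300, "revenue": 450.0},
    "shoeC": {"clicks": 80, "revenue": 1200.0},
}

def top_items(stats: Dict[str, Dict[str, float]], metric: str, k: int) -> List[str]:
    """Rank all items by one metric; ties are broken by item id for stability."""
    ranked = sorted(stats, key=lambda item: (-stats[item][metric], item))
    return ranked[:k]

def recommend(user_id: str, k: int = 2, metric: str = "clicks") -> List[str]:
    # Non-personalized: user_id is ignored, so every user gets the same list.
    return top_items(item_stats, metric, k)

print(recommend("alice"))                  # ['shoeB', 'shoeA']
print(recommend("bob", metric="revenue"))  # ['shoeC', 'shoeA']
```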
Item Based Correlations
• Apparel recommendations:
– Health & Beauty
– Jewelry
– Food
• Food recommendations:
– Pets
– Office products
– Travel
– Home & Garden
User Based
• Group all users
– Use a similarity metric
• Generate item list
– Find most popular items in each cluster
• Create recommendations
– Match the item list to the user clusters
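The three steps above can be sketched in Python as follows. The slides do not name a similarity metric or a clustering method, so this sketch assumes Jaccard similarity on viewed items and a simple greedy threshold clustering; both choices, and the viewing histories, are hypothetical:

```python
from collections import Counter
from typing import Dict, List, Set

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Similarity metric: shared items / all items either user viewed."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_users(history: Dict[str, Set[str]], threshold: float) -> Dict[str, int]:
    """Group users: each user joins the first cluster whose seed is similar enough."""
    seeds: List[str] = []
    assignment: Dict[str, int] = {}
    for user in sorted(history):
        for idx, seed in enumerate(seeds):
            if jaccard(history[user], history[seed]) >= threshold:
                assignment[user] = idx
                break
        else:  # no similar seed found: this user starts a new cluster
            seeds.append(user)
            assignment[user] = len(seeds) - 1
    return assignment

def recommend(user: str, history: Dict[str, Set[str]],
              assignment: Dict[str, int], k: int) -> List[str]:
    """Most popular items in the user's cluster that the user has not seen yet."""
    cluster = assignment[user]
    counts = Counter(
        item
        for other, items in history.items()
        if assignment[other] == cluster
        for item in items
    )
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    return [item for item in ranked if item not in history[user]][:k]

# Hypothetical viewing histories.
history = {
    "ann": {"nikerun1", "nikerun2", "asicsrun1"},
    "bea": {"nikerun1", "nikerun2", "nikerun3"},
    "cat": {"adidasrun1", "adidasrun2"},
    "dee": {"adidasrun1", "adidasrun3"},
    "eve": {"nikerun2", "asicsrun1"},
}
assignment = cluster_users(history, threshold=0.3)
print(assignment)  # {'ann': 0, 'bea': 0, 'cat': 1, 'dee': 1, 'eve': 0}
print(recommend("eve", history, assignment, k=2))  # ['nikerun1', 'nikerun3']
```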
Ensemble
• Netflix Prize
– Final solution consisted of blending multiple algorithms.
– Predictive accuracy is substantially improved when blending multiple predictors. Our experience is that most efforts should be concentrated in deriving substantially different approaches, rather than refining a single technique. Consequently, our solution is an ensemble of many methods. ‐‐BellKor 2007
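A minimal Python sketch of the first ensemble idea, a weighted linear blend. Each method's scores are rescaled to [0, 1] so they can be added; the method names, scores, and weights are hypothetical, and in practice the weights would be fit on held‐out data rather than set by hand:

```python
from typing import Dict, List

def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one method's scores to [0, 1] so different methods are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {item: (s - lo) / span if span else 0.0 for item, s in scores.items()}

def blend(method_scores: Dict[str, Dict[str, float]],
          weights: Dict[str, float], k: int) -> List[str]:
    """Weighted linear blend; an item a method did not score contributes 0 for it."""
    total: Dict[str, float] = {}
    for method, scores in method_scores.items():
        for item, s in min_max(scores).items():
            total[item] = total.get(item, 0.0) + weights[method] * s
    return sorted(total, key=lambda item: (-total[item], item))[:k]

# Hypothetical outputs of three recommenders for one user.
method_scores = {
    "popularity": {"A": 900.0, "B": 450.0, "C": 1200.0},
    "item_based": {"A": 0.27, "B": 0.12, "D": 0.25},
    "user_based": {"B": 3.0, "D": 2.0, "C": 1.0},
}
weights = {"popularity": 0.2, "item_based": 0.5, "user_based": 0.3}
print(blend(method_scores, weights, k=3))  # ['A', 'D', 'B']
```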
Example: Item‐Based Recommender
How that patented, attribute‐based, user‐profile enhanced, Bayesian, algorithmic modeling thing works
(based on a true story)
I sell women’s running shoes online.
Here is my product catalog with 12 different running shoes:
item_id  item_name   brand   size8
1        nikerun1    nike    N
2        nikerun2    nike    Y
3        nikerun3    nike    N
4        nikerun4    nike    Y
5        nikerun5    nike    Y
6        adidasrun1  adidas  Y
7        adidasrun2  adidas  Y
8        adidasrun3  adidas  N
9        adidasrun4  adidas  Y
10       asicsrun1   asics   Y
11       asicsrun2   asics   N
12       asicsrun3   asics   Y
CREATE TABLE item (
  item_id   INT NOT NULL,
  item_name VARCHAR(20),
  brand     VARCHAR(20),
  size8     CHAR(1)      -- 'Y' if the shoe is available in size 8
);
Zooey Deschanel visits my website and clicks on the nikerun1 shoe.
Although something about this shoe has sparked her interest, it may not be quite what she is looking for.
I decide to suggest some similar shoes using a “People who viewed this shoe also viewed” recommendation model.
I need to know which shoes Zooey is most likely to want to see next.
I don’t know her that well… I will use the probabilities that other shoes are viewed in the same visit as the nikerun1 shoe as an estimate.
Time to dig up my website’s visit data…
And here is a sample of what my visit/item view data looks like:
visit_id  seq_num  item_id
1         1        1
2         1        4
2         2        5
3         1        11
3         2        4
3         3        6
3         4        8
3         5        9
3         6        8
3         7        7
3         8        12
3         9        11
3         10       12
3         11       11
4         1        1
5         1        9
6         1        5
6         2        12
7         1        10
8         1        1
9         1        2
9         2        8
10        1        11
10        2        10
CREATE TABLE visit_item (
  visit_id INT NOT NULL,
  seq_num  INT NOT NULL,
  item_id  INT NOT NULL
);
Wait… some visits viewed the same item twice. I don’t like this, because switching back and forth between items will skew the probabilities. So I collapse the data to one row per visit and item, keeping the first (min_seq) and last (max_seq) position at which the item was viewed:
visit_id  item_id  min_seq  max_seq
1         1        1        1
2         4        1        1
2         5        2        2
3         4        2        2
3         6        3        3
3         7        7        7
3         8        4        6
3         9        5        5
3         11       1        11
3         12       8        10
4         1        1        1
5         9        1        1
6         5        1        1
6         12       2        2
7         10       1        1
8         1        1        1
9         2        1        1
9         8        2        2
10        10       2        2
10        11       1        1
CREATE TABLE visit_item_sum AS
SELECT visit_id,
       item_id,
       MIN(seq_num) AS min_seq,
       MAX(seq_num) AS max_seq
FROM visit_item
GROUP BY visit_id, item_id;
It is often helpful to be able to translate vector and matrix math to SQL row and column operations. In matrix form, I have created a matrix A = Items × Visits, whose entry for item i and visit v is 1 if visit v viewed item i and 0 otherwise. Most of the entries are zero.
Next I will multiply this matrix by its transpose: M = A·Aᵀ
The result is an Items × Items matrix whose i,j entry is the number of visits that viewed both item i and item j.
In SQL, I perform the same operation by joining the visit_item_sum table to itself on visit_id and aggregating to count the number of visits viewing each pair of items.
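A Python sketch that checks this equivalence on the visit_item_sum sample shown earlier: it builds the 0/1 Items × Visits matrix A, computes M = A·Aᵀ, and compares it with a self‐join on visit_id. Pure Python is used only to keep the example dependency‐free:

```python
from collections import Counter
from itertools import product

# (visit_id, item_id) rows of the visit_item_sum sample shown earlier.
visit_item_sum = [
    (1, 1), (2, 4), (2, 5), (3, 4), (3, 6), (3, 7), (3, 8), (3, 9), (3, 11),
    (3, 12), (4, 1), (5, 9), (6, 5), (6, 12), (7, 10), (8, 1), (9, 2), (9, 8),
    (10, 10), (10, 11),
]
items = sorted({item for _, item in visit_item_sum})
visits = sorted({visit for visit, _ in visit_item_sum})
viewed = {(item, visit) for visit, item in visit_item_sum}

# A = Items x Visits: 1 if the visit viewed the item, else 0 (mostly zeros).
A = [[1 if (i, v) in viewed else 0 for v in visits] for i in items]

# M = A * A^T: entry (i, j) = number of visits that viewed both i and j.
M = {
    (items[r], items[c]): sum(x * y for x, y in zip(A[r], A[c]))
    for r, c in product(range(len(items)), repeat=2)
}

# The SQL route: self-join visit_item_sum on visit_id, then count per item pair.
join_counts = Counter(
    (item1, item2)
    for visit1, item1 in visit_item_sum
    for visit2, item2 in visit_item_sum
    if visit1 == visit2
)

assert all(M[pair] == join_counts[pair] for pair in M)
print(M[(1, 1)], M[(11, 11)], M[(4, 12)])  # 3 2 1
```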
Voila! (Counts are over the full visit history; only the rows for item_in 1 and 2 are shown.)
item_in  item_out  visits
1        1         1746
1        2         470
1        3         457
1        4         377
1        5         279
1        6         430
1        7         401
1        8         270
1        9         241
1        10        397
1        11        253
1        12        215
2        1         470
2        2         1672
2        3         530
2        4         271
2        5         246
2        6         361
2        7         340
2        8         257
2        9         262
2        10        358
2        11        287
2        12        216
Note that when item_in = item_out, this is the total number of visits viewing the given item.
CREATE TABLE item_counts AS
SELECT vis1.item_id AS item_in,
       vis2.item_id AS item_out,
       COUNT(*) AS visits
FROM visit_item_sum vis1
JOIN visit_item_sum vis2
  ON vis1.visit_id = vis2.visit_id
GROUP BY 1, 2;
If I divide the visits for each record by the total visits for item_in, then I will get the conditional probability of viewing item_out given that I viewed item_in.
item_in  item_out  visits  max_visits  score
1        1         1746    1746        1.000
1        2         470     1746        0.269
1        3         457     1746        0.262
1        4         377     1746        0.216
1        5         279     1746        0.160
1        6         430     1746        0.246
1        7         401     1746        0.230
1        8         270     1746        0.155
1        9         241     1746        0.138
1        10        397     1746        0.227
1        11        253     1746        0.145
1        12        215     1746        0.123
2        1         470     1672        0.281
2        2         1672    1672        1.000
2        3         530     1672        0.317
2        4         271     1672        0.162
2        5         246     1672        0.147
2        6         361     1672        0.216
2        7         340     1672        0.203
2        8         257     1672        0.154
2        9         262     1672        0.157
2        10        358     1672        0.214
2        11        287     1672        0.172
2        12        216     1672        0.129
CREATE TABLE item_matrix AS
SELECT ic.item_in,
       ic.item_out,
       ic.visits,
       mv.visits AS max_visits,
       ic.visits / mv.visits AS score
FROM item_counts ic
JOIN (SELECT item_in, visits
      FROM item_counts
      WHERE item_in = item_out) mv
  ON mv.item_in = ic.item_in;
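The same division in Python, using the item_counts rows for item_in = 1 from the slide. Ranking the resulting probabilities reproduces the order of the nikerun1 recommendation list that follows. Python's / is true division; the slide's SQL appears to be MySQL, where / also returns a decimal, but engines such as PostgreSQL or SQLite truncate INT / INT, so there the score would need a cast:

```python
from typing import Dict

# item_counts rows for item_in = 1 (nikerun1) from the slide: item_out -> visits.
counts_from_1: Dict[int, int] = {
    1: 1746, 2: 470, 3: 457, 4: 377, 5: 279, 6: 430,
    7: 401, 8: 270, 9: 241, 10: 397, 11: 253, 12: 215,
}

def conditional_scores(counts: Dict[int, int], item_in: int) -> Dict[int, float]:
    """P(view item_out | viewed item_in) = co-visits / visits that viewed item_in."""
    max_visits = counts[item_in]  # diagonal entry: all visits that viewed item_in
    return {item_out: n / max_visits for item_out, n in counts.items()}

scores = conditional_scores(counts_from_1, item_in=1)
ranked = sorted((out for out in scores if out != 1), key=lambda out: -scores[out])
print([(out, round(scores[out], 3)) for out in ranked[:3]])
# [(2, 0.269), (3, 0.262), (6, 0.246)]
```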
Zooey, here are your recommendations!
item_in  item_name  item_out  item_name   brand   size8  score
1        nikerun1   2         nikerun2    nike    Y      0.269
1        nikerun1   3         nikerun3    nike    N      0.262
1        nikerun1   6         adidasrun1  adidas  Y      0.246
1        nikerun1   7         adidasrun2  adidas  Y      0.230
1        nikerun1   10        asicsrun1   asics   Y      0.227
1        nikerun1   4         nikerun4    nike    Y      0.216
1        nikerun1   5         nikerun5    nike    Y      0.160
1        nikerun1   8         adidasrun3  adidas  N      0.155
1        nikerun1   11        asicsrun2   asics   N      0.145
1        nikerun1   9         adidasrun4  adidas  Y      0.138
1        nikerun1   12        asicsrun3   asics   Y      0.123
SELECT im.item_in, i1.item_name, im.item_out, i2.item_name, i2.brand, i2.size8, im.score
FROM item_matrix im
JOIN item i1 ON i1.item_id = im.item_in
JOIN item i2 ON i2.item_id = im.item_out
WHERE im.item_in <> im.item_out
ORDER BY im.item_in, im.score DESC;
Hmmm… how can I make them better?
Idea! Add a time component.
I’m only going to count visits where item_out was viewed sometime after item_in was first viewed. Now I have probabilities that a visitor will view item_out after item_in.
item_in  item_out  visits  max_visits  score
1        1         1746    1746        1.000
1        2         284     1746        0.163
1        3         290     1746        0.166
1        4         231     1746        0.132
1        5         153     1746        0.088
1        6         269     1746        0.154
1        7         230     1746        0.132
1        8         166     1746        0.095
1        9         140     1746        0.080
1        10        215     1746        0.123
1        11        141     1746        0.081
1        12        97      1746        0.056
2        1         293     1672        0.175
2        2         1672    1672        1.000
2        3         356     1672        0.213
2        4         166     1672        0.099
2        5         141     1672        0.084
2        6         235     1672        0.141
2        7         191     1672        0.114
2        8         140     1672        0.084
2        9         120     1672        0.072
2        10        201     1672        0.120
2        11        142     1672        0.085
2        12        117     1672        0.070
CREATE TABLE item_counts2 AS
SELECT vis1.item_id AS item_in,
       vis2.item_id AS item_out,
       SUM(CASE WHEN vis1.min_seq <= vis2.max_seq THEN 1 ELSE 0 END) AS visits
FROM visit_item_sum vis1
JOIN visit_item_sum vis2
  ON vis1.visit_id = vis2.visit_id
GROUP BY 1, 2;
CREATE TABLE item_matrix2 AS
SELECT ic.item_in,
       ic.item_out,
       ic.visits,
       mv.visits AS max_visits,
       ic.visits / mv.visits AS score
FROM item_counts2 ic
JOIN (SELECT item_in, visits
      FROM item_counts2
      WHERE item_in = item_out) mv
  ON mv.item_in = ic.item_in;
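A Python sketch of the same directional count on the visit_item sample shown earlier. A visit counts toward (item_in, item_out) only when item_out's last view is at or after item_in's first view, mirroring vis1.min_seq <= vis2.max_seq. Unlike the plain co‐visit count, the result is no longer symmetric:

```python
from collections import Counter, defaultdict

# (visit_id, seq_num, item_id) rows of the visit_item sample shown earlier.
visit_item = [
    (1, 1, 1), (2, 1, 4), (2, 2, 5),
    (3, 1, 11), (3, 2, 4), (3, 3, 6), (3, 4, 8), (3, 5, 9), (3, 6, 8),
    (3, 7, 7), (3, 8, 12), (3, 9, 11), (3, 10, 12), (3, 11, 11),
    (4, 1, 1), (5, 1, 9), (6, 1, 5), (6, 2, 12), (7, 1, 10), (8, 1, 1),
    (9, 1, 2), (9, 2, 8), (10, 1, 11), (10, 2, 10),
]

# visit_item_sum: (min_seq, max_seq) of each item within each visit.
by_visit = defaultdict(dict)
for visit, seq, item in visit_item:
    lo, hi = by_visit[visit].get(item, (seq, seq))
    by_visit[visit][item] = (min(lo, seq), max(hi, seq))

# item_counts2: count the visit for (item_in, item_out) only if item_out was
# viewed at or after item_in's first view (vis1.min_seq <= vis2.max_seq).
counts2 = Counter()
for spans in by_visit.values():
    for item_in, (min_in, _) in spans.items():
        for item_out, (_, max_out) in spans.items():
            if min_in <= max_out:
                counts2[(item_in, item_out)] += 1

print(counts2[(4, 5)], counts2[(5, 4)])      # 1 0  (visit 2 viewed 4, then 5)
print(counts2[(11, 12)], counts2[(12, 11)])  # 1 1  (visit 3 switched back and forth)
```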
Zooey, here are some better recommendations!
item_in  item_name  item_out  item_name   brand   size8  score
1        nikerun1   3         nikerun3    nike    N      0.1661
1        nikerun1   2         nikerun2    nike    Y      0.1627
1        nikerun1   6         adidasrun1  adidas  Y      0.1541
1        nikerun1   4         nikerun4    nike    Y      0.1323
1        nikerun1   7         adidasrun2  adidas  Y      0.1317
1        nikerun1   10        asicsrun1   asics   Y      0.1231
1        nikerun1   8         adidasrun3  adidas  N      0.0951
1        nikerun1   5         nikerun5    nike    Y      0.0876
1        nikerun1   11        asicsrun2   asics   N      0.0808
1        nikerun1   9         adidasrun4  adidas  Y      0.0802
1        nikerun1   12        asicsrun3   asics   Y      0.0556
SELECT im.item_in, i1.item_name, im.item_out, i2.item_name, i2.brand, i2.size8, im.score
FROM item_matrix2 im
JOIN item i1 ON i1.item_id = im.item_in
JOIN item i2 ON i2.item_id = im.item_out
WHERE im.item_in <> im.item_out
ORDER BY im.item_in, im.score DESC;
Wait, I actually know you better than I thought I did...
I recognize Zooey because she has shopped at my website before. She is user_id=1. Based on the shoes she has looked at or bought in the past, I know that she prefers Asics brand to Nike brand, and she doesn’t care for Adidas much. Also, she always buys size 8 shoes.
user_id  attrib_type  attrib_value  prob
1        brand        nike          0.40
1        brand        adidas        0.10
1        brand        asics         0.50
1        size8        Y             0.95
1        size8        N             0.05
CREATE TABLE user_profile (
  user_id      INT NOT NULL,
  attrib_type  VARCHAR(20),
  attrib_value VARCHAR(20),
  prob         FLOAT
);
attrib_prob holds the share of the catalog with each brand/size8 combination:
CREATE TABLE attrib_prob AS
SELECT brand,
       size8,
       COUNT(*) / (SELECT COUNT(*) FROM item) AS prob
FROM item
GROUP BY brand, size8;
Enter naïve Bayes conditional probability distribution…
Here C is the event that you view item_out after you view item_in, F1 is the brand attribute, and F2 is the size8 attribute.
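The formula shown on the original slide did not survive the transcript. Working backward from the user_item_matrix SQL that follows (score × brand_prob × size8_prob ÷ attrib_prob), it appears to be the standard naïve Bayes form below, with score playing the role of P(C), Zooey's profile probabilities standing in for P(F1 | C) and P(F2 | C), and attrib_prob for P(F1, F2). Treat this mapping as a reconstruction, not a copy of the slide:

```latex
\begin{aligned}
P(C \mid F_1, F_2) &= \frac{P(C)\,P(F_1 \mid C)\,P(F_2 \mid C)}{P(F_1, F_2)}
  \approx \frac{\mathrm{score} \times \mathrm{brand\_prob} \times \mathrm{size8\_prob}}{\mathrm{attrib\_prob}} \\
\text{nikerun1} \to \text{asicsrun1:}\quad & \frac{0.123 \times 0.50 \times 0.95}{0.167} \approx 0.350
\end{aligned}
```

The result is used only as a ranking score: it is not normalized and can exceed 1 (for example 1.520 for item_in = item_out = 2).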
item_in  item_out  score  item_name   brand   size8  brand_prob  size8_prob  attrib_prob  final_score
1        1         1.000  nikerun1    nike    N      0.40        0.05        0.167        0.120
1        2         0.163  nikerun2    nike    Y      0.40        0.95        0.250        0.248
1        3         0.166  nikerun3    nike    N      0.40        0.05        0.167        0.020
1        4         0.132  nikerun4    nike    Y      0.40        0.95        0.250        0.201
1        5         0.088  nikerun5    nike    Y      0.40        0.95        0.250        0.134
1        6         0.154  adidasrun1  adidas  Y      0.10        0.95        0.250        0.059
1        7         0.132  adidasrun2  adidas  Y      0.10        0.95        0.250        0.050
1        8         0.095  adidasrun3  adidas  N      0.10        0.05        0.083        0.006
1        9         0.080  adidasrun4  adidas  Y      0.10        0.95        0.250        0.030
1        10        0.123  asicsrun1   asics   Y      0.50        0.95        0.167        0.350
1        11        0.081  asicsrun2   asics   N      0.50        0.05        0.083        0.024
1        12        0.056  asicsrun3   asics   Y      0.50        0.95        0.167        0.159
2        1         0.175  nikerun1    nike    N      0.40        0.05        0.167        0.021
2        2         1.000  nikerun2    nike    Y      0.40        0.95        0.250        1.520
2        3         0.213  nikerun3    nike    N      0.40        0.05        0.167        0.026
2        4         0.099  nikerun4    nike    Y      0.40        0.95        0.250        0.150
2        5         0.084  nikerun5    nike    Y      0.40        0.95        0.250        0.128
2        6         0.141  adidasrun1  adidas  Y      0.10        0.95        0.250        0.054
2        7         0.114  adidasrun2  adidas  Y      0.10        0.95        0.250        0.043
2        8         0.084  adidasrun3  adidas  N      0.10        0.05        0.083        0.005
2        9         0.072  adidasrun4  adidas  Y      0.10        0.95        0.250        0.027
2        10        0.120  asicsrun1   asics   Y      0.50        0.95        0.167        0.341
2        11        0.085  asicsrun2   asics   N      0.50        0.05        0.083        0.026
2        12        0.070  asicsrun3   asics   Y      0.50        0.95        0.167        0.199
CREATE TABLE user_item_matrix AS
SELECT im.item_in,
       im.item_out,
       im.score,
       i.item_name,
       i.brand,
       i.size8,
       upb.prob AS brand_prob,
       ups.prob AS size8_prob,
       ap.prob AS attrib_prob,
       im.score * upb.prob * ups.prob / ap.prob AS final_score
FROM item_matrix2 im
JOIN item i ON i.item_id = im.item_out
JOIN user_profile upb ON upb.attrib_type = 'brand' AND upb.attrib_value = i.brand
JOIN user_profile ups ON ups.attrib_type = 'size8' AND ups.attrib_value = i.size8
JOIN attrib_prob ap ON ap.brand = i.brand AND ap.size8 = i.size8
WHERE upb.user_id = 1 AND ups.user_id = 1;
Zooey, here are recommendations just for you!
item_in  item_name  item_out  item_name   brand   size8  score   final_score
1        nikerun1   10        asicsrun1   asics   Y      0.1230  0.3499
1        nikerun1   2         nikerun2    nike    Y      0.1630  0.2478
1        nikerun1   4         nikerun4    nike    Y      0.1320  0.2006
1        nikerun1   12        asicsrun3   asics   Y      0.0560  0.1593
1        nikerun1   5         nikerun5    nike    Y      0.0880  0.1338
1        nikerun1   6         adidasrun1  adidas  Y      0.1540  0.0585
1        nikerun1   7         adidasrun2  adidas  Y      0.1320  0.0502
1        nikerun1   9         adidasrun4  adidas  Y      0.0800  0.0304
1        nikerun1   11        asicsrun2   asics   N      0.0810  0.0244
1        nikerun1   3         nikerun3    nike    N      0.1660  0.0199
1        nikerun1   8         adidasrun3  adidas  N      0.0950  0.0057
SELECT im.item_in, i1.item_name, im.item_out, i2.item_name, i2.brand, i2.size8,
       im.score, im.final_score
FROM user_item_matrix im
JOIN item i1 ON i1.item_id = im.item_in
JOIN item i2 ON i2.item_id = im.item_out
WHERE im.item_in <> im.item_out
ORDER BY im.item_in, im.final_score DESC;
Notice how Asics shoes have risen to the top and Adidas shoes have been pushed down. Also, shoes that are not available in size 8 have been pushed to the bottom.
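A Python sketch that reproduces this re‐ranking from the tables above: the item catalog, the item_matrix2 scores for nikerun1 (rounded as on the slide), and Zooey's user_profile rows. Here attrib_prob is computed as an exact fraction (for example 2/12 instead of 0.167), so final scores can differ from the slide in the third or fourth decimal, but the order is the same:

```python
from collections import Counter

# item table: item_id -> (item_name, brand, size8)
items = {
    1: ("nikerun1", "nike", "N"), 2: ("nikerun2", "nike", "Y"),
    3: ("nikerun3", "nike", "N"), 4: ("nikerun4", "nike", "Y"),
    5: ("nikerun5", "nike", "Y"), 6: ("adidasrun1", "adidas", "Y"),
    7: ("adidasrun2", "adidas", "Y"), 8: ("adidasrun3", "adidas", "N"),
    9: ("adidasrun4", "adidas", "Y"), 10: ("asicsrun1", "asics", "Y"),
    11: ("asicsrun2", "asics", "N"), 12: ("asicsrun3", "asics", "Y"),
}
# item_matrix2 scores for item_in = 1 (nikerun1), rounded as on the slide.
score = {2: 0.163, 3: 0.166, 4: 0.132, 5: 0.088, 6: 0.154, 7: 0.132,
         8: 0.095, 9: 0.080, 10: 0.123, 11: 0.081, 12: 0.056}
# user_profile rows for user_id = 1 (Zooey).
profile = {("brand", "nike"): 0.40, ("brand", "adidas"): 0.10,
           ("brand", "asics"): 0.50, ("size8", "Y"): 0.95, ("size8", "N"): 0.05}

# attrib_prob: share of the catalog with each (brand, size8) combination.
combo_counts = Counter((brand, size8) for _, brand, size8 in items.values())
attrib_prob = {combo: n / len(items) for combo, n in combo_counts.items()}

def final_score(item_out: int) -> float:
    """score * brand_prob * size8_prob / attrib_prob, as in user_item_matrix."""
    _, brand, size8 = items[item_out]
    return (score[item_out] * profile[("brand", brand)]
            * profile[("size8", size8)] / attrib_prob[(brand, size8)])

ranked = sorted(score, key=final_score, reverse=True)
print([items[i][0] for i in ranked[:3]])  # ['asicsrun1', 'nikerun2', 'nikerun4']
```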
We have built an effective recommendation model that is a hybrid of an item‐based collaborative filter and a content‐based filter.
Many modern recommender systems use an ensemble learning approach, combining multiple recommendation models like the ones we just built.
The best‐performing recommendation models in online retail are still the well‐known item‐to‐item strategies, such as:
• People who viewed this item also viewed
• People who viewed this item eventually bought
• People who bought this item also bought
Production
Typical website recommender platform
(Diagram: a Recommender Platform made up of a Database holding User Data and Product Data, a Model Builder, a Recommendation Scorer serving the Website, and a Multivariate Test Framework; the diagram distinguishes batch processing from real‐time processing.)
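One common way to split the work between batch and real‐time processing, sketched in Python under that assumption: the model builder precomputes a short neighbor list per item in batch, and the real‐time scorer only looks the list up and filters out items already seen in the session. The function names build_model and score are hypothetical; the co‐visit counts are item_counts rows for item 1:

```python
from typing import Dict, List, Set, Tuple

def build_model(covisits: Dict[Tuple[int, int], int], k: int) -> Dict[int, List[int]]:
    """Batch step: precompute the top-k co-viewed items for every item."""
    pairs: Dict[int, List[Tuple[int, int]]] = {}
    for (item_in, item_out), n in covisits.items():
        if item_in != item_out:  # the diagonal is the item's own visit total
            pairs.setdefault(item_in, []).append((n, item_out))
    return {
        item: [out for _, out in sorted(cands, key=lambda p: (-p[0], p[1]))[:k]]
        for item, cands in pairs.items()
    }

def score(model: Dict[int, List[int]], viewing: int, already_seen: Set[int]) -> List[int]:
    """Real-time step: a dictionary lookup plus a session filter, no heavy math."""
    return [item for item in model.get(viewing, []) if item not in already_seen]

# Co-visit counts for item 1 (nikerun1), from the item_counts slide.
covisits = {(1, 1): 1746, (1, 2): 470, (1, 3): 457, (1, 6): 430, (1, 7): 401}
model = build_model(covisits, k=3)               # e.g. rebuilt on a schedule
print(model)                                      # {1: [2, 3, 6]}
print(score(model, viewing=1, already_seen={3}))  # [2, 6]
```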
Common Challenges
• Product solution
– Data transfer
– Aligning performance metrics
– Network latency/problems
– Customization
– Loss of internal learning
• Custom solution
– Finding the right people to build, test, and maintain your solution: software developers, data scientists, marketing analysts
– Keeping resources dedicated to improving the system
– Building a management platform: tuning algorithms, reporting, and common tasks such as boosting and blacklisting
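Boosting and blacklisting are typically applied as business rules on top of model scores. A minimal Python sketch, assuming multiplicative boosts and a hard blacklist (the rule format is hypothetical; the scores are the Zooey final scores rounded to two decimals):

```python
from typing import Dict, List, Set

def apply_rules(scored: Dict[str, float], boosts: Dict[str, float],
                blacklist: Set[str], k: int) -> List[str]:
    """Drop blacklisted items, multiply boosted items' scores, then re-rank."""
    adjusted = {
        item: s * boosts.get(item, 1.0)
        for item, s in scored.items()
        if item not in blacklist
    }
    return sorted(adjusted, key=lambda item: (-adjusted[item], item))[:k]

scored = {"asicsrun1": 0.35, "nikerun2": 0.25, "nikerun4": 0.20, "asicsrun3": 0.16}
print(apply_rules(scored, boosts={"asicsrun3": 2.0}, blacklist={"nikerun2"}, k=3))
# ['asicsrun1', 'asicsrun3', 'nikerun4']
```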
Finding the Right Solution
• Evaluate recommendation quality (not just performance)
– Eye test
– Does your recommendation distribution roughly match your sales distribution?
– Global extrema vs local extrema
• Apply feedback mechanisms with care
– Use the right metrics
– Don’t be too aggressive
• Be creative when it comes to solving your unique problems
• Don’t take it personally...everyone will take it personally
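One way to put a number on the distribution check is to compare the category mix of recommendations with the category mix of sales. This Python sketch uses total variation distance, which is an assumed choice of measure rather than one the slides prescribe, on hypothetical brand counts:

```python
from collections import Counter
from typing import Dict, Iterable

def shares(events: Iterable[str]) -> Dict[str, float]:
    """Turn raw events into each category's share of the total."""
    counts = Counter(events)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}

def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """0 = identical distributions, 1 = no overlap at all."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Hypothetical brand mix of sales vs. of recommendations shown.
sales = ["nike"] * 50 + ["asics"] * 30 + ["adidas"] * 20
recs = ["nike"] * 80 + ["asics"] * 15 + ["adidas"] * 5
print(round(total_variation(shares(sales), shares(recs)), 2))  # 0.3
```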
Testing
How Testing Works
1. Design
   1. Build testing framework
   2. Create the test algorithm
2. Execute
   1. Randomly assign users
   2. Apply recommendations
   3. Collect data
3. Analyze
   1. Build reports
   2. Mine recommendations
   3. Disseminate insights and understanding
4. Predict
   1. Extrapolate findings
   2. Brainstorm new tests
Example: Performance Based Email
• Background
– Thousands of ad campaigns
– Millions of users
– Recommendations are created in batch
– Sent out in batch
– Advertisers pay per click (CPC)
• Test Framework
– User population is broken into slots
– Analysts create and manage tests via slots
– Reports and KPIs are rolled up by slot
Example: Performance Based Email
Randomly allocate users into 20 equal‐sized slots
• SELECT CAST(SUBSTRING(HASHBYTES('MD5', email_address), 15, 2) AS INT) % 20
Example
• Globally unique identifier (GUID): [email protected]
• Hash function (MD5): B306F4023D42F8B34ACEDE0F5ED07EC6
• Hexadecimal to integer conversion (last 4 hex digits, i.e., the last 2 bytes): 7EC6 ‐> 32,454
• Divide by 20 and take the remainder (modulo): 32,454 mod 20 = 14
• Final slot: 14
Advantages
• No user‐test lookup table required
• New users are instantly assigned to a slot
• Number of slots is variable
• Slots can randomly be shuffled by changing one parameter
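The slot calculation can be reproduced outside the database. In this Python sketch using hashlib, slot_from_digest repeats the worked example above and assign_slot hashes an email address end to end (the address is a placeholder). The salt parameter is a hypothetical way to realize the one‐parameter reshuffle; with an empty salt and ASCII addresses stored as VARCHAR, the bytes hashed should match SQL Server's HASHBYTES, whereas an NVARCHAR column would hash UTF‐16 bytes and yield different slots:

```python
import hashlib

def slot_from_digest(hex_digest: str, n_slots: int = 20) -> int:
    """Last 2 bytes (4 hex digits) of an MD5 digest, read as an integer, mod n_slots."""
    return int(hex_digest[-4:], 16) % n_slots

def assign_slot(email: str, n_slots: int = 20, salt: str = "") -> int:
    """Hash the address and map it to a slot; no lookup table is needed.

    salt is hypothetical: changing it reshuffles every user's slot at once.
    """
    digest = hashlib.md5((salt + email).encode("utf-8")).hexdigest()
    return slot_from_digest(digest, n_slots)

# The worked example: ...7EC6 -> 32,454 -> 32,454 mod 20 = 14.
print(slot_from_digest("B306F4023D42F8B34ACEDE0F5ED07EC6"))  # 14
print(assign_slot("someone@example.com"))  # same slot on every run for this address
```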
Example: Performance Based Email
• Test Table
Test        Start Date  End Date  Slot Start  Slot End
Control     ‐           ‐         0           9
Item Based  Jan 13      Feb 5     10          19
User Based  Feb 5       Feb 26    10          14
Ensemble    Feb 5       Feb 26    15          19
• Scoring
– Create recommendations for users in the test group
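A Python sketch of how a scorer might look up which test a slot belongs to on a given day, encoding the test table above. The slides give no year and do not say whether end dates are inclusive; this sketch assumes end dates are exclusive, so on Feb 5 slots 10 to 19 have already moved from Item Based to User Based or Ensemble:

```python
from typing import List, Optional, Tuple

MonthDay = Tuple[int, int]  # (month, day); the slide gives no year

# (test, start, end, slot_start, slot_end) from the test table; None = always on.
TESTS: List[Tuple[str, Optional[MonthDay], Optional[MonthDay], int, int]] = [
    ("Control", None, None, 0, 9),
    ("Item Based", (1, 13), (2, 5), 10, 19),
    ("User Based", (2, 5), (2, 26), 10, 14),
    ("Ensemble", (2, 5), (2, 26), 15, 19),
]

def active_test(slot: int, day: MonthDay) -> Optional[str]:
    """Return the test a slot is in on a given day (end dates assumed exclusive)."""
    for name, start, end, lo, hi in TESTS:
        in_dates = start is None or (start <= day < end)
        if lo <= slot <= hi and in_dates:
            return name
    return None  # slot not in any running test on that day

print(active_test(12, (1, 20)))  # Item Based
print(active_test(12, (2, 5)))   # User Based
```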
Example: Performance Based Email
• Web Based Reports
– Test improvement relative to control
– KPIs: Impressions, Clicks, Conversions
– Growth Equation: Revenue = PPC x CTR x Impressions
– Test performance over time
– Eye test / User detail
Test Percent Improvement (relative to control)
Test        Revenue  CTR   PPC   Volume
Control     ‐        ‐     ‐     ‐
User Based  +10%     +20%  +5%   +0%
Ensemble    +15%     +30%  ‐10%  +5%
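A Python sketch of the report arithmetic: percent improvement relative to control, and the growth equation Revenue = PPC × CTR × Impressions. Because revenue is a product, its lift equals the product of the component lifts (each taken as 1 + lift) minus 1. All KPI values are hypothetical and assumed already normalized to the same audience size, which matters because Control holds 10 slots while User Based and Ensemble hold 5 each:

```python
def pct_improvement(test: float, control: float) -> float:
    """Relative lift of a test group over control: 0.10 means +10%."""
    return (test - control) / control

def revenue(ppc: float, ctr: float, impressions: float) -> float:
    """Growth equation from the slide: Revenue = PPC x CTR x Impressions."""
    return ppc * ctr * impressions

# Hypothetical KPIs, normalized to the same audience size.
control_group = {"ppc": 0.50, "ctr": 0.020, "impressions": 1_000_000}
test_group = {"ppc": 0.48, "ctr": 0.025, "impressions": 1_000_000}

lift = pct_improvement(revenue(**test_group), revenue(**control_group))

# Because revenue is a product, its lift is the product of the component lifts.
decomposed = 1.0
for kpi in ("ppc", "ctr", "impressions"):
    decomposed *= 1 + pct_improvement(test_group[kpi], control_group[kpi])
decomposed -= 1

print(f"Revenue {lift:+.0%}, decomposed {decomposed:+.0%}")  # Revenue +20%, decomposed +20%
```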
Generating Insights from Tests
Insights
• Design for accurate and diverse recommendations
• Recent behavior matters both in modeling and scoring
• Know your data
• Challenge the status quo
Recommendations
• Improvement comes through testing
• Communicate insights to the organization
• Have fun
Conclusion
• Recommender platforms can outperform popularity‐based approaches by double digits, create a competitive advantage, and enhance the user experience.
• Every organization is going to have its own unique implementation depending on technology and strategy. Platforms are built around a variety of data stores and leverage a variety of programming languages.
• A strong solution requires excellence in all four components (i.e., data, algorithm, production, and testing).
• Testing produces consumer insights and understanding that lead to constant improvement.
Nathan Stephens
Senior Manager
Merkle Analytics
Allen Dickson
Senior Manager
Discussion Questions
1. Are recommender platforms valuable to my organization? How would a recommender platform fit into my overall strategy?
2. How do I get started?
3. Which metrics could be good indicators of recommendation quality (click through rate, volume, attrition, etc.)?
4. How might a recommender system adapt to multiple users with the same account? For example, a family might share the same Amazon account. How would an ensemble solution address this issue?
5. Why do recommender platforms require constant improvement? How does constant improvement benefit the organization?