Breakout Recommender Platforms FINAL slr - Merkle

Recommender Platforms

Recommender platforms can outperform traditional popularity based methods by double digits, but they require customization to match any organization’s unique strategies.

Introduction

• Benefits

– Performance

– Differentiator

– Feature

– User Expectation

• Use Cases

– Live Web Recommendations

– Batch Email

– Any situation with multiple users and products

2

Introduction

• Recommendation Platform

– Produce items that are relevant to a user’s interests

• Some Strategies and Solutions

– Amazon (Purchase)

• Product discovery and comparison

– Pandora (Advertising)

• Engage and adapt

– Netflix (Subscription)

• Increase usage

– LinkedIn (Subscription and Advertising)

• Participate in the social graph

3

Major Parts of the Platform

Recommender Platform

DATA

ALGORITHM

PRODUCTION

TESTING

4

Data

5

Types of Data

• User Data

– Demographics Who you are

– Behavioral What you do

– Temporal When you act

– Geographic Where you are

– Social How you connect

– Privacy?

• Item Data

– Descriptions and Attributes

– Category Relationships (Ontology)

6

Big and Sparse

• Big Data

– Millions of users

– Millions of products

• Sparse

– Tiny fraction of all possible combinations

• Netflix Prize Dataset

– 100 million ratings

– 500 thousand users

– 20 thousand movies

– 10 billion combinations

– 99% sparse!
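The sparsity figure follows directly from the rounded counts on this slide. A minimal sketch of the arithmetic, using only the slide's approximate numbers:

```python
# Rounded Netflix Prize figures quoted on the slide.
ratings = 100_000_000
users = 500_000
movies = 20_000

combinations = users * movies      # every possible user-movie pair
density = ratings / combinations   # fraction of pairs that carry a rating
sparsity = 1 - density

print(f"{combinations:,} possible pairs")
print(f"density  = {density:.1%}")
print(f"sparsity = {sparsity:.1%}")
```

Only about 1 in 100 user-movie pairs has a rating, which is why storage and algorithms for this kind of data usually work on the observed pairs rather than the full matrix.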

7

Data and Analytic Tools

• Custom Solutions

• Data Store

– Traditional Database (Oracle)

– Massively Parallel Data Warehouse (Netezza, Teradata)

– Unstructured, Non Relational (Hadoop, MongoDB)

• Algorithm Languages

– SQL

– Analytical (SAS, R, Weka)

– Programming (Java, Python, C++)

Image: Christo; The Gates; New York; 2005

• Keep data and algorithm close

Image: Christo; Surrounded Islands; Florida, 1983

8

Algorithm / Solution

9

Common Solutions

• Popularity based

– Suggest the most popular items to all users

• User based Collaborative Filtering

– Find correlations between users, then suggest items related to similar users

• Item based Collaborative Filtering

– Find correlations between items, then suggest items similar to those related to the user

• Content‐based Filtering

– Create factors that describe the users and the items. Then find item‐user pairings with similar factor scores.

• Ensemble

– Optimally blend multiple methods into a single result set

– Use an overarching optimization technology that chooses the best method for each situation

10

Popularity Based (non‐personalized)

• Rank all items

– Revenue

– Clicks

– Yield

• Every user gets the same recommendations

American Top 40
1. Katy Perry ‐ The One That Got Away
2. Flo Rida ‐ Good Feeling
3. Bruno Mars ‐ It Will Rain
4. Rihanna ‐ We Found Love feat. Calvin Harris
5. Adele ‐ Set Fire To The Rain
6. Selena Gomez ‐ Love You Like A Love Song
7. Jessie J ‐ Domino
8. LMFAO ‐ Sexy And I Know It
9. David Guetta ‐ Without You feat. Usher
10. Gavin DeGraw ‐ Not Over You
11. J. Cole ‐ Work Out
12. David Guetta ‐ Turn Me On feat. Nicki Minaj
13. Gym Class Heroes ‐ Stereo Hearts feat.
14. Maroon 5 ‐ Moves Like Jagger feat.
15. Gym Class Heroes ‐ Self Back Home feat…
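A popularity-based recommender can be sketched in a few lines. The click log, the item names, and the `top_items` helper below are hypothetical and only illustrate ranking by a single metric (clicks here; revenue or yield would work the same way):

```python
from collections import Counter

# Hypothetical click log: one (user, item) pair per click.
clicks = [
    ("u1", "shoe_a"), ("u2", "shoe_a"), ("u3", "shoe_b"),
    ("u1", "shoe_c"), ("u2", "shoe_b"), ("u4", "shoe_a"),
]

def top_items(events, n=2):
    """Rank items by click count and return the n most clicked."""
    counts = Counter(item for _, item in events)
    return [item for item, _ in counts.most_common(n)]

ranking = top_items(clicks)
# Non-personalized: every user is served the same ranked list.
recommendations = {user: ranking for user in {u for u, _ in clicks}}
print(ranking)
```

Because the list ignores who is asking, it serves as the natural baseline that the personalized methods are measured against.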

11

Item Based Correlations

• Apparel recommends

– Health & Beauty

– Jewelry

– Food

• Food recommends

– Pets

– Office products

– Travel

– Home & Garden

12

User Based

• Group all users

– Use a similarity metric

• Generate item list

– Find most popular items in each cluster

• Create recommendations

– Match the item list to the user clusters

13

Ensemble

• Netflix Prize

– Final solution consisted of blending multiple algorithms.

– “Predictive accuracy is substantially improved when blending multiple predictors. Our experience is that most efforts should be concentrated in deriving substantially different approaches, rather than refining a single technique. Consequently, our solution is an ensemble of many methods.” ‐‐ BellKor 2007

14

Example: Item‐Based Recommender

How that patented, attribute‐based, user‐profile enhanced, Bayesian, algorithmic modeling thing works

(based on a true story)

15

I sell women’s running shoes online.

Here is my product catalog with 12 different running shoes:

item_id  item_name   brand   size8
1        nikerun1    nike    N
2        nikerun2    nike    Y
3        nikerun3    nike    N
4        nikerun4    nike    Y
5        nikerun5    nike    Y
6        adidasrun1  adidas  Y
7        adidasrun2  adidas  Y
8        adidasrun3  adidas  N
9        adidasrun4  adidas  Y
10       asicsrun1   asics   Y
11       asicsrun2   asics   N
12       asicsrun3   asics   Y

CREATE TABLE item (
  item_id INT NOT NULL,
  item_name VARCHAR(20),
  brand VARCHAR(20),
  size8 CHAR(1)
);

16

Zooey Deschanel visits my website and clicks on the nikerun1 shoe.

Although something about this shoe has sparked her interest, it may not be quite what she is looking for.

I decide to suggest some similar shoes using a “People who viewed this shoe also viewed” recommendation model.

17

I need to know which shoes Zooey is most likely to want to see next.

I don’t know her that well…I will use the probabilities that other shoes are viewed in the same visit as the nikerun1 shoe as an estimate.

Time to dig up my website’s visit data….

18

And here is what my visit/item view data looks like:

visit_id  seq_num  item_id
1         1        1
2         1        4
2         2        5
3         1        11
3         2        4
3         3        6
3         4        8
3         5        9
3         6        8
3         7        7
3         8        12
3         9        11
3         10       12
3         11       11
4         1        1
5         1        9
6         1        5
6         2        12
7         1        10
8         1        1
9         1        2
9         2        8
10        1        11
10        2        10

CREATE TABLE visit_item (
  visit_id INT NOT NULL,
  seq_num INT NOT NULL,
  item_id INT NOT NULL
);

19

Wait…some visits viewed the same item twice. I don’t like this, because switching back and forth between items will skew the probabilities.

I summarize visit_item to one row per visit and item, keeping the first and last sequence number at which the item was viewed:

visit_id  item_id  min_seq  max_seq
1         1        1        1
2         4        1        1
2         5        2        2
3         4        2        2
3         6        3        3
3         7        7        7
3         8        4        6
3         9        5        5
3         11       1        11
3         12       8        10
4         1        1        1
5         9        1        1
6         5        1        1
6         12       2        2
7         10       1        1
8         1        1        1
9         2        1        1
9         8        2        2
10        10       2        2
10        11       1        1

CREATE TABLE visit_item_sum AS
SELECT
  visit_id
  ,item_id
  ,MIN(seq_num) min_seq
  ,MAX(seq_num) max_seq
FROM visit_item
GROUP BY visit_id, item_id;

20

It is often helpful to be able to translate vector and matrix math to SQL row and column operations. In matrix form, I have created a matrix: A = Items × Visits. Most of the entries are zero.

Next I will multiply this matrix by its transpose: M = A·Aᵀ

The result is an Items × Items matrix whose i,j entry is the number of visits that viewed both item i and item j.

In SQL, I perform the same operation by joining the visit_item_sum table to itself on the visit_id and aggregating to count the number of visits viewing pairs of items.
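The equivalence between the matrix product and the self-join can be checked on the small sample of visit data. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the deck's database; the plain-loop matrix product is only there to avoid extra dependencies:

```python
import sqlite3

# (visit_id, item_id) pairs from the de-duplicated sample visit data.
pairs = [
    (1, 1), (2, 4), (2, 5), (3, 4), (3, 6), (3, 7), (3, 8), (3, 9),
    (3, 11), (3, 12), (4, 1), (5, 9), (6, 5), (6, 12), (7, 10),
    (8, 1), (9, 2), (9, 8), (10, 10), (10, 11),
]
items = sorted({i for _, i in pairs})
visits = sorted({v for v, _ in pairs})

# Matrix view: A is Items x Visits, with 1 where the visit viewed the item.
seen = set(pairs)
A = [[1 if (v, i) in seen else 0 for v in visits] for i in items]
# M = A . A^T, so M[r][c] counts visits that viewed both items.
M = [[sum(a * b for a, b in zip(row_r, row_c)) for row_c in A] for row_r in A]
matrix_counts = {
    (items[r], items[c]): M[r][c]
    for r in range(len(items)) for c in range(len(items)) if M[r][c]
}

# SQL view: the same counts from a self-join on visit_id.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE visit_item_sum (visit_id INT, item_id INT)")
con.executemany("INSERT INTO visit_item_sum VALUES (?, ?)", pairs)
rows = con.execute("""
    SELECT vis1.item_id, vis2.item_id, COUNT(*)
    FROM visit_item_sum vis1
    JOIN visit_item_sum vis2 ON vis1.visit_id = vis2.visit_id
    GROUP BY vis1.item_id, vis2.item_id
""").fetchall()
sql_counts = {(i, j): n for i, j, n in rows}

print(matrix_counts == sql_counts)
```

The join only ever touches item pairs that actually share a visit, so it never materializes the zeros that dominate the matrix.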

21

Voila!

Note that when item_in = item_out, this is the total number of visits viewing the given item.

item_in  item_out  visits
1        1         1746
1        2         470
1        3         457
1        4         377
1        5         279
1        6         430
1        7         401
1        8         270
1        9         241
1        10        397
1        11        253
1        12        215
2        1         470
2        2         1672
2        3         530
2        4         271
2        5         246
2        6         361
2        7         340
2        8         257
2        9         262
2        10        358
2        11        287
2        12        216

CREATE TABLE item_counts AS
SELECT
  vis1.item_id item_in
  ,vis2.item_id item_out
  ,SUM(1) visits
FROM visit_item_sum vis1
JOIN visit_item_sum vis2
  ON vis1.visit_id=vis2.visit_id
GROUP BY 1,2;

22

If I divide the visits for each record by the total visits for item_in, then I will get the conditional probability of viewing item_out given that I viewed item_in.

item_in  item_out  visits  max_visits  score
1        1         1746    1746        1.000
1        2         470     1746        0.269
1        3         457     1746        0.262
1        4         377     1746        0.216
1        5         279     1746        0.160
1        6         430     1746        0.246
1        7         401     1746        0.230
1        8         270     1746        0.155
1        9         241     1746        0.138
1        10        397     1746        0.227
1        11        253     1746        0.145
1        12        215     1746        0.123
2        1         470     1672        0.281
2        2         1672    1672        1.000
2        3         530     1672        0.317
2        4         271     1672        0.162
2        5         246     1672        0.147
2        6         361     1672        0.216
2        7         340     1672        0.203
2        8         257     1672        0.154
2        9         262     1672        0.157
2        10        358     1672        0.214
2        11        287     1672        0.172
2        12        216     1672        0.129

CREATE TABLE item_matrix AS
SELECT
  ic.item_in
  ,ic.item_out
  ,ic.visits
  ,mv.visits max_visits
  ,ic.visits/mv.visits score
FROM item_counts ic
JOIN
  (SELECT
    item_in
    ,visits
  FROM item_counts
  WHERE item_in=item_out) mv
  ON mv.item_in=ic.item_in;
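The score column is just each co-view count divided by the diagonal count for item_in. A minimal sketch of that division for item_in = 1, using the counts from the item_counts table:

```python
# Co-view counts for item_in = 1, keyed by item_out (from item_counts).
counts_for_item_1 = {1: 1746, 2: 470, 3: 457, 4: 377, 5: 279, 6: 430,
                     7: 401, 8: 270, 9: 241, 10: 397, 11: 253, 12: 215}

total = counts_for_item_1[1]  # diagonal entry: visits that viewed item 1
scores = {item_out: n / total for item_out, n in counts_for_item_1.items()}

# Rank the other items by P(view item_out | viewed item 1).
ranked = sorted((i for i in scores if i != 1), key=scores.get, reverse=True)
for item_out in ranked[:3]:
    print(item_out, round(scores[item_out], 3))
```

One portability note: in engines where dividing two integers truncates (PostgreSQL, SQLite, and SQL Server, for example), the expression ic.visits/mv.visits would need a cast to a decimal or float type first; MySQL returns a decimal result as written.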

23

Zooey, here are your recommendations!

item_in  item_name  item_out  item_name   brand   size8  score
1        nikerun1   2         nikerun2    nike    Y      0.269
1        nikerun1   3         nikerun3    nike    N      0.262
1        nikerun1   6         adidasrun1  adidas  Y      0.246
1        nikerun1   7         adidasrun2  adidas  Y      0.230
1        nikerun1   10        asicsrun1   asics   Y      0.227
1        nikerun1   4         nikerun4    nike    Y      0.216
1        nikerun1   5         nikerun5    nike    Y      0.160
1        nikerun1   8         adidasrun3  adidas  N      0.155
1        nikerun1   11        asicsrun2   asics   N      0.145
1        nikerun1   9         adidasrun4  adidas  Y      0.138
1        nikerun1   12        asicsrun3   asics   Y      0.123

SELECT
  im.item_in
  ,i1.item_name
  ,im.item_out
  ,i2.item_name
  ,i2.brand
  ,i2.size8
  ,im.score
FROM item_matrix im
JOIN item i1
  ON i1.item_id=im.item_in
JOIN item i2
  ON i2.item_id=im.item_out
WHERE item_in<>item_out
ORDER BY item_in, score DESC;

Hmmm…how can I make them better?

24

Idea! Add a time component.

I’m only going to sum visits where item_out was viewed sometime after item_in.

Now I have probabilities that a visitor will view item_out after item_in.

item_in  item_out  visits  max_visits  score
1        1         1746    1746        1.000
1        2         284     1746        0.163
1        3         290     1746        0.166
1        4         231     1746        0.132
1        5         153     1746        0.088
1        6         269     1746        0.154
1        7         230     1746        0.132
1        8         166     1746        0.095
1        9         140     1746        0.080
1        10        215     1746        0.123
1        11        141     1746        0.081
1        12        97      1746        0.056
2        1         293     1672        0.175
2        2         1672    1672        1.000
2        3         356     1672        0.213
2        4         166     1672        0.099
2        5         141     1672        0.084
2        6         235     1672        0.141
2        7         191     1672        0.114
2        8         140     1672        0.084
2        9         120     1672        0.072
2        10        201     1672        0.120
2        11        142     1672        0.085
2        12        117     1672        0.070

CREATE TABLE item_counts2 AS
SELECT
  vis1.item_id item_in
  ,vis2.item_id item_out
  ,SUM(CASE WHEN vis1.min_seq<=vis2.max_seq THEN 1 END) visits
FROM visit_item_sum vis1
JOIN visit_item_sum vis2
  ON vis1.visit_id=vis2.visit_id
GROUP BY 1,2;

CREATE TABLE item_matrix2 AS
SELECT
  ic.item_in
  ,ic.item_out
  ,ic.visits
  ,mv.visits max_visits
  ,ic.visits/mv.visits score
FROM item_counts2 ic
JOIN
  (SELECT
    item_in
    ,visits
  FROM item_counts2
  WHERE item_in=item_out) mv
  ON mv.item_in=ic.item_in;

25
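The effect of the min_seq/max_seq comparison is easiest to see on the sample visit data. A minimal sketch, again using sqlite3 as a stand-in database; unlike the deck's query it adds ELSE 0, so a pair that never occurs in the right order shows up as 0 rather than NULL:

```python
import sqlite3

# (visit_id, item_id, min_seq, max_seq) rows from the sample visit_item_sum data.
rows = [
    (1, 1, 1, 1), (2, 4, 1, 1), (2, 5, 2, 2), (3, 4, 2, 2), (3, 6, 3, 3),
    (3, 7, 7, 7), (3, 8, 4, 6), (3, 9, 5, 5), (3, 11, 1, 11), (3, 12, 8, 10),
    (4, 1, 1, 1), (5, 9, 1, 1), (6, 5, 1, 1), (6, 12, 2, 2), (7, 10, 1, 1),
    (8, 1, 1, 1), (9, 2, 1, 1), (9, 8, 2, 2), (10, 10, 2, 2), (10, 11, 1, 1),
]
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE visit_item_sum "
    "(visit_id INT, item_id INT, min_seq INT, max_seq INT)"
)
con.executemany("INSERT INTO visit_item_sum VALUES (?, ?, ?, ?)", rows)

# Count a visit for (item_in, item_out) only if item_out was still being
# viewed at or after the first view of item_in.
result = con.execute("""
    SELECT vis1.item_id, vis2.item_id,
           SUM(CASE WHEN vis1.min_seq <= vis2.max_seq THEN 1 ELSE 0 END)
    FROM visit_item_sum vis1
    JOIN visit_item_sum vis2 ON vis1.visit_id = vis2.visit_id
    GROUP BY vis1.item_id, vis2.item_id
""").fetchall()
ordered = {(i, j): n for i, j, n in result}

print(ordered[(4, 5)], ordered[(5, 4)])
```

In visit 2 the visitor saw item 4 and then item 5, so the pair counts in one direction only. The diagonal is unaffected because min_seq can never exceed max_seq for the same item.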

Zooey, here are some better recommendations!

item_in  item_name  item_out  item_name   brand   size8  score
1        nikerun1   3         nikerun3    nike    N      0.1661
1        nikerun1   2         nikerun2    nike    Y      0.1627
1        nikerun1   6         adidasrun1  adidas  Y      0.1541
1        nikerun1   4         nikerun4    nike    Y      0.1323
1        nikerun1   7         adidasrun2  adidas  Y      0.1317
1        nikerun1   10        asicsrun1   asics   Y      0.1231
1        nikerun1   8         adidasrun3  adidas  N      0.0951
1        nikerun1   5         nikerun5    nike    Y      0.0876
1        nikerun1   11        asicsrun2   asics   N      0.0808
1        nikerun1   9         adidasrun4  adidas  Y      0.0802
1        nikerun1   12        asicsrun3   asics   Y      0.0556

SELECT
  im.item_in
  ,i1.item_name
  ,im.item_out
  ,i2.item_name
  ,i2.brand
  ,i2.size8
  ,im.score
FROM item_matrix2 im
JOIN item i1
  ON i1.item_id=im.item_in
JOIN item i2
  ON i2.item_id=im.item_out
WHERE item_in<>item_out
ORDER BY item_in, score DESC;

Wait, I actually know you better than I thought I did...

26

I recognize Zooey because she has shopped at my website before. She is user_id=1. Based on the shoes she has looked at or bought in the past, I know that she prefers Asics brand to Nike brand, and she doesn’t care for Adidas much. Also, she always buys size 8 shoes.

user_id  attrib_type  attrib_value  prob
1        brand        nike          0.40
1        brand        adidas        0.10
1        brand        asics         0.50
1        size8        Y             0.95
1        size8        N             0.05

CREATE TABLE user_profile (
  user_id INT NOT NULL,
  attrib_type VARCHAR(20),
  attrib_value VARCHAR(20),
  prob FLOAT
);

CREATE TABLE attrib_prob AS
SELECT
  brand
  ,size8
  ,COUNT(*)/item_tot prob
FROM item
JOIN
  (SELECT
    COUNT(*) item_tot
  FROM item) tot
GROUP BY 1,2;

Enter naïve Bayes conditional probability distribution…

Here C is the event that you view item_out after you view item_in, F1 is the brand attribute, F2 is the size8 attribute.
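With C, F1 and F2 defined this way, the standard naïve Bayes form of the conditional probability can be written as follows. This is a reconstruction from the definitions here and from the final_score expression in the user_item_matrix query, not a copy of the deck's own equation:

```latex
p(C \mid F_1, F_2) = \frac{p(C)\, p(F_1 \mid C)\, p(F_2 \mid C)}{p(F_1, F_2)}
```

Under that reading, p(C) corresponds to score from item_matrix2, the two likelihood terms are approximated by the brand and size8 probabilities in user_profile, and p(F1, F2) is the prob column of attrib_prob. For asicsrun1 this gives 0.123 × 0.50 × 0.95 / 0.167 ≈ 0.350.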

27

item_in  item_out  score  item_name   brand   size8  brand_prob  size8_prob  attrib_prob  final_score
1        1         1.000  nikerun1    nike    N      0.40        0.05        0.167        0.120
1        2         0.163  nikerun2    nike    Y      0.40        0.95        0.250        0.248
1        3         0.166  nikerun3    nike    N      0.40        0.05        0.167        0.020
1        4         0.132  nikerun4    nike    Y      0.40        0.95        0.250        0.201
1        5         0.088  nikerun5    nike    Y      0.40        0.95        0.250        0.134
1        6         0.154  adidasrun1  adidas  Y      0.10        0.95        0.250        0.059
1        7         0.132  adidasrun2  adidas  Y      0.10        0.95        0.250        0.050
1        8         0.095  adidasrun3  adidas  N      0.10        0.05        0.083        0.006
1        9         0.080  adidasrun4  adidas  Y      0.10        0.95        0.250        0.030
1        10        0.123  asicsrun1   asics   Y      0.50        0.95        0.167        0.350
1        11        0.081  asicsrun2   asics   N      0.50        0.05        0.083        0.024
1        12        0.056  asicsrun3   asics   Y      0.50        0.95        0.167        0.159
2        1         0.175  nikerun1    nike    N      0.40        0.05        0.167        0.021
2        2         1.000  nikerun2    nike    Y      0.40        0.95        0.250        1.520
2        3         0.213  nikerun3    nike    N      0.40        0.05        0.167        0.026
2        4         0.099  nikerun4    nike    Y      0.40        0.95        0.250        0.150
2        5         0.084  nikerun5    nike    Y      0.40        0.95        0.250        0.128
2        6         0.141  adidasrun1  adidas  Y      0.10        0.95        0.250        0.054
2        7         0.114  adidasrun2  adidas  Y      0.10        0.95        0.250        0.043
2        8         0.084  adidasrun3  adidas  N      0.10        0.05        0.083        0.005
2        9         0.072  adidasrun4  adidas  Y      0.10        0.95        0.250        0.027
2        10        0.120  asicsrun1   asics   Y      0.50        0.95        0.167        0.341
2        11        0.085  asicsrun2   asics   N      0.50        0.05        0.083        0.026
2        12        0.070  asicsrun3   asics   Y      0.50        0.95        0.167        0.199

CREATE TABLE user_item_matrix AS
SELECT
  item_in
  ,item_out
  ,score
  ,item_name
  ,i.brand
  ,i.size8
  ,upb.prob brand_prob
  ,ups.prob size8_prob
  ,ap.prob attrib_prob
  ,score * upb.prob * ups.prob / ap.prob final_score
FROM item_matrix2 im
JOIN item i
  ON i.item_id=im.item_out
JOIN user_profile upb
  ON upb.attrib_type='brand' AND upb.attrib_value=i.brand
JOIN user_profile ups
  ON ups.attrib_type='size8' AND ups.attrib_value=i.size8
JOIN attrib_prob ap
  ON ap.brand=i.brand AND ap.size8=i.size8
WHERE upb.user_id=1 AND ups.user_id=1;
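The final_score expression can be replayed by hand for a few rows. A minimal sketch for item_in = 1, using four item_out rows and the rounded probabilities shown in the tables; the dictionary layout and the `final_score` helper are illustrative, not part of the deck:

```python
# Time-aware scores for item_in = 1, keyed by item_out (from item_matrix2).
scores = {2: 0.163, 3: 0.166, 6: 0.154, 10: 0.123}
items = {2: ("nike", "Y"), 3: ("nike", "N"),
         6: ("adidas", "Y"), 10: ("asics", "Y")}

# Zooey's profile (user_profile) and catalog attribute shares (attrib_prob).
brand_prob = {"nike": 0.40, "adidas": 0.10, "asics": 0.50}
size8_prob = {"Y": 0.95, "N": 0.05}
attrib_prob = {("nike", "Y"): 0.250, ("nike", "N"): 0.167,
               ("adidas", "Y"): 0.250, ("asics", "Y"): 0.167}

def final_score(item_out):
    """score * brand_prob * size8_prob / attrib_prob, as in the SQL."""
    brand, size8 = items[item_out]
    return (scores[item_out] * brand_prob[brand] * size8_prob[size8]
            / attrib_prob[(brand, size8)])

ranked = sorted(scores, key=final_score, reverse=True)
for item_out in ranked:
    print(item_out, round(final_score(item_out), 3))
```

nikerun3 has the highest raw score of the four, yet it drops to last place because it is not available in size 8, while asicsrun1 moves to the top on brand preference.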

28

Zooey, here are recommendations just for you!

item_in  item_name  item_out  item_name   brand   size8  score   final_score
1        nikerun1   10        asicsrun1   asics   Y      0.1230  0.3499
1        nikerun1   2         nikerun2    nike    Y      0.1630  0.2478
1        nikerun1   4         nikerun4    nike    Y      0.1320  0.2006
1        nikerun1   12        asicsrun3   asics   Y      0.0560  0.1593
1        nikerun1   5         nikerun5    nike    Y      0.0880  0.1338
1        nikerun1   6         adidasrun1  adidas  Y      0.1540  0.0585
1        nikerun1   7         adidasrun2  adidas  Y      0.1320  0.0502
1        nikerun1   9         adidasrun4  adidas  Y      0.0800  0.0304
1        nikerun1   11        asicsrun2   asics   N      0.0810  0.0244
1        nikerun1   3         nikerun3    nike    N      0.1660  0.0199
1        nikerun1   8         adidasrun3  adidas  N      0.0950  0.0057

SELECT
  im.item_in
  ,i1.item_name
  ,im.item_out
  ,i2.item_name
  ,i2.brand
  ,i2.size8
  ,im.score
  ,im.final_score
FROM user_item_matrix im
JOIN item i1
  ON i1.item_id=im.item_in
JOIN item i2
  ON i2.item_id=im.item_out
WHERE item_in<>item_out
ORDER BY item_in, final_score DESC;

Notice how Asics shoes have risen to the top and Adidas shoes have been pushed down. Also, shoes that are not available in size 8 have been pushed to the bottom.

29

We have built an effective recommendation model that is a hybrid of an Item‐based Collaborative Filter, and a Content‐based Filter.

Many modern recommender systems use an ensemble learning approach, combining multiple recommendation models like the ones we just built.

The best performing recommendation models in online retail are still the well known item‐to‐item strategies such as:

• People who viewed this item also viewed

• People who viewed this item eventually bought

• People who bought this item also bought

30

Production

31

Typical website recommender platform

Recommender Platform

Database

User Data Product Data

Model Builder

Website Recommendation Scorer

Multivariate Test Framework

Batch processing

Real‐time processing

32

Common Challenges

Product solution

• Data transfer

• Aligning performance metrics

• Network latency/problems

• Customization

• Loss of internal learning

Custom solution

• Finding the right people to build, test, and maintain your solution: software developers, data scientists, marketing analysts

• Keeping resources dedicated to improving the system

• Building a management platform – tune algorithm, reporting, common tasks such as boosting and blacklisting

33

Finding the Right Solution

• Evaluate recommendation quality (not just performance)

– Eye test

– Does your recommendation distribution roughly match your sales distribution?

– Global extrema vs local extrema

• Apply feedback mechanisms with care

– Use the right metrics

– Don’t be too aggressive

• Be creative when it comes to solving your unique problems

• Don’t take it personally...everyone will take it personally

34

Testing

35

How Testing Works

1. Design

   1. Build testing framework

   2. Create the test algorithm

2. Execute

   1. Randomly assign users

   2. Apply recommendations

   3. Collect data

3. Analyze

   1. Build reports

   2. Mine recommendations

   3. Disseminate insights and understanding

4. Predict

   1. Extrapolate findings

   2. Brainstorm new tests

Diagram: Design, Execute, Analyze, Predict cycle

36

Example: Performance Based Email

• Background

– Thousands of ad campaigns

– Millions of users

– Recommendations are created in batch

– Sent out in batch

– Advertisers pay per click (CPC)

• Test Framework

– User population is broken into slots

– Analysts create and manage tests via slots

– Reports and KPIs are rolled up by slot

37


Randomly allocate users into 20 equal groups (slots)

• select substring(hashbytes('md5',email_address),15,2)%20

Example

• Globally unique identifier (GUID): [email protected]

• Hash function (MD5): B306F4023D42F8B34ACEDE0F5ED07EC6

• Hexadecimal to decimal conversion (last 4 hex digits): 7EC6 ‐> 32,454

• Divide by 20 and take the remainder (modulo): 32,454 mod 20 = 14

• Final slot: 14

Advantages

• No user‐test lookup table required

• New users are instantly assigned to a slot

• Number of slots is variable

• Slots can randomly be shuffled by changing one parameter
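The slot assignment above can be sketched outside the database as well. The function names, the UTF-8 encoding, and the `offset` parameter are assumptions made for illustration; `offset=14` (0-based) picks the same last two digest bytes as position 15 in the deck's substring call, and the database's own string encoding determines whether a given address hashes identically:

```python
import hashlib

def slot_from_digest(digest: bytes, n_slots: int = 20, offset: int = 14) -> int:
    """Map two bytes of a hash digest to a slot in [0, n_slots)."""
    value = int.from_bytes(digest[offset:offset + 2], "big")
    return value % n_slots

def slot_for(user_key: str, n_slots: int = 20, offset: int = 14) -> int:
    """Hash a stable user identifier (e.g. an email address) into a slot."""
    digest = hashlib.md5(user_key.encode("utf-8")).digest()
    return slot_from_digest(digest, n_slots, offset)

# Worked example from the slide: the digest ends in 7EC6.
example = bytes.fromhex("B306F4023D42F8B34ACEDE0F5ED07EC6")
print(slot_from_digest(example))            # 0x7EC6 = 32454; 32454 % 20 = 14
print(slot_from_digest(example, offset=0))  # other bytes, so a reshuffled slot
print(slot_for("someone@example.com"))      # same key always gives same slot
```

Changing `n_slots` varies the number of groups, and changing `offset` reshuffles users by reading a different part of the digest; both are single-parameter changes that need no lookup table.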

38



• Test Table

Test        Start Date  End Date  Slot Start  Slot End
Control     ‐           ‐         0           9
Item Based  Jan 13      Feb 5     10          19
User Based  Feb 5       Feb 26    10          14
Ensemble    Feb 5       Feb 26    15          19

• Scoring

– Create recommendations for users in the test group
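At scoring time, the test table can be read as a lookup from (slot, date) to the algorithm that should produce a user's recommendations. A minimal sketch, with two loudly labeled assumptions: the year is hypothetical (the table gives only month and day), and end dates are treated as exclusive so that the tests handing over on Feb 5 do not overlap:

```python
from datetime import date

YEAR = 2012  # hypothetical; the test table does not state a year

TESTS = [
    # (name, start, end, slot_start, slot_end); None means open-ended
    ("Control", None, None, 0, 9),
    ("Item Based", date(YEAR, 1, 13), date(YEAR, 2, 5), 10, 19),
    ("User Based", date(YEAR, 2, 5), date(YEAR, 2, 26), 10, 14),
    ("Ensemble", date(YEAR, 2, 5), date(YEAR, 2, 26), 15, 19),
]

def active_test(slot, day):
    """Return the test that owns this slot on this day, or None."""
    for name, start, end, lo, hi in TESTS:
        in_slots = lo <= slot <= hi
        # End date is exclusive (assumption, see above).
        in_dates = (start is None or start <= day) and (end is None or day < end)
        if in_slots and in_dates:
            return name
    return None

print(active_test(14, date(YEAR, 1, 20)))
print(active_test(14, date(YEAR, 2, 10)))
print(active_test(17, date(YEAR, 2, 10)))
```

Slots 10 to 19 with no active test return None here; a production scorer would need a rule for that case, such as falling back to the control treatment.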

39


• Web Based Reports

– Test improvement relative to control

– KPIs: Impressions, Clicks, Conversions

– Growth Equation: Revenue = PPC x CTR x Impressions

– Test performance over time

– Eye test / User detail

Test Percent Improvement

Test        Revenue  CTR   PPC   Volume
Control     ‐        ‐     ‐     ‐
User Based  +10%     +20%  +5%   +0%
Ensemble    +15%     +30%  ‐10%  +5%

40

Generating Insights from Tests

Insights

• Design for accurate and diverse recommendations

• Recent behavior matters both in modeling and scoring

• Know your data

• Challenge the status quo

Recommendations

• Improvement comes through testing

• Communicate insights to the organization

• Have fun

41

Conclusion

42

Conclusion

• Recommender platforms can outperform popularity based approaches by double digits, create a competitive advantage, and enhance the user experience.

• Every organization is going to have their own unique implementation depending on technology and strategy. Platforms are built around a variety of data stores and leverage a variety of programming languages.

• A strong solution requires excellence in all four components (i.e. data, algorithm, production, and testing).

• Testing produces consumer insights and understanding that lead to constant improvement.

43

Nathan Stephens

Senior Manager

Merkle Analytics

[email protected]

Allen Dickson

Senior Manager

[email protected]

Discussion Questions

1. Are recommender platforms valuable to my organization? How would a recommender platform fit into my overall strategy?

2. How do I get started?

3. Which metrics could be good indicators of recommendation quality (click through rate, volume, attrition, etc.)?

4. How might a recommender system adapt to multiple users with the same account? For example, a family might share the same Amazon account. How would an ensemble solution address this issue?

5. Why do recommender platforms require constant improvement? How does constant improvement benefit the organization?

45
