Going data-driven: Learnings from building a real-time recommender system
Konrad Burnik
September 21, 2016
Spil Games – Leading cross-platform publisher
Web Portals
Spil Games - Web portal stats
• Portfolio of approx. 16,000 games
• 100 million monthly active users
• Channels: Family, Teen, Girls, Men
• Device Type: Desktop or Mobile
Spil Games getting ready for the Big Data world
This looks great! So, what's our first ML project?
Just look around ...
Example: Distribution of Games within labels at spelletjes.nl
Long tail you got there!
"The" widget for recommendations
Goals and challenges
• Provide better content for the users
• Optimize recommendations for business value
• Provide recommendations for new users
• Learn to use the new Spark infrastructure for solving all of the above
Overview of the recommender system
• The infrastructure (before and after)
• Two key components of the new recommender
• Ephemeral (effectively solving the cold-start problem)
• Collaborative Filtering
Spil Games Recommender Infrastructure (before)
Spil Games Recommender Infrastructure (after)
[Diagram labels: Streaming, MLlib]
Ephemeral Recommender
• For users who have some activity
• In particular, we want to target users who came to the portals and played just a few games
Ephemeral Recommender (challenges)
• What data can we use besides activity?
• How do we keep track of users?
• How do we quickly generate the recommendation lists?
Ephemeral Recommender (key features)
• The ephemeral recommender is game-similarity based
• Exploiting the long-tail
• We also show games that have more business value for Spil Games, for example games with a sufficiently high lifetime value
• Processing 800-1500 events per second
Example:
[Figure sequence, four frames, with these elements: labels Action and Puzzle, +1 increments, Streaming, "For You" widget]
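The idea behind the ephemeral recommender can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the Spil Games implementation: label overlap stands in for game similarity, and the catalogue `GAMES`, the class `EphemeralRecommender`, and the `value_weight` parameter are all hypothetical names and numbers invented for the example.

```python
from collections import Counter, defaultdict

# Hypothetical catalogue: game id -> (labels, business value in [0, 1]).
# Names and numbers are made up for illustration.
GAMES = {
    "race_3d": ({"Action"}, 0.2),
    "block_drop": ({"Puzzle"}, 0.5),
    "maze_runner": ({"Action", "Puzzle"}, 0.9),
    "word_hunt": ({"Puzzle"}, 0.4),
    "farm_tycoon": ({"Strategy"}, 0.8),
}


class EphemeralRecommender:
    """Keeps per-user label counters that are updated on every play event."""

    def __init__(self, games):
        self.games = games
        self.label_counts = defaultdict(Counter)  # user -> label -> plays
        self.played = defaultdict(set)            # user -> games already seen

    def on_event(self, user_id, game_id):
        # Each play adds +1 to every label of the played game.
        labels, _value = self.games[game_id]
        self.label_counts[user_id].update(labels)
        self.played[user_id].add(game_id)

    def recommend(self, user_id, n=3, value_weight=0.5):
        counts = self.label_counts.get(user_id, Counter())
        seen = self.played.get(user_id, set())
        scored = []
        for game_id, (labels, value) in self.games.items():
            if game_id in seen:
                continue
            # Assumed scoring: label affinity plus a business-value bonus.
            affinity = sum(counts[label] for label in labels)
            scored.append((affinity + value_weight * value, game_id))
        # Highest score first; ties broken by game id for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [game_id for _score, game_id in scored[:n]]
```

Because a user with no events still gets the business-value term, this sketch returns a list for brand-new users as well, which is one simple way to sidestep the cold-start problem.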
Collaborative Filtering
• For users who have a history of activity
• Proven to work at companies such as Amazon and Netflix
Collaborative Filtering in general
[Figure: rating matrix with known entries (*) and missing entries (?)]
Can we predict the empty places?
[Figure: the same matrix with every entry filled in]
Great! But how do we get the highest ratings out?
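Predicting the empty places and then pulling out the highest predicted ratings can be illustrated with a tiny matrix factorization. The sketch below is an assumption-laden toy: it uses plain stochastic gradient descent in pure Python rather than the ALS algorithm that Spark MLlib provides, and the `RATINGS` data, the function names, and the hyperparameters are invented for the example.

```python
import heapq
import math
import random

# Hypothetical known ratings: (user index, game index) -> score.
RATINGS = {
    (0, 0): 5.0, (0, 1): 4.0, (0, 3): 1.0,
    (1, 0): 4.0, (1, 2): 1.0, (1, 4): 2.0,
    (2, 1): 1.0, (2, 2): 5.0, (2, 3): 4.0,
    (3, 0): 1.0, (3, 3): 5.0, (3, 4): 4.0,
}


def predict(U, V, u, i):
    """Predicted rating = dot product of user and game factor vectors."""
    return sum(a * b for a, b in zip(U[u], V[i]))


def factorize(ratings, n_users, n_items, rank=2, epochs=500,
              lr=0.02, reg=0.02, seed=0):
    """Learn factors U (users) and V (games) by SGD on the known ratings."""
    rng = random.Random(seed)
    U = [[rng.uniform(0, 0.5) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(0, 0.5) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for (u, i), r in ratings.items():
            err = r - predict(U, V, u, i)
            for k in range(rank):
                uk, vk = U[u][k], V[i][k]
                U[u][k] += lr * (err * vk - reg * uk)
                V[i][k] += lr * (err * uk - reg * vk)
    return U, V


def rmse(U, V, ratings):
    """Root mean squared error on the known ratings."""
    total = sum((r - predict(U, V, u, i)) ** 2 for (u, i), r in ratings.items())
    return math.sqrt(total / len(ratings))


def top_n(U, V, ratings, u, n):
    """Best n games for user u among those the user has not rated yet."""
    candidates = (i for i in range(len(V)) if (u, i) not in ratings)
    return heapq.nlargest(n, candidates, key=lambda i: predict(U, V, u, i))
```

Calling `factorize(RATINGS, 4, 5)` and then `top_n(U, V, RATINGS, 0, 2)` ranks the two games user 0 has not rated. Using `heapq.nlargest` avoids sorting the full list of candidates, which matters more as the catalogue grows.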
Collaborative Filtering in MLlib
[Image obtained from databricks.com]
Collaborative Filtering (challenges)
• How do we aggregate the activity data?
• How do we score the data and scale it?
• Which users do we run the model on?
• How do we efficiently extract the recommendations from the model?
Collaborative Filtering recommender (key features)
• Every hour, aggregating the last hour of user activity (~1.5 to 5 million rows) takes about 2 minutes
• Calculating the model based on a month of scored and scaled pre-aggregated activity takes about 1 hour
• We run the model only for users who were active in the last 5 hours
• Extracting the recommendations takes about 30 minutes with an optimized approach
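The steps listed above (hourly aggregation, scoring and scaling, and restricting the model to recently active users) could look roughly like the following in plain Python. The slides do not say how activity was scored, so the capped logarithmic scale here is purely an assumption, as are the function names and the `(user, game, timestamp)` event shape.

```python
import math
from collections import Counter


def aggregate(events):
    """Count plays per (user, game) pair for one batch of events."""
    return Counter((user, game) for user, game, _ts in events)


def merge(aggregates):
    """Combine pre-aggregated hourly counts into one longer window."""
    total = Counter()
    for agg in aggregates:
        total.update(agg)
    return total


def score(counts, cap=5.0):
    """Assumed scoring: 1 + ln(plays), capped so heavy users do not dominate."""
    return {pair: min(cap, 1.0 + math.log(n)) for pair, n in counts.items()}


def active_users(events, now, window_hours=5):
    """Users with at least one event in the last `window_hours` hours."""
    cutoff = now - window_hours * 3600
    return {user for user, _game, ts in events if ts >= cutoff}
```

Keeping the hourly aggregates small and merging them later means the monthly model input never has to be rebuilt from raw events, which fits the timings quoted in the list.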
Data amounts processed by CF

# total records
        | Family     | Teens      | Girls      | Men
Desktop | 68 894 434 | 16 070 864 | 31 285 329 | 679 565
Mobile  | 2 532 549  | 404 934    | 1 276 879  | 2 249

# distinct users
        | Family     | Teens      | Girls      | Men
Desktop | 16 127 074 | 5 254 646  | 5 022 497  | 357 721
Mobile  | 1 035 520  | 221 192    | 397 091    | 1 240

# distinct games
        | Family     | Teens      | Girls      | Men
Desktop | 15 078     | 11 764     | 7 736      | 3 171
Mobile  | 3 151      | 5 532      | 1 792      | 467
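As a back-of-the-envelope reading of the Desktop/Family figures, the short calculation below estimates how sparse the user-game matrix is. It assumes, hypothetically, that every record is a distinct (user, game) pair, so the fill ratio it prints is an upper bound; the variable names are illustrative.

```python
# Desktop / Family figures taken from the tables.
total_records = 68_894_434
distinct_users = 16_127_074
distinct_games = 15_078

# Average number of records per user.
records_per_user = total_records / distinct_users

# Upper bound on the share of filled cells in the user-game matrix,
# assuming each record is a distinct (user, game) pair.
max_density = total_records / (distinct_users * distinct_games)

print(f"{records_per_user:.2f} records per user")
print(f"at most {max_density:.4%} of the matrix is filled")
```

A matrix this sparse is the usual setting for collaborative filtering, where almost every cell has to be predicted.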
Results
• A deployment system is in place for developing Spark apps
• Gained knowledge of using Spark infrastructure
• Gained knowledge of inner workings of recommenders as well as some related cutting-edge research
• Significantly improved the CTR of the "For You" widget in the two months the recommender has been live
What have we learned?
• Giving recommendations is hard!
• Simple solutions often work best
• Exploring the long-tail is a good thing for diversification
• Spark is not as simple as the hype suggests; you often need to tweak a lot!
Thank you for your attention!
Contact: https://nl.linkedin.com/in/konrad-burnik