Big Data Real-Time Analytics in Erlang

32
Big Data Real-Time Analytics Knut Nesheim · @knutin GameAnalytics

description

Game Analytics is building the next-generation analytics platform for the gaming industry. We want to make advanced machine learning available to even the small studios with just three guys in a garage. We want to have all the analytics in real-time. We're building a real-time stream processing analytical engine in Erlang. As data comes in from our customers games, we process it and make it immediately available to our customers. In this presentation we will present the challenges we're facing, the architecture we have come up with and some really clever implementation details. We're preparing for massive volumes of data and need to be ready when big customers start using our product. Come see the challenges we're facing and how we're using Erlang to build the next-generation analytics platform.

Transcript of Big Data Real-Time Analytics in Erlang

Page 1: Big Data Real-Time Analytics in Erlang

Big Data Real-Time Analytics

Knut Nesheim · @knutinGameAnalytics

Page 2: Big Data Real-Time Analytics in Erlang

‣ What we do

‣ What we want

‣ Our solution

Topics

Page 3: Big Data Real-Time Analytics in Erlang

Me

‣ Klarna

‣ Wooga

‣ Game Analytics

‣ github.com/knutin

Page 4: Big Data Real-Time Analytics in Erlang

Analytics SaaS for games

Page 5: Big Data Real-Time Analytics in Erlang

Game Analytics

‣ Startup, venture funded, ~2 years old

‣ HQ in Copenhagen, engineering in Berlin

‣ Need to move fast

‣ Willing to take on technical debt

‣ Be ready for traffic growth

‣ Big games are big, millions of DAU

Page 6: Big Data Real-Time Analytics in Erlang

GA.Event.Business(“sheep”, “gold”, 200);

Page 7: Big Data Real-Time Analytics in Erlang

Metrics

‣ Daily Active Users, Monthly Active Users

‣ Revenue

‣ Histogram of event values

Page 8: Big Data Real-Time Analytics in Erlang

Idea: some real-time metrics

Suitable subset

Page 9: Big Data Real-Time Analytics in Erlang

Scope creep: real-time everything

GA v2.0

Page 10: Big Data Real-Time Analytics in Erlang

MapReduce architecture, v1.0

Page 11: Big Data Real-Time Analytics in Erlang

iOS SDK Android SDK Unity SDK

Collectors

HTTP

S3

MongoDB

Online

MapReduce

OfflineSpecial stuff

Website

Work queue

Page 12: Big Data Real-Time Analytics in Erlang

Streaming architecture, v2.0

Page 13: Big Data Real-Time Analytics in Erlang

iOS SDK Android SDK Unity SDK

Collectors

S3

HTTP

Stream DynamoDB

Query API

HistoryReal-time

Website Customers

Page 14: Big Data Real-Time Analytics in Erlang

2013-10-14 2013-10-15

DynamoDB RAM

Today

Page 15: Big Data Real-Time Analytics in Erlang

Streaming

‣ Partition on game

‣ One process per game, autonomous

‣ Prototype very promising

Page 16: Big Data Real-Time Analytics in Erlang

Implementation

Page 17: Big Data Real-Time Analytics in Erlang

Game process‣ Process files sequentially

‣ Keep running results in RAM

‣ Flush to DB when window closes

‣ Answer real-time queries

‣ Put all in one node

Page 18: Big Data Real-Time Analytics in Erlang

“Metric DSL”

Page 19: Big Data Real-Time Analytics in Erlang

HyperLogLog‣ Estimate cardinality

‣ Clever hash tricks

‣ Millions unique with <1% error in ~40k words

‣ Unions

‣ “hyper”: Erlang HLL++ from Google paper[0]

‣ Will open source Soon (TM)

[0]: “HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm”

Page 20: Big Data Real-Time Analytics in Erlang

Recordinality[0]

‣ Estimate frequency of values

‣ User session count

‣ Keeps a reservoir, clever hash tricks

‣ Will open source Soon (TM)

[0]: “Data Streams as Random Permutations: the Distinct Element Problem”

Page 21: Big Data Real-Time Analytics in Erlang

Next step: More parallelization

Page 22: Big Data Real-Time Analytics in Erlang

loop(State) -> NewState = process(next_file(), State), loop(NewState).

Game #123

Page 23: Big Data Real-Time Analytics in Erlang

Worker #1

Part = process(next_file(), empty()),game_123 ! Part.

loop(State) -> receive Part -> NewState = merge(Part, State)), loop(NewState) end.

Game #123

Page 24: Big Data Real-Time Analytics in Erlang

Worker #1

Game #123

Worker #2 Worker #N

Manager

Page 25: Big Data Real-Time Analytics in Erlang

Mapper #1

Reducer #123

Mapper #2 Mapper #N

Scheduler

Page 26: Big Data Real-Time Analytics in Erlang

Scheduler

‣ Take ideas from Hadoop, Riak Pipe

‣ Manage limited resources

‣ Distribute work across nodes

‣ Colocate processing with state storage

Page 27: Big Data Real-Time Analytics in Erlang

Implementation

‣ DIY?

‣ riak_core for state processes?

‣ riak_pipe for managing work?

Page 28: Big Data Real-Time Analytics in Erlang

Conclusion

Page 29: Big Data Real-Time Analytics in Erlang

Erlang: The bad parts

‣ Big process state not great

‣ Lots of updates, lots of garbage

Page 30: Big Data Real-Time Analytics in Erlang

Erlang: The good parts

‣ Total allocated RAM >30GB

‣ Per process heap is gold

‣ Easy parallelization, distribution

‣ Looking forward to maps!

Page 31: Big Data Real-Time Analytics in Erlang

Questions?

Knut Nesheim · @knutinGameAnalytics