Big data spain 2013 - ad networks analytics

Transcript

Page 1: Big data spain 2013 - ad networks analytics

Ad Networks analytics using Hadoop and Splout SQL

Iván de Prado Alonso – CEO of Datasalt www.datasalt.es @ivanprado @datasalt

Page 2: Big data spain 2013 - ad networks analytics

Big Data consulting & training

Page 3: Big data spain 2013 - ad networks analytics

Agenda

1. Analytics for Ad Networks
2. Our solution
   1. Hadoop + Splout SQL
   2. Splout SQL in detail
   3. Pre-aggregations vs. sampling
3. Conclusions

Page 4: Big data spain 2013 - ad networks analytics

Analytics for Ad Networks

Ad Networks

• Principal agents
  › Advertiser
  › Publisher
    • Web pages
    • Mobile apps
• Ad Network
  › Network of agents that mediate between advertisers and publishers
  › DSPs, SSPs, DMPs, ADTs, ITDs, etc.

Page 6: Big data spain 2013 - ad networks analytics

For the sake of simplicity...

• Let's consider a monolithic Ad Network
  › Single agent between advertisers and publishers
• But the solution presented here is also useful for DSPs, SSPs, DMPs, etc.

Page 7: Big data spain 2013 - ad networks analytics

Need for analytics

• For advertisers
  › Monitoring campaigns
  › Improving ROI
• For publishers
  › Improving ad placement
• But there can be
  › Tens of thousands of advertisers
  › Hundreds of thousands of publishers

Page 8: Big data spain 2013 - ad networks analytics

Analytics

• Counting impressions, clicks and CPC
  › For a given range of dates
  › Filtered by
    • Campaign
    • Location
    • Language
    • Browser/device
    • Ad type
    • ... or any combination of the above!
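The kind of query behind such a panel can be sketched with Python's built-in sqlite3 module. The `events` table, its columns, the sample rows and the `panel_stats` helper are hypothetical and not taken from the talk; CPC is assumed here to mean total cost divided by clicks.

```python
import sqlite3

# Hypothetical schema: one row per ad impression (not from the slides).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        day TEXT, campaign TEXT, location TEXT, language TEXT,
        device TEXT, ad_type TEXT, clicked INTEGER, cost REAL
    )
""")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("2013-11-01", "c1", "ES", "es", "mobile", "banner", 1, 0.5),
        ("2013-11-01", "c1", "ES", "es", "desktop", "banner", 0, 0.0),
        ("2013-11-02", "c1", "FR", "fr", "mobile", "video", 1, 0.25),
        ("2013-11-03", "c2", "ES", "es", "mobile", "banner", 1, 0.75),
    ],
)

# Dimensions that may be used as filters, in any combination.
ALLOWED = {"campaign", "location", "language", "device", "ad_type"}

def panel_stats(conn, start, end, **filters):
    """Return (impressions, clicks, cpc) for a date range and optional filters."""
    unknown = set(filters) - ALLOWED
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    where = ["day BETWEEN ? AND ?"]
    params = [start, end]
    for column, value in sorted(filters.items()):
        where.append(f"{column} = ?")  # column names were whitelisted above
        params.append(value)
    sql = (
        "SELECT COUNT(*), COALESCE(SUM(clicked), 0), COALESCE(SUM(cost), 0.0) "
        "FROM events WHERE " + " AND ".join(where)
    )
    impressions, clicks, cost = conn.execute(sql, params).fetchone()
    cpc = cost / clicks if clicks else None  # undefined when there are no clicks
    return impressions, clicks, cpc
```

For example, `panel_stats(conn, "2013-11-01", "2013-11-02", campaign="c1")` restricts the counts to one campaign over two days, and further keyword arguments narrow the filter.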

Page 9: Big data spain 2013 - ad networks analytics

Two-fold usage

• Operational
  › For invoicing, accounting, etc.
  › Limited set of parameter variations
    • Fixed date ranges and common aggregations
  › Exact results expected
• Exploratory
  › Unlimited variations of parameters
    • Ad-hoc filtering
  › Approximate results are enough

Page 10: Big data spain 2013 - ad networks analytics

Challenges

• Billions of events and hundreds of gigabytes per day
  › Need for a distributed system
• Query flexibility
  › Need to cope with operational and exploratory queries
• Web latencies
  › Queries must return in milliseconds

Page 11: Big data spain 2013 - ad networks analytics

Exploding

• Data needed to serve analytics panels is Big Data
  › Thousands of advertiser panels
  › Even more for publisher panels
• But individually each agent panel can be served with one machine
  › At least for 98% of advertisers/publishers
  › Horizontal partitioning is a good strategy

Page 12: Big data spain 2013 - ad networks analytics

Our solution

Page 13: Big data spain 2013 - ad networks analytics

Our solution

Page 14: Big data spain 2013 - ad networks analytics

Hadoop

• Scalable
  › Storage of raw data
  › Computing capabilities
• Good for
  › Creating pre-computed aggregations (views)
  › Generating samples of data
• Bad for
  › Serving data
  › On-line aggregations

Page 15: Big data spain 2013 - ad networks analytics

Splout SQL

• Scalable
  › Serving of full SQL queries (unlike NoSQL stores)
• Good for
  › Ad-hoc aggregations over pre-computed views
  › Serving low-latency web pages with concurrency

Page 16: Big data spain 2013 - ad networks analytics

A well-balanced solution

• Hadoop
  › Provides a scalable repository for impressions
  › Performs off-line pre-aggregations and sampling
• Splout SQL
  › Serves queries
  › Performs on-line aggregations with sub-second latencies
    • Each partition contains only data for a few agents, which ensures performance

Page 17: Big data spain 2013 - ad networks analytics

Splout SQL (in detail)

Page 18: Big data spain 2013 - ad networks analytics

Splout SQL in detail

Isolation between generation and serving

Page 19: Big data spain 2013 - ad networks analytics

Splout SQL Architecture

Page 20: Big data spain 2013 - ad networks analytics

Generation

table ADVERTISERS

AID   Name
U20   Doug
U21   Ted
U40   John

table IMPRESSIONS

PID   AID   Amount
S100  U20   102
S101  U20   60
S223  U40   99

Generate tablespace T_ADVERTISERS with 2 partitions for table ADVERTISERS and table IMPRESSIONS, partitioned by AID

Tablespace T_ADVERTISERS

Partition U10 – U35

ADVERTISERS
AID   Name
U20   Doug
U21   Ted

IMPRESSIONS
PID   AID   Amount
S100  U20   102
S101  U20   60

Partition U36 – U60

ADVERTISERS
AID   Name
U40   John

IMPRESSIONS
PID   AID   Amount
S223  U40   99
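The generation step on this slide can be imitated locally as a rough sketch. In the talk the generation runs on Hadoop; here each partition is simply modelled as its own in-memory SQLite database, and `PARTITIONS`, `partition_for` and `generate_tablespace` are hypothetical names introduced for illustration.

```python
import sqlite3

# Key ranges and rows as shown on the slide.
PARTITIONS = [("U10", "U35"), ("U36", "U60")]
ADVERTISERS = [("U20", "Doug"), ("U21", "Ted"), ("U40", "John")]
IMPRESSIONS = [("S100", "U20", 102), ("S101", "U20", 60), ("S223", "U40", 99)]

def partition_for(key):
    """Return the index of the partition whose key range contains `key`."""
    for index, (low, high) in enumerate(PARTITIONS):
        if low <= key <= high:  # plain string comparison, enough for these ids
            return index
    raise KeyError(f"no partition covers key {key!r}")

def generate_tablespace():
    """Build one database per partition, co-locating both tables by AID."""
    databases = []
    for _ in PARTITIONS:
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE ADVERTISERS (AID TEXT, Name TEXT)")
        db.execute("CREATE TABLE IMPRESSIONS (PID TEXT, AID TEXT, Amount INTEGER)")
        databases.append(db)
    for aid, name in ADVERTISERS:
        databases[partition_for(aid)].execute(
            "INSERT INTO ADVERTISERS VALUES (?, ?)", (aid, name)
        )
    for pid, aid, amount in IMPRESSIONS:
        databases[partition_for(aid)].execute(
            "INSERT INTO IMPRESSIONS VALUES (?, ?, ?)", (pid, aid, amount)
        )
    return databases
```

Because both tables are split on the same key, every row needed to answer a question about one advertiser ends up in the same partition.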

Page 21: Big data spain 2013 - ad networks analytics

API - Generation

• Command line
  › Loading CSV files
  › $ hadoop jar splout-*-hadoop.jar generate …
• Java API
• HCatalog
• Hive
• Pig

Page 22: Big data spain 2013 - ad networks analytics

Serving

SELECT Name, sum(Amount) FROM ADVERTISERS a, IMPRESSIONS i WHERE a.AID = i.AID AND a.AID = 'U20';

For key = 'U20', tablespace = 'T_ADVERTISERS'

Partition U10 – U35 (contains key 'U20', so it serves the query)

ADVERTISERS
AID   Name
U20   Doug
U21   Ted

IMPRESSIONS
PID   AID   Amount
S100  U20   102
S101  U20   60

Partition U36 – U60

ADVERTISERS
AID   Name
U40   John

IMPRESSIONS
PID   AID   Amount
S223  U40   99

Page 23: Big data spain 2013 - ad networks analytics

Serving

SELECT Name, sum(Amount) FROM ADVERTISERS a, IMPRESSIONS i WHERE a.AID = i.AID AND a.AID = 'U40';

For key = 'U40', tablespace = 'T_ADVERTISERS'

Partition U36 – U60 (contains key 'U40', so it serves the query)

ADVERTISERS
AID   Name
U40   John

IMPRESSIONS
PID   AID   Amount
S223  U40   99

Partition U10 – U35

ADVERTISERS
AID   Name
U20   Doug
U21   Ted

IMPRESSIONS
PID   AID   Amount
S100  U20   102
S101  U20   60
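Key-based routing as drawn on the two serving slides can be sketched as follows. This is not Splout SQL's actual implementation: the partitions are modelled as in-memory SQLite databases, and `LOWER_BOUNDS`, `open_partition` and `serve` are hypothetical names. The query qualifies the filtered column (`a.AID`) and adds a `GROUP BY` so it stays valid outside SQLite as well.

```python
import bisect
import sqlite3

# Lower bound of each key range and partition contents, as on the slides.
LOWER_BOUNDS = ["U10", "U36"]
CONTENTS = [
    {"advertisers": [("U20", "Doug"), ("U21", "Ted")],
     "impressions": [("S100", "U20", 102), ("S101", "U20", 60)]},
    {"advertisers": [("U40", "John")],
     "impressions": [("S223", "U40", 99)]},
]

def open_partition(content):
    """Load one partition's rows into its own in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE ADVERTISERS (AID TEXT, Name TEXT)")
    db.execute("CREATE TABLE IMPRESSIONS (PID TEXT, AID TEXT, Amount INTEGER)")
    db.executemany("INSERT INTO ADVERTISERS VALUES (?, ?)", content["advertisers"])
    db.executemany("INSERT INTO IMPRESSIONS VALUES (?, ?, ?)", content["impressions"])
    return db

PARTITION_DBS = [open_partition(content) for content in CONTENTS]

QUERY = """
    SELECT Name, sum(Amount)
    FROM ADVERTISERS a, IMPRESSIONS i
    WHERE a.AID = i.AID AND a.AID = ?
    GROUP BY Name
"""

def serve(key):
    """Route the query to the one partition that owns `key` and run it there.

    Only lower bounds are checked; keys beyond the last range fall into the
    last partition and simply return no rows.
    """
    index = bisect.bisect_right(LOWER_BOUNDS, key) - 1
    if index < 0:
        raise KeyError(f"no partition covers key {key!r}")
    return index, PARTITION_DBS[index].execute(QUERY, (key,)).fetchall()
```

Calling `serve("U20")` touches only the first partition and `serve("U40")` only the second, which is why a query never needs to scan data belonging to other agents.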

Page 24: Big data spain 2013 - ad networks analytics

API - Service

• REST API
• JSON response

Page 25: Big data spain 2013 - ad networks analytics

API - Console

Page 26: Big data spain 2013 - ad networks analytics

Pre-aggregations vs. Sampling

Page 27: Big data spain 2013 - ad networks analytics

Operational usage

• Invoicing, accounting, monitoring, etc.
  › Exact results
  › Constrained space of aggregations
• Pre-computed aggregates done in Hadoop
  › For example:
    • per day
    • per day per location
• Extended aggregations done on-line
  › Using Splout SQL
  › For example, aggregate per week based on daily stats
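The weekly roll-up mentioned above can be illustrated with a small sketch that uses Python's sqlite3 module. The `daily_stats` table stands in for a per-day view pre-computed off-line; its name, columns and rows, and the `weekly_stats` helper, are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical pre-aggregated view: one row per advertiser and day.
conn.execute(
    "CREATE TABLE daily_stats (AID TEXT, day TEXT, impressions INTEGER, clicks INTEGER)"
)
conn.executemany(
    "INSERT INTO daily_stats VALUES (?, ?, ?, ?)",
    [
        ("U20", "2013-11-04", 100, 4),  # Monday
        ("U20", "2013-11-05", 120, 6),
        ("U20", "2013-11-10", 80, 2),   # Sunday of the same week
        ("U20", "2013-11-11", 90, 3),   # Monday of the following week
        ("U40", "2013-11-04", 50, 1),
    ],
)

def weekly_stats(conn, aid):
    """Aggregate one advertiser's daily rows into weeks at query time."""
    sql = """
        SELECT strftime('%Y-%W', day) AS week,
               SUM(impressions) AS total_impressions,
               SUM(clicks) AS total_clicks
        FROM daily_stats
        WHERE AID = ?
        GROUP BY week
        ORDER BY week
    """
    return conn.execute(sql, (aid,)).fetchall()
```

Only the daily table has to be materialised; the weekly view costs a handful of row additions per request, so it does not need its own pre-computed table.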

Page 28: Big data spain 2013 - ad networks analytics

Why not pre-compute everything?

• Create one table per dimension combination
  › For two dimensions (day, location):
    • day
    • location
    • location, day
• For n dimensions
  › 2^n – 1 combinations
  › It explodes!

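The growth in the number of tables can be checked with a short enumeration of all non-empty subsets of the dimensions. The function name and the dimension names in the usage note are illustrative only.

```python
from itertools import combinations

def dimension_combinations(dimensions):
    """Return every non-empty subset of dimensions; each would need its own table."""
    return [
        subset
        for size in range(1, len(dimensions) + 1)
        for subset in combinations(dimensions, size)
    ]
```

With `["day", "location"]` this yields the three combinations listed above. With six dimensions (say day, campaign, location, language, device and ad type) it already yields 63 tables, and each additional dimension roughly doubles the count.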
Page 29: Big data spain 2013 - ad networks analytics

Exploratory usage

• Ad-hoc filters to learn from data
  › Approximate results are enough
• Intensive use of sampling
  › It can provide good accuracy with fast response
• Confidence interval
  › p = proportion
  › n = sample size
  › z = critical value of the standard normal distribution

$$p \pm z_{\alpha/2} \sqrt{\frac{p \, (1 - p)}{n}}$$
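As a worked example, the normal-approximation interval for a proportion (p plus or minus z times the square root of p(1 - p)/n) can be computed with the Python standard library. The function name and the 95% default confidence level are choices made for this sketch, not values from the talk.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(p, n, confidence=0.95):
    """Return (low, high) for an observed proportion p in a sample of size n."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    margin = z * sqrt(p * (1.0 - p) / n)
    return p - margin, p + margin
```

For an observed click-through proportion of 0.02 in a sample of 100,000 rows, the margin is a little under 0.001, so the sample pins the rate down to roughly 2% plus or minus 0.09 percentage points. The margin shrinks with the square root of n, which is why a narrow filter that matches few sampled rows gives a much wider interval.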

Page 30: Big data spain 2013 - ad networks analytics

Samples

• Created on Hadoop
  › Different sample sets
    • For last X days
    • For last year
• Splout SQL for serving them
  › On-line analytics over samples
  › 1 million records per second* (44 bytes per row)
  › Faster with data in memory
    ✓ Warming data prior to use
    ✓ 2.7 million records per second*

* Measured on a laptop
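How a sample set might be drawn and used is sketched below. The talk does not say which sampling method is used on Hadoop; hashing the event id, as done here, is an assumption chosen because it is deterministic and easy to run in parallel. `in_sample` and `estimate_total` are hypothetical helper names.

```python
import zlib

def in_sample(event_id, rate):
    """Deterministically keep roughly `rate` of all events by hashing the id."""
    bucket = zlib.crc32(event_id.encode("utf-8")) % 10_000
    return bucket < round(rate * 10_000)

def estimate_total(sample_count, rate):
    """Scale a count measured on the sample back up to the full data set."""
    return sample_count / rate
```

A count obtained from a 1% sample is multiplied by 100 to estimate the true total; the result is approximate, with an error that depends on how many sampled rows match the filter.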

Page 31: Big data spain 2013 - ad networks analytics

Pre-aggregations pros & cons

• Advantages
  › Exact results
  › Good for exploring the long tail
• Limitations
  › Only for a constrained number of aggregation combinations
  › Not good for exploratory analysis

Page 32: Big data spain 2013 - ad networks analytics

Sampling pros & cons

• Advantages
  › Fast filtering for any set of dimensions
  › Good accuracy for Top N queries
• Limitations
  › Bad for narrow dimension filters
  › Bad for exploring the long tail
  › Approximate results

Page 33: Big data spain 2013 - ad networks analytics

Conclusions

Page 34: Big data spain 2013 - ad networks analytics

Conclusions

• Analytics in Ad Networks is a complex problem
  › Due to the amount of data
  › Due to the number of agents
• It can be solved using Hadoop + Splout SQL
  › By the use of partitioning
  › Using pre-aggregations
    • For operational usage
  › Using sampling
    • For exploratory usage

Page 35: Big data spain 2013 - ad networks analytics

Questions?

Iván de Prado Alonso – CEO of Datasalt www.datasalt.es @ivanprado @datasalt
