Optimization of digital marketing campaigns



Draft paper on using predictive analytics to optimize digital marketing campaigns


Optimization of Digital Marketing Campaigns

Armando Vieira, Inesting

Abstract

In this work we apply several clustering, visualization and predictive machine learning techniques to analyse data from digital marketing campaigns. For data exploration we used unsupervised techniques such as k-means, Principal Component Analysis (PCA), Multidimensional Scaling (MDS) and Self-Organizing Maps (SOM). We identified patterns that help the analyst understand the vast amount of data produced by digital trails and guide their actions (actionable insights). Support Vector Machines and Random Forest algorithms were used for supervised prediction of conversions.

Keywords: ad optimization, Adwords, Predictive Analytics, SEO, digital marketing

1 Introduction

Online advertising has evolved into a $50 billion industry and continues to grow at double-digit rates. At the same time, powerful web analytics tools, such as Google Analytics, Facebook Insights or Kissmetrics, make key data easily available to anyone who wants to monitor the performance of their online campaigns. For e-commerce sites, the analyst can track every single action of the visitor along the conversion path and answer the fundamental questions (who, what, why, how and when) from lead to purchase.

Our interest lies in monitoring the impact campaigns have on website traffic, engagement and revenue (in the case of e-commerce). A principal form of online advertising is the promotion of products and services through search-based advertising. Today's most popular search-based advertising platform is Google Adwords, which has the largest share of revenues. Search remains the largest online advertising revenue format, accounting for 46.5% of 2011 advertising revenues, up from 44.8% in 2010. In 2011, search revenues totalled $14.8 billion, up almost 27% from $11.7 billion in 2010.

This gives unprecedented power to the marketing team, but at a cost: huge amounts of unstructured, disparate and complex data have to be processed and many parameters adjusted. The effort required to deal with the number of options and configurations for optimal performance of a company website is simply far beyond human capabilities.

Furthermore, some parameters have non-linear interactions. For instance, the quality of the SEO boosts the position of the ad in Adwords campaigns, thus achieving better performance for a lower PPC. The budget allocated to the campaign also influences the ad position. There are even subtler influences and nuances when measuring the ROI. For instance, it is known that although display advertising brings very few direct sales, it may boost the performance of search ads, since users were previously exposed to the product or brand.

To optimize this myriad of parameters we need to rely on machine learning algorithms to extract actionable insights and answer some simple questions: How can I improve my return on investment (ROI)? How can I boost customer engagement? Which products generate most interest? What catalyses sales? Which strategy should I adopt? Which channels should I choose? How much should I invest, when and how? These are very important questions with no single clear answer. Most of them depend on the particular case, and some are too vague to be answered.

Under these circumstances, the safe strategy is to design the ad carefully, select adequate keywords, set the bids, segment the campaign properly and test continuously for fine-tuning. If results are not as expected, look at the data, learn, make corrections, and repeat the cycle.

Most research has focused on the publisher side, trying to devise strategies to maximize the CTR of ads by means of content contextualization and ad personalization, among others [**]. In this work, however, we take the perspective of the advertiser and explore the potential of machine learning tools for prediction and optimization of the marketing strategy. The objective is to maximize the performance and effectiveness of marketing campaigns, namely the Return On Investment (ROI). We propose a system to extract information from Google Analytics and determine which information is most important for optimization.

The article is organized as follows. In Section 2 we introduce the data and pre-processing. In Section 3 we explore the data and extract relevant features using clustering and dimensionality reduction techniques such as k-means, PCA, MDS and SOM. In Section 4 we introduce supervised learning, where we predict conversions, revenues and user engagement. Finally, in Section 5 some conclusions are drawn.

2 Data

2.1 Data Extraction and Description

Data was collected from a customer running campaigns on an e-commerce site through Adwords, Facebook and email marketing. The data, collected daily over a period of 6 months, is described in Table 1. Our main data sources are Google Analytics (GA), which aggregates data from Google Adwords, and Facebook Insights. We focused on inputs that may give us insights, namely correlations between conversions and site usage or Adwords campaigns.

We used the package RGoogleAnalytics (RGA) to extract data from Google Analytics into R. We collected data from Adwords, Facebook and email campaigns (Table 1). Data was collected over different timeframes and consolidated by date. In some cases, data was decomposed by traffic source in GA, and by group segment in the case of Adwords, so that each data point corresponds to a specific segment on a specific day. Two data sets were built: Data 1, with Adwords data only, and Data 2, with Analytics, Facebook and email data.
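The consolidation step described above can be sketched as follows. The paper performed this step in R; the Python below is only an illustration, and the function name `consolidate`, the field names and the sample rows are all hypothetical.

```python
from collections import defaultdict


def consolidate(records):
    """Merge metric rows that share the same (date, segment) key."""
    merged = defaultdict(dict)
    for rec in records:
        key = (rec["date"], rec["segment"])
        for name, value in rec.items():
            if name not in ("date", "segment"):
                merged[key][name] = value
    return dict(merged)


# Hypothetical daily rows from two different exports (not data from the paper).
adwords_rows = [
    {"date": "2024-01-01", "segment": "group 9", "clicks": 120, "cost": 18.5},
]
analytics_rows = [
    {"date": "2024-01-01", "segment": "group 9", "visits": 110, "bounce_rate": 0.42},
]

table = consolidate(adwords_rows + analytics_rows)
```

Each key of `table` is then one data point in the sense used above: a specific segment on a specific day.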

Table 1: Variables used for analysis. The coloured fields are data from campaigns.

Group      Variable                    Name (Metric/Dimension)   Comments
General    Traffic source              TO (D)                    Organic, Email, Adwords, Facebook, Others
General    Visit length                VL (M)
General    Number of visits            NV (M)
General    Bounce rate                 BR (M)
General    Page per visit              PV (M)
AdWords    Ad/campaign group           CG (D)                    Group of Ad
AdWords    Cost per Click              CPC (M)
AdWords    Position                    P (M)
AdWords    Type                        T (D)                     Search, display
AdWords    Click Through Rate          CTR (M)
AdWords    Conversion Rate             CRA (M)
Facebook   Impressions                 Imp (M)
Facebook   Click through rate          CTRf (M)
Facebook   Cost per like               CPL (M)
Facebook   Conversion Rate Facebook    CRF (M)
Emails     Emails Sent                 Em (M)
Emails     Open Rate                   OR (M)
Emails     Click Rate                  CT (M)
Emails     Conversion Rate email       CRE (M)
           Total revenue               Re (M)                    Revenue from sales

2.2 Performance Ratios

For visualization purposes, we consider several aggregated metrics to benchmark the performance of a website and the digital campaigns. We divide the metrics into two major categories: website usability and financial performance. All indexes are defined to have values between 0 and 1.

A site can be highly engaging…

Website usability metrics

We defined the engagement as a composite index, following [8]:

\[ E = \sum_i \left[ Cd_i + Dd_i + Id_i + (1 - Br_i) \right] \]

where Br is the bounce rate and the other indices are defined below. The sum runs over any aggregation dimension that we may be interested in. The coefficients are obtained from sessions originating from a particular dimension: visitor id, traffic source, time, etc. This index has the advantage of benchmarking the quality of the site and the interaction of users with the content.

The Click Depth index (Cd) measures the depth of visits and is defined as:

\[ Cd = \frac{\text{Sessions with at least 4 page views}}{\text{All sessions}} \]

The Duration Depth index (Dd) measures the intensity of the visits, captured by the duration of visits on the website. It is defined as:

\[ Dd = \frac{\text{Sessions with a duration of at least 3 min}}{\text{All sessions}} \]

The Interaction Depth index (Id) captures the visitor interaction with content or functionality designed to increase the level of attention. It is defined as:

\[ Id = \frac{\text{Sessions where visitors complete an action}}{\text{All sessions}} \]

where an action can be defined as a goal in GA, from downloading a document to filling in a form or watching a video.

Financial metrics

Engagement with a website is important, but the really important metrics, especially for e-commerce sites, are sales or leads. These are captured by financial ratios. There are dozens of financial ratios to measure the efficiency of a sales channel, but we will focus on the following:

• CR, Conversion Rate
• RPC, Revenue Per Channel
• ROI, Return On Investment

The CR is simply defined as:

\[ CR = \frac{\text{Sessions where visitors purchase a product}}{\text{All sessions}} \]

Typical conversion rates are low: 1% is considered very good for most sites, and values can be as low as 0.001%.

The Revenue Per Channel (RPC) is the total value earned by a sales channel over a fixed period of time. The ROI of a channel is simply the ratio of its revenue to the total investment made in this channel:

\[ ROI = \frac{RPC}{\text{Total cost}} \]
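The ratios defined in this section can be computed from session-level records as in the sketch below. It is an illustration only: the field names (`page_views`, `duration_s`, `completed_action`, `revenue`) are hypothetical, the bounce rate is assumed to be the share of single-page sessions, and the function returns the engagement term for one aggregation group i rather than the full sum.

```python
def share(sessions, predicate):
    """Fraction of sessions for which predicate(session) is true."""
    return sum(1 for s in sessions if predicate(s)) / len(sessions)


def performance(sessions, total_cost):
    cd = share(sessions, lambda s: s["page_views"] >= 4)    # Click Depth
    dd = share(sessions, lambda s: s["duration_s"] >= 180)  # Duration Depth (3 min)
    id_ = share(sessions, lambda s: s["completed_action"])  # Interaction Depth
    br = share(sessions, lambda s: s["page_views"] == 1)    # bounce rate (assumed definition)
    cr = share(sessions, lambda s: s["revenue"] > 0)        # Conversion Rate
    rpc = sum(s["revenue"] for s in sessions)               # Revenue Per Channel
    return {
        "engagement_term": cd + dd + id_ + (1 - br),  # Cd_i + Dd_i + Id_i + (1 - Br_i)
        "CR": cr,
        "RPC": rpc,
        "ROI": rpc / total_cost,
    }


# Hypothetical sessions for one channel (not data from the paper).
sessions = [
    {"page_views": 5, "duration_s": 200, "completed_action": True, "revenue": 30.0},
    {"page_views": 1, "duration_s": 10, "completed_action": False, "revenue": 0.0},
    {"page_views": 4, "duration_s": 100, "completed_action": False, "revenue": 0.0},
    {"page_views": 2, "duration_s": 240, "completed_action": True, "revenue": 0.0},
]
result = performance(sessions, total_cost=20.0)
```

With these four sessions, one in four converts, so CR is 0.25, and 30 of revenue on 20 of cost gives an ROI of 1.5.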

In Figure 1 we show the evolution of Engagement and ROI over time for the two main traffic sources.

Figure 1: Engagement over time (days), using a moving average.

In Figure 2 we plot the revenue per traffic source. The most important source of revenue was Facebook, while Google Organic ranks second and Adwords third. The most consistent channels are direct traffic and the email newsletter.

Figure 2: Revenue distribution per channel (top 6).

3 Data visualization with unsupervised techniques

In this section we use several techniques for data exploration and visualization in order to detect patterns and features that are hidden in high-dimensional data. We use unsupervised techniques, from simpler ones, like k-means, to more elaborate ones, like Self-Organizing Maps (SOM) and Multidimensional Scaling (MDS).

3.1 Adwords Data

We start by characterizing the data with the box plots in Figure 3, which display the number of conversions, the CTR and the CR for all ad groups in our campaign. Three ad groups account for the majority of conversions (sales): groups 9, 10 and 11. The average CTR is almost constant for most of the groups (around 6%), but in some cases we do not have enough data to evaluate it accurately. The average position is 1.68 and the average CR is 0.2%, with the CR showing greater variability than the CTR.

 

Figure 3: Boxplot of CTR (red), number of conversions (blue) and CR (green) for all Adwords groups.

In Figure 4 we plot the weekly revenues and costs of the Adwords campaign over a period of 6 months. Initially the campaign was not very efficient, since we ran a trial period to test and optimize its content, targeting and keywords. After week 6, a boost in investment brought a more than proportional increase in sales.

Figure 4: Revenue and cost per week on Adwords campaigns.

Clustering

We then cluster the data using the k-means algorithm. K-means is one of the simplest and most widely used algorithms for unsupervised clustering. The only inputs are the number of clusters k and the metric used to calculate the distances between points. We tested the algorithm with two to five clusters using the Euclidean distance on the Adwords data. The best compromise between intra-cluster and inter-cluster distance was achieved at k = 3 clusters. Results are presented in Figure 5, where we selected CTR and number of clicks as representative axes. The three patterns are very clear in this figure, and the centroids are presented in Table 2. It can be seen that most conversions come from the green group, which corresponds to the greatest number of visits and clicks. The number of page visits is also a strong indicator of revenue. [Figure reference missing in the source] shows the clustering on page views and visitors. CTR, CPC and position are almost the same for the three groups.

Figure 5: K-means algorithm with 3 clusters for data set 1.

Table 2: Centres of the 3 clusters obtained by k-means for the Adwords data set. CTR is given as a fraction (clicks divided by impressions).

Cluster   Cost   Clicks   Imp.   Revenue   CTR    CPC    Position
1         56.7   327      4739   85.1      0.07   0.14   1.79
2         81.7   474      6610   124.9     0.08   0.15   1.71
3         20.8   73       1194   14.1      0.06   0.17   1.30
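The procedure of testing several values of k can be sketched as below. This is an illustration under stated assumptions, not the paper's implementation: the within-cluster sum of squares (WSS) stands in for the unspecified "compromise" criterion, a deterministic farthest-first initialisation is used purely for reproducibility, and the toy points are invented.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic initialisation: start at the first point, then repeatedly
    add the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iters=100):
    centroids = farthest_first(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        updated = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if updated == centroids:  # assignments are stable
            break
        centroids = updated
    wss = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, wss


# Toy data: three well-separated groups (hypothetical, not the Adwords data).
points = [(1, 1), (1, 2), (2, 1), (2, 2),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 1), (20, 2), (21, 1), (21, 2)]
wss_by_k = {k: kmeans(points, k)[1] for k in range(2, 6)}
centroids_3, wss_3 = kmeans(points, 3)
```

Plotting `wss_by_k` against k and looking for the point where the decrease flattens is one common way to pick the number of clusters; on real data the variables would normally be scaled first.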

     

In Figure 6 we plot the correlation graph for the Adwords data set, produced with the R function qgraph. There are strong correlations between **.

Figure 6: Correlations plotted with qgraph.
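The pairwise correlations that such a graph visualizes can be computed as in the following sketch. It is an illustration in Python and does not reproduce qgraph; the function names, the 0.7 threshold and the sample columns are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def strong_edges(columns, threshold=0.7):
    """Return variable pairs whose absolute correlation reaches the threshold."""
    names = list(columns)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                edges.append((a, b, round(r, 3)))
    return edges


# Hypothetical daily values (not data from the paper).
columns = {
    "clicks": [10, 20, 30, 40],
    "cost": [2, 4, 6, 8],
    "position": [3, 1, 4, 1],
}
edges = strong_edges(columns)
```

Only the pairs returned by `strong_edges` would appear as thick edges in a correlation graph; here that is the single pair clicks and cost.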

 

3.2 PCA

Principal Component Analysis is one of the oldest and most widely used approaches for compressing high-dimensional data into a small set of linear components. It has the disadvantage of being a linear model, but it is still very useful. In Figure 7 we plot the eigenvalues of the components in a two-dimensional plot. Two main principal components are clearly seen. Note that conversions are highly correlated with ad groups.

Figure 7: PCA for the Adwords data (left) and the Google Analytics data (right).
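As a rough illustration of what PCA computes, the sketch below finds the leading principal component of a small data set by power iteration on the covariance matrix. It is not the routine used for Figure 7; the function names and the toy rows are hypothetical, and the fixed start vector is assumed not to be orthogonal to the leading eigenvector.

```python
from math import sqrt


def covariance_matrix(rows):
    """Sample covariance matrix of rows (one observation per row)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_component(cov, iters=200):
    """Largest eigenvalue and its eigenvector, found by power iteration."""
    d = len(cov)
    v = [1.0] * d  # assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigenvalue, v


# Toy data lying exactly on the line y = 2x (hypothetical).
rows = [(1, 2), (2, 4), (3, 6), (4, 8)]
cov = covariance_matrix(rows)
eigenvalue, direction = leading_component(cov)
explained = eigenvalue / (cov[0][0] + cov[1][1])  # share of total variance
```

Because the toy points are perfectly collinear, the first component points along (1, 2) and explains essentially all of the variance; real campaign data would spread the variance over several components.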

 

3.3 SOM

The self-organizing map (SOM) is an unsupervised neural network proposed by Kohonen (Kohonen 2001) for visual cluster analysis. The neurons of the map are located on a regular grid embedded in a low-dimensional (usually 2 or 3) space and are associated with the cluster prototypes. During the learning process, the neurons compete with each other through the best matching principle, i.e., the input is projected to the nearest neuron using a defined distance metric. The winner neuron and its neighbours on the map are adjusted towards the input in proportion to the neighbourhood distance; consequently, neighbouring neurons are likely to represent similar patterns of the input data space. Because it clusters and spatializes the data through a topology-preserving projection, SOM is widely used in visual clustering applications.

SOM is well suited to analysing the high-dimensional data of digital metrics. A range of research groups have applied it to the bankruptcy prediction problem, usually solved as a classification task that separates companies into distressed and healthy categories (binary) or into a number of predefined credit ratings (multi-class).

SOM is used to determine the class through visual exploration (Merkevicius, Garsva & Simutis 2004). An enhanced version of Learning Vector Quantization (LVQ) can boost the prediction performance of a multi-layer perceptron neural network (Neves & Vieira 2006). In combination with independent component analysis for dimensionality reduction, LVQ has been employed to recognize distressed French companies (Chen & Vieira 2009).

Figure 8: SOM for data set 1 (Adwords campaigns) on a 6x5 = 30 cell space.
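The competitive learning rule described in this section can be sketched as follows. This is a minimal illustration, not the implementation behind Figure 8: the Gaussian neighbourhood function, the linearly decaying learning rate and radius, the regular-grid initialisation (which only handles two-dimensional inputs in the unit square) and all function names are assumptions.

```python
from math import exp


def init_grid(rows, cols):
    """Neurons on a rows x cols grid, with weights spread over the unit square."""
    return {(r, c): [c / (cols - 1), r / (rows - 1)]
            for r in range(rows) for c in range(cols)}


def bmu(weights, x):
    """Best matching unit: grid position of the neuron closest to input x."""
    return min(weights, key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)))


def update(weights, x, lr, sigma):
    """Move the winner and its grid neighbours towards the input x."""
    win = bmu(weights, x)
    for node, w in weights.items():
        grid_d2 = (node[0] - win[0]) ** 2 + (node[1] - win[1]) ** 2
        h = exp(-grid_d2 / (2 * sigma ** 2))  # Gaussian neighbourhood (assumed)
        for j in range(len(w)):
            w[j] += lr * h * (x[j] - w[j])


def train(weights, data, epochs=20, lr0=0.5, sigma0=1.0):
    for t in range(epochs):
        frac = 1 - t / epochs  # linear decay of learning rate and radius
        for x in data:
            update(weights, x, lr0 * frac, max(sigma0 * frac, 0.3))
    return weights


# One update step on a fresh 3x3 map: the winner moves halfway to the input.
one_step = init_grid(3, 3)
update(one_step, (0.1, 0.1), lr=0.5, sigma=1.0)

# Full training on two hypothetical input patterns.
som = train(init_grid(3, 3), [(0.1, 0.1), (0.9, 0.9)])
```

After training, `bmu(som, x)` gives the cell of the map onto which an observation x is projected, which is the information displayed in a SOM plot.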

3.4 MDS

SOM methods, presented previously, involve the estimation of the conditional probability, which is computationally expensive and hard to extract. Here we test the Multidimensional Scaling (MDS) algorithm. MDS is a non-linear approach, mostly used for visualization, that captures the level of similarity between individual cases of a dataset. It is used to display the information contained in a distance matrix, evaluated according to some metric. The MDS algorithm places each object in an N-dimensional space such that the between-object distances are preserved as well as possible. Each object is then assigned coordinates in each of the N dimensions. The number of dimensions N of an MDS plot can exceed 2 and is specified a priori. Choosing N = 2 optimizes the object locations for a two-dimensional scatterplot (Figure 9).

Figure 9: Aggregation by MDS on data set 2. Colours represent revenue levels (black = lowest, light blue = highest).
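The idea of placing objects so that a given distance matrix is preserved can be illustrated with classical (Torgerson) MDS, sketched below. The paper does not say which MDS variant was used, so this choice is an assumption, as are the function name, the power-iteration eigen-solver, its fixed start vector and the toy points; the distance matrix is assumed to be symmetric.

```python
from math import sqrt


def classical_mds(dist, dims=2, iters=300):
    """Embed n objects in `dims` dimensions from a symmetric distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    total_mean = sum(row_mean) / n
    # Double centring turns squared distances into an inner-product matrix.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    coords = [[] for _ in range(n)]
    for _ in range(dims):
        v = [float(i + 1) for i in range(n)]  # start vector (assumed adequate)
        for _step in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        for i in range(n):
            coords[i].append(v[i] * sqrt(max(lam, 0.0)))
        # Deflate so the next pass finds the next eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


def euclid(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical objects; only their pairwise distances are given to MDS.
original = [(0, 0), (3, 0), (0, 4), (3, 4)]
dist = [[euclid(p, q) for q in original] for p in original]
embedded = classical_mds(dist, dims=2)
```

The recovered coordinates may be rotated or mirrored with respect to the original points, but the pairwise distances between the embedded objects match the input matrix.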

 

3.5 Heatmaps and ROI

We now investigate the return on investment (ROI) from the Adwords and Facebook campaigns. The Facebook campaign ran over the same period as the Adwords campaign, with a daily budget between 10 and 40 euros (Figure 10). The ROI is in general greater than 1, meaning that the campaigns earn more revenue than they cost. When we consider the global performance (sales originating from all channels), the ROI almost doubles, considering as cost only the investment in Adwords and Facebook.

Figure 10: ROI over time (days), using moving averages: (red) Adwords, (blue) Total.

We now plot the ROI for the paid channels. Email is number one, as expected, due to the small cost of promotion.

ROI and Eng for Data 1. **

Heat maps

Heat maps are a good visualization method for data exploration and causality explanation. In this case we use them to lay out conversions and engagement on a calendar in order to spot trends visually. We use the ggplot2 library to create a calendar heat map with data from 6 months. We plot engagement, visits and transactions on the calendar to get a perspective on how they interact over time.

In this case it is interesting to note that Tuesdays have the highest number of visits, but Wednesday is the day when most transactions occur. Visits increase towards the end of the year (shopping season) and then slow down at the start of the year. Engagement has been improving over time.
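The data preparation behind a calendar heat map can be sketched as follows. The paper used ggplot2 in R for the plot itself; this Python fragment only illustrates how daily values map to (week, weekday) cells and to weekday totals. The function names and the sample dates and values are hypothetical.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def calendar_cells(daily):
    """Map each day to a (ISO year, ISO week, weekday) cell of the calendar."""
    cells = {}
    for day, value in daily.items():
        iso_year, iso_week, iso_weekday = day.isocalendar()
        cells[(iso_year, iso_week, WEEKDAYS[iso_weekday - 1])] = value
    return cells


def weekday_totals(daily):
    """Sum a daily metric per day of the week."""
    totals = defaultdict(float)
    for day, value in daily.items():
        totals[WEEKDAYS[day.weekday()]] += value
    return dict(totals)


# Hypothetical daily visits (not data from the paper).
visits = {
    date(2024, 1, 1): 100,
    date(2024, 1, 2): 150,
    date(2024, 1, 3): 120,
    date(2024, 1, 9): 170,
}
cells = calendar_cells(visits)
totals = weekday_totals(visits)
busiest = max(totals, key=totals.get)
```

Running the same aggregation on transactions instead of visits would expose a weekday difference of the kind noted above.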

Figure 11: Heatmap calendar for visits (top) and revenue (bottom) over the last 6 months.

 

4 Supervised Learning for Revenue Prediction

In previous sections we explored the data patterns without concern for causality between observations (unsupervised learning). In this section we go a step further and use supervised learning to make predictions based on past records. This is very important as it provides explanation, "the why" instead of "the what", as we enter the field of predictive analytics.

First we consider the problem from a broader perspective: can we predict the revenue from a certain channel by looking at the traffic data it generates? If so, with how much accuracy and confidence? How does the behaviour of a user who completes a purchase differ from that of other users? To answer these questions we run supervised algorithms trained on past data and perform classification analysis.

As a first step, we enrich our data by extracting extra metrics, drilled down by 5 dimensions (time, traffic source, Adwords ad group, operating system, and city). The metrics used are: number of visits, average pages per visit, average visit duration, bounce rate, visit depth, CTR, page load time, social interaction and cost of ads on Adwords and Facebook. From these metrics we extract the additional performance ratios described in Section 2.2. As regards traffic sources, we selected only the top 10 performers. We count a conversion when at least one sale is concluded. All data is aggregated with a daily granularity.

We run the algorithms as a classification task, trying to predict whether a given session leads to a conversion. The data set contains 5680 sessions, of which 432 have conversions. We used Support Vector Machines and the Random Forest algorithm since they can easily deal with categorical and continuous inputs, can be trained with relatively few examples, and tend to resist overfitting.

Since many more visits lead to non-conversions than to conversions, we create a balanced data set by randomly eliminating entries that do not lead to conversions. We end up with 864 training examples. All data was normalized and the algorithms were tested using 10-fold cross-validation.
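The balancing and cross-validation steps can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: the function names, the `converted` field, the fixed random seed and the interleaved fold assignment are all hypothetical, and the sketch assumes non-conversions outnumber conversions.

```python
import random


def balance(rows, label_key="converted", seed=0):
    """Undersample the majority class so both classes have equal size."""
    rng = random.Random(seed)
    positives = [r for r in rows if r[label_key]]
    negatives = [r for r in rows if not r[label_key]]
    kept = rng.sample(negatives, len(positives))  # random subset of non-conversions
    data = positives + kept
    rng.shuffle(data)
    return data


def k_folds(n_items, k=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


# Synthetic stand-in with the class counts reported in the text.
rows = [{"session": i, "converted": i < 432} for i in range(5680)]
data = balance(rows)
folds = list(k_folds(len(data), k=10))
```

Each model would then be fitted on the `train` indices and scored on the held-out `test` indices, with the ten scores averaged.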

In Figure 13 we plot the ROC curve obtained over a period of 165 days. The AUC obtained was 0.84. For comparison, we also used SVM, for which the AUC = **. This is a somewhat surprising result given the small set of inputs. In order to separate out the traffic from Adwords, we ran the algorithm without traffic from this source. The results improved slightly.
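For reference, the AUC can be computed directly from classifier scores as the probability that a randomly chosen conversion is ranked above a randomly chosen non-conversion. The sketch below uses this pairwise definition; the function name and the sample scores are hypothetical and unrelated to the results reported above.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for label, s in zip(labels, scores) if label]
    neg = [s for label, s in zip(labels, scores) if not label]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = conversion) and classifier scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
value = auc(labels, scores)
```

In this toy example three of the four conversion/non-conversion pairs are ordered correctly, so the AUC is 0.75; a value of 0.5 corresponds to random ranking and 1.0 to perfect separation.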

Random Forest returns several measures of variable importance. The most reliable one is based on the decrease in classification accuracy when the values of a variable in a node of a tree are permuted randomly, and this is the measure of variable importance used here.
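The permutation idea can be illustrated with the model-agnostic sketch below. It is not the internal Random Forest procedure: it shuffles one input column at a time on a fixed data set and measures the average drop in accuracy of an arbitrary predictor. The function names, the toy predictor and the toy rows are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, n_features, repeats=20, seed=0):
    """Average accuracy drop when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    base = accuracy(predict, rows, labels)
    importances = []
    for j in range(n_features):
        drop = 0.0
        for _ in range(repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drop += base - accuracy(predict, shuffled, labels)
        importances.append(drop / repeats)
    return importances


# Toy setting: the label depends only on feature 0; feature 1 is irrelevant.
rows = [(i / 10, (i * 7 % 10) / 10) for i in range(10)]
labels = [r[0] > 0.45 for r in rows]


def predict(row):
    return row[0] > 0.45


importances = permutation_importance(predict, rows, labels, n_features=2)
```

A feature the predictor ignores gets an importance of exactly zero, while shuffling an informative feature degrades accuracy; ranking features by this drop gives a list of the kind shown in Table 3.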

Table 3 presents the best discriminating indicators in predicting conversions: traffic origin and the number of visits (see also Figure 12).

Figure 12: Dispersion of inputs for data set 2.

Figure 13: ROC curve for the conversion prediction with Random Forest and SVM algorithms. FPR: false positive rate, TPR: true positive rate.

Table 3: Best performing conversion prediction indicators for the two datasets.

All variables       All without Adwords
Traffic source      Number of visits
Number of visits    Bounce rate
Bounce rate         Visit length
Visit length        Time on site

 

5 Conclusions

In this work we have used a set of machine learning techniques for data exploration and predictive analytics. It was shown that exploratory tools can help to understand the dynamics of digital campaigns.

We used Random Forest algorithms (a collection of decision trees) and SVM to predict conversions with reasonable accuracy. The most important features are number of visits, origin of traffic and visit duration. Surprisingly, we found that CTR and CR have little influence as predictors of conversions.

6 References

• 1. Benjamin Edelman, Michael Ostrovsky, and Michael Schwarz. "Internet Advertising and the Generalized Second-Price Auction: Selling Billions of Dollars Worth of Keywords". American Economic Review 97(1), 2007, pp. 242-259.

• 2. P. Maille, E. Markakis, M. Naldi, G. D. Stamoulis, B. Tuffin. Sponsored Search Auctions: An Overview of Research with Emphasis on Game Theoretic Aspects. To appear in the Electronic Commerce Research journal (ECR).

• 3. Andrei Broder, Vanja Josifovski. Introduction to Computational Advertising. Course, Stanford University, California.

• 4. Anand Rajaraman and Jeffrey D. Ullman. Mining of Massive Datasets. Cambridge University Press, 2012, Chapter 8: Advertising on the Web.

• 5. James Shanahan. Digital Advertising and Marketing: A Review of Three Generations. Tutorial at WWW 2012.

• 7. IAB's Internet Advertising Revenue Report. http://www.iab.net/AdRevenueReport

• http://www.webanalyticsdemystified.com/downloads/Web_Analytics_Demystified_and_NextStage_Global_-_Measuring_the_Immeasurable_-_Visitor_Engagement.pdf