How to Keep Your Gains from A/B Tests Without Accidentally Killing Them Later

Post on 20-Aug-2015

Lars Lofgren and Will Kurt

Keep Your Gains from A/B Tests Without Killing Them Later

May 2014

Hit me up

@larslofgren

We’ll cover…

1 The limits of A/B tests

2 The standard solutions

3 Simulations! Woohoo!

4 The 3 strategies of A/B testing that work

5 How we A/B test at KISSmetrics

#KISSwebinar

Limits of A/B Tests

A/B tests don’t give you perfect decisions.

No matter what you do, you’re never 100% certain

If we’re not careful, winners aren’t really winners

Your conversions go up… and then they come back down

The Standard Solution

Run your test until you hit 95% statistical significance.

Go to getdatadriven.com if you need a significance calculator.
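
For a rough idea of what a significance calculator does, here is a minimal sketch using a pooled two-proportion z-test. This is one common textbook approach and an assumption on our part; the slides do not say which test the calculator at getdatadriven.com uses. The function name `significance` and its parameters are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def significance(control_visitors, control_conversions,
                 variant_visitors, variant_conversions):
    """Two-sided confidence (1 - p-value) that the two conversion rates differ.

    Assumption: a pooled two-proportion z-test with a normal approximation.
    """
    p_control = control_conversions / control_visitors
    p_variant = variant_conversions / variant_visitors
    pooled = (control_conversions + variant_conversions) / (
        control_visitors + variant_visitors)
    std_err = sqrt(pooled * (1 - pooled)
                   * (1 / control_visitors + 1 / variant_visitors))
    if std_err == 0:
        # No conversions (or all conversions) in both groups: nothing to compare.
        return 0.0
    z = (p_variant - p_control) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return 1 - p_value


# 10% vs 15% on 1,000 visitors each clears 95%; 10% vs 11% does not.
strong = significance(1000, 100, 1000, 150)
weak = significance(1000, 100, 1000, 110)
```

A value of 0.95 or higher corresponds to the "95% statistical significance" bar on the slide.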

Scientific A/B testing:

1 Pick the minimal improvement

2 Determine your sample size

3 Determine degree of certainty (95%)

4 Start test but don’t check it early

5 If results aren’t significant, keep control
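
Steps 1 to 3 can be turned into a number before the test starts. The sketch below uses the standard normal-approximation formula for comparing two proportions. The 80% power default, the baseline conversion rate, and the name `sample_size_per_variant` are assumptions for illustration; the slides only fix the 95% certainty.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, minimal_lift,
                            confidence=0.95, power=0.80):
    """People needed in EACH variant to detect a relative lift.

    baseline_rate: current conversion rate, e.g. 0.05 (assumed input)
    minimal_lift:  smallest relative improvement worth detecting, e.g. 0.10
    power:         chance of detecting a real lift (assumed, not from the slides)
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + minimal_lift)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Example: 5% baseline, looking for at least a 10% relative improvement.
needed = sample_size_per_variant(0.05, 0.10)
```

Smaller minimal improvements push the required sample size up quickly, which is why this approach needs lots of traffic.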

Martin Goodson’s PDF on poor testing methods:

kiss.ly/bad-testing

This gives us the best data but not necessarily the best ROI.

So how far do we take this?

Simulation Time!

We modeled several A/B testing strategies.

Using Monte Carlo simulations, we tested different strategies over 1 million observations (people).

Will Kurt gets full credit for all this.

@willkurt
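
To give a feel for what such a simulation might look like, here is a small sketch. It is not Will Kurt’s code. Everything in it is an assumption: the 5% starting conversion rate, the way each challenger’s true rate is drawn, checking every 100 people, and the names `confidence` and `simulate`. It models a strategy defined by a significance threshold and a cap on people per test.

```python
import random
from math import sqrt
from statistics import NormalDist


def confidence(n_a, conv_a, n_b, conv_b):
    """Two-sided confidence from a pooled two-proportion z-test (assumed test)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.0
    z = (conv_b / n_b - conv_a / n_a) / std_err
    return 1 - 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(threshold, max_people, total_people=100_000,
             check_every=100, base_rate=0.05, seed=0):
    """Run test after test and return the control's true rate at each check.

    Simplification: traffic sent to the challenger is ignored in the history.
    """
    rng = random.Random(seed)
    control_rate = base_rate
    history = []
    used = 0
    while used < total_people:
        # Assumption: a new challenger's true rate is scattered around the control's.
        variant_rate = min(max(rng.gauss(control_rate, 0.005), 0.001), 0.999)
        n = conv_a = conv_b = seen = 0
        while seen < max_people and used < total_people:
            for _ in range(check_every // 2):  # split traffic 50/50
                n += 1
                conv_a += int(rng.random() < control_rate)
                conv_b += int(rng.random() < variant_rate)
            seen += check_every
            used += check_every
            history.append(control_rate)
            if confidence(n, conv_a, n, conv_b) >= threshold:
                if conv_b > conv_a:
                    control_rate = variant_rate  # adopt the apparent winner
                break  # either way, this test is over
    return history


# Example: 99% threshold, give up on a test after 2,000 people.
realist_history = simulate(threshold=0.99, max_people=2000, total_people=20_000)
```

Plotting the returned history over time gives a curve per strategy, which can then be compared across thresholds and caps.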

The Scientist:

1 Pick the minimal improvement

2 Determine your sample size

3 Determine degree of certainty (95%)

4 Start test but don’t check it early

5 If results aren’t significant, keep control

Results for the Scientist:

The Reckless Marketer:

1 Waits until 80% significance

2 Calls a winner as soon as 80% gets hit
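
Why is calling it the moment 80% gets hit so risky? One way to see it is to run A/A tests, where both versions are identical, and count how often a difference still gets declared. The sketch below is an illustration under assumptions: a 5% conversion rate, 2,000 people per test, a check every 100 people, and a pooled two-proportion z-test. The function names are made up.

```python
import random
from math import sqrt
from statistics import NormalDist


def confidence(n_a, conv_a, n_b, conv_b):
    """Two-sided confidence from a pooled two-proportion z-test (assumed test)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.0
    z = (conv_b / n_b - conv_a / n_a) / std_err
    return 1 - 2 * (1 - NormalDist().cdf(abs(z)))


def false_call_rate(threshold, peek, trials=300, people=2000,
                    check_every=100, rate=0.05, seed=1):
    """Share of A/A tests (no real difference) that still declare a difference.

    peek=True  stops at the first check that crosses the threshold.
    peek=False looks only once, after all the people have been seen.
    """
    rng = random.Random(seed)
    per_arm = people // 2
    false_calls = 0
    for _trial in range(trials):
        n = conv_a = conv_b = 0
        called = False
        while n < per_arm and not called:
            for _person in range(check_every // 2):
                n += 1
                conv_a += int(rng.random() < rate)
                conv_b += int(rng.random() < rate)
            at_end = n >= per_arm
            if (peek or at_end) and confidence(n, conv_a, n, conv_b) >= threshold:
                called = True
        false_calls += int(called)
    return false_calls / trials


one_look = false_call_rate(0.80, peek=False)
every_look = false_call_rate(0.80, peek=True)
```

With a single look, the false call rate should sit near the nominal 20%. Stopping at the first look that crosses 80% pushes it well above that, which is the kind of "winner" that later comes back down.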

Results for the Reckless Marketer:

The Impatient Marketer:

1 Waits for 95% significance

2 Moves on to the next test after 500 people

Results for the Impatient Marketer:

The Realist:

1 Waits for 99% significance

2 Moves on to the next test after 2,000 people

Results for the Realist:

The Persistent Realist:

1 Waits for 99% significance

2 Moves on to the next test after 20,000 people

Results for the Persistent Realist:

The Blitz Realist:

1 Waits for 99% significance

2 Moves on to the next test after 200 people

Results for the Blitz Realist:

Let’s compare them using the area under the curve.

A/B Strategy Scores

Strategy            Conditions             Score
Scientist           Stats like a pro       67,759
Reckless Marketer   80%                    57,649
Impatient Marketer  95% and 500 people     60,532
Realist             99% and 2,000 people   67,896
Persistent Realist  99% and 20,000 people  68,346
Blitz Realist       99% and 200 people     62,836
No Testing          Testing? NOPE!         50,000

Each score is the area under the curve from the simulation. The higher the score, the more conversions you received.
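
As a sketch of what "area under the curve" can mean here: if a simulation records the conversion rate at regular steps, the area is the rate at each step times the number of people in that step, summed up. The slides do not spell out the exact calculation, so the rectangle rule and the name `area_under_curve` below are assumptions.

```python
def area_under_curve(rates, people_per_step):
    """Approximate expected conversions: sum of rate x people for each step."""
    return sum(rate * people_per_step for rate in rates)


# A flat 5% conversion rate over 1 million people, recorded every 100 people.
# The 5% figure is an assumption; it happens to match the "No Testing" score.
flat_history = [0.05] * 10_000
no_testing_score = area_under_curve(flat_history, people_per_step=100)

# A history that improves from 5% to 6% halfway through scores higher.
improving_history = [0.05] * 5_000 + [0.06] * 5_000
improving_score = area_under_curve(improving_history, people_per_step=100)
```

Under this reading, the flat 5% history comes out at roughly 50,000, and any strategy that raises the conversion rate earlier and keeps it raised earns a larger area.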

[Bar chart: A/B Strategy Scores, axis from 0 to 70,000. Persistent Realist 68,346; Realist 67,896; Scientist 67,759; Blitz Realist 62,836; Impatient 60,532; Reckless 57,649; No Testing 50,000.]

3 Strategies

Don’t make decisions at less than 95% significance.

You’ll waste all the time you spend testing

We have 3 viable strategies for making this work:

1 Be a scientist at 95%

2 Only make changes at 99%

3 Sloppy 95% but make it up in volume

Be a scientist when you have lots of data and resources:

1 Pick the minimal improvement

2 Determine your sample size

3 Determine degree of certainty (95%)

4 Start test but don’t check it early

5 If results aren’t significant, keep control

If you don’t have the data or resources to be a scientist, go fast at 99%.

And if you still want to play at 95% without being a scientist, never stop testing.

How We A/B Test

First, get volume to 4000+ people/month.

Only make changes at 99% significance.

Let the test run at least 1 week before checking results.

If not at 99% after two weeks, launch the next test.

If the next test isn’t ready, let the current test keep running while you build the next one.

The KISSmetrics A/B Testing Strategy

1 Get to 4,000 people/month for test

2 Only change the control if you reach 99%

3 Check results after 1 week

4 Launch the next test at 2 weeks

5 Let old tests run if you’re still building

This strategy isn’t perfect. It’s a balance between good data and speed.
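
The five rules above can be written down as a small decision function. This is a sketch of one reading of the slide, not KISSmetrics code. The function name, parameter names, and returned labels are made up, and the rule for a 99% result where the variant is behind (keep the control) is an assumption.

```python
def next_action(monthly_people, days_running, confidence,
                variant_ahead, next_test_ready):
    """Decide what to do with a running test under the five rules.

    confidence is a fraction, e.g. 0.99 for 99% significance.
    """
    if monthly_people < 4000:
        return "get more traffic first"   # rule 1
    if days_running < 7:
        return "wait"                     # rule 3: no checking before 1 week
    if confidence >= 0.99:
        # Rule 2: only a 99% result may change the control.
        # Assumption: a 99% result against the variant means keep the control.
        return "change control" if variant_ahead else "keep control"
    if days_running < 14:
        return "keep running"
    # Rules 4 and 5: at 2 weeks, move on if the next test is built.
    return "launch next test" if next_test_ready else "keep running"


# Day 10, variant ahead but only at 96%: not enough to change anything yet.
decision = next_action(5000, 10, 0.96, True, True)
```

The point of writing it this way is that 95% or 96% never triggers a change; below 99% the only question is whether to keep waiting or move on.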

Remember the 3 strategies:

1 Be a scientist at 95%

2 Only make changes at 99%

3 Sloppy 95% but make it up in volume

Q&A Time!

Lars Lofgren @larslofgren

llofgren@kissmetrics.com