October 11, 2018

A/B Testing: The Definitive Guide to Improving Your Product

Go from "I think" to "I know how much."

Whether you are in product, design, or growth, you have probably come across the following questions:

  • How could we increase our conversion rate?
  • How can we improve our AARRR funnel?
  • How successful has this feature launch been?
  • Have these notifications increased retention?

You can get answers to those questions by correctly designing, running, and interpreting A/B tests.

All you need to get started on your testing journey is:

  • A basic understanding of statistics
  • Accurate and sufficient (>1000 monthly unique users) product data

The approach taken is rigorous and encourages best practices but:

  • It will not demonstrate the maths behind frequentist hypothesis testing.
  • It will not cover more advanced aspects such as A/B testing in a Bayesian context, multivariate testing, or the Bonferroni correction.

Ready? Let's get started!

0. Understand that split testing is a quantitative way to improve your product and that it's easy to get it wrong

At its core, A/B testing is a way to compare two versions of a flow in your user journey in order to determine which one performs better.


To get insights from your tests, you will need enough people in each group (called the control and experiment groups, respectively) and a large enough conversion difference. Both values should be defined before running your test, as you're about to see in point 4.
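How you split users matters: a given user should always land in the same group. As a minimal sketch (my own illustration, not tooling from this guide), assuming a stable user ID and a 50/50 split, you can bucket users deterministically by hashing:

```python
import hashlib

def assign_group(user_id: str, experiment: str = "strava-connect-test") -> str:
    """Deterministically assign a user to 'control' or 'experiment'.

    Hashing a stable user ID (salted with the experiment name) means a
    returning user always sees the same variation, with a ~50/50 split.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "experiment" if int(digest, 16) % 100 < 50 else "control"

print(assign_group("user-42"))  # same output on every call
```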

1. Select an important conversion step from your product's funnel

Technically you can test anything, like changing the font on your terms and conditions page. However, you always want to pick the steps with the highest impact on your business outcome. Also, keep in mind the importance of data quantity — the higher the traffic the faster the conclusions — and of data quality — garbage in, garbage out.

Bad data, bad conclusions. Little data, slow conclusions.

Let's get practical and do it for Trana, the example we'll use for this post.

Getting Started - Connecting Strava - Onboarding

Trana is a Facebook Messenger chat app that provides extra running insights to its users. To do so, it connects to their Strava accounts. Therefore, the moment at which we ask users to link their account is key. On top of this intuitive rationale, let's analyze our full onboarding funnel. This can be done in a tool such as Mixpanel after having designed a tracking plan.

Funnel analysis in Mixpanel

We can see that linking the account, going from Onboarding Started to Strava Connected, is the most impactful action in both absolute and relative terms.
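If you'd like to reproduce this drop-off analysis outside Mixpanel, here is a minimal sketch; the step names mirror our funnel but the user counts are placeholders, not Trana's actual figures:

```python
# Unique users reaching each onboarding step.
# Placeholder numbers, not Trana's actual figures.
funnel = [
    ("Getting Started", 10_000),
    ("Onboarding Started", 7_500),
    ("Strava Connected", 2_400),
]

for (step, users), (next_step, next_users) in zip(funnel, funnel[1:]):
    lost = users - next_users
    print(f"{step} -> {next_step}: "
          f"{next_users / users:.0%} convert, {lost} users lost")
```

Got your own important step? Let's continue!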

2. Analyse the current Click-Through-Probability from this step to a desired action

The click-through probability (CTP) is the probability that a given user clicks through to the next step.

Its value is always between 0 and 1. It is therefore not the same as the click-through rate (CTR), which is #clicks / #impressions: a user who clicks three times counts three times in the CTR but only once in the CTP.

The CTP needs to be measured from the chosen step to a crucial action, not necessarily to the very next step, because you might otherwise end up optimizing for a local maximum. Here's a simple example: say you now automatically redirect people from the home page to the pricing/premium page of your product. Your home-to-pricing conversion will be through the roof, but you might want to assess whether people are in turn more likely to become premium users.

In short: optimize with the end, not the very next step, in mind.
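To make the CTR/CTP distinction concrete, here is a minimal sketch computing both metrics from a toy event log (the data is made up for illustration):

```python
# Toy event log of (user_id, event); made-up data for illustration.
events = [
    ("u1", "view"), ("u1", "click"), ("u1", "click"),  # u1 clicks twice
    ("u2", "view"),                                    # u2 never clicks
    ("u3", "view"), ("u3", "click"),
]

impressions = sum(1 for _, e in events if e == "view")
clicks = sum(1 for _, e in events if e == "click")
viewers = {u for u, e in events if e == "view"}
clickers = {u for u, e in events if e == "click"}

ctr = clicks / impressions                     # every click counts: 3/3
ctp = len(clickers & viewers) / len(viewers)   # each user counts once: 2/3
print(f"CTR = {ctr:.2f}, CTP = {ctp:.2f}")
```

The CTR comes out at 1.0 because one user clicked twice, while the CTP is 0.67: only two of the three viewers clicked at all.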

Click-through probability of the key step

We have seen from the full funnel that the CTP is 8%.

3. Construct a hypothesis to kickstart your experiment

A/B testing is nothing but methodically assessing whether a variation, called the experiment, performs better than the current situation, called the control. An experiment starts with a hypothesis, which can arise from gut feeling, customer feedback, or team discussions.


The hypothesis is the following: by seeing how Trana can help them run smarter, users will be less reluctant to connect their Strava account.

4. Determine the sample size

This is the heart of A/B testing and the most jargon-heavy part of this guide; I'll be happy to answer any questions about it. The required sample size depends on four inputs: the baseline conversion (our 8% CTP), the minimum detectable effect dmin (the smallest difference worth acting on), the significance level α, and the statistical power 1 − β.

The test will be run until both samples have at least 1030 observations. From historical traffic you can estimate how long that will take. Try to keep experiments running for less than one month. The faster they run, the faster your product improves.
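For reference, here is a minimal sketch of the standard two-proportion sample-size formula behind such a number. The 8% baseline CTP comes from step 2; the minimum detectable effect, significance level, and power are illustrative assumptions, not the exact inputs behind the 1030 figure:

```python
from scipy.stats import norm

def sample_size_per_group(p_baseline, d_min, alpha=0.05, power=0.80):
    """Observations needed per group for a two-sided two-proportion z-test."""
    p2 = p_baseline + d_min
    p_bar = (p_baseline + p2) / 2
    z_alpha = norm.ppf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = norm.ppf(power)           # 0.84 for 80% power
    n = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
         + z_beta * (p_baseline * (1 - p_baseline) + p2 * (1 - p2)) ** 0.5
         ) ** 2 / d_min ** 2
    return int(n) + 1

# 8% baseline CTP from step 2; d_min here is an illustrative assumption.
print(sample_size_per_group(p_baseline=0.08, d_min=0.02))
```

The smaller the effect you want to detect, the (quadratically) larger the sample you need.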

Compute your sample size, write it down, and let the testing begin. Once you have sufficient data you can analyze the results.

5. Get valid insights from your results

5.1 Collect the data.

Wait until both variations have reached the required sample size, then collect the data.

Breakdown of CTP per version

The experiment's conversion is higher. That's a good first sign, but we can't stop there, so let's run a few checks.

5.2 Make sure your sampling worked as intended.

First, check that the sampling was not biased towards one of the groups. Here is the computation for our example:

Sanity check on the invariants
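In code, the check boils down to asking whether the observed split between groups is consistent with a fair coin flip. Here is a minimal sketch with placeholder group sizes, not the figures from the example above:

```python
from scipy.stats import norm

def split_is_fair(n_control, n_experiment, alpha=0.05):
    """Is the observed split consistent with a fair 50/50 assignment?"""
    total = n_control + n_experiment
    margin = norm.ppf(1 - alpha / 2) * (0.25 / total) ** 0.5
    observed = n_control / total
    print(f"observed share of control: {observed:.4f}, "
          f"allowed: [{0.5 - margin:.4f}, {0.5 + margin:.4f}]")
    return 0.5 - margin <= observed <= 0.5 + margin

# Placeholder group sizes, not the figures from the example.
print(split_is_fair(n_control=1050, n_experiment=1010))  # True
```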

5.3 Check if the experiment performed significantly better.

Assessing whether the experiment is outperforming the control can be a hassle. Therefore, I built a little tool to do it: plug your figures into it to perform the three checks. Below are the details of the computation: dmin is the minimum detectable effect, N is the sample size, and X is the conversion count, i.e. the number of people who took the desired action.

Statistical but no practical significance

Here, d̂ is the observed difference in CTP between the experiment and the control, and d̂ ± m is its confidence interval.

Here is how to interpret results:

  • d̂ - m > dmin: The experiment is outperforming the control (statistical and practical significance). Change the flow to the new version.
  • d̂ - m > 0: The experiment is outperforming the control (statistical significance only). This is the case in our example. To get practical significance you can let the test run longer. You can also make a judgement call and go with your gut feeling.
  • d̂ - m < 0: You can't say that the experiment is outperforming the control. Keep the original version and generate new hypotheses.
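Pulling the computation and the three interpretation rules together, here is a minimal sketch using a pooled two-proportion z-interval; the conversion counts are placeholders, not the exact figures from the example:

```python
from scipy.stats import norm

def ab_test_verdict(x_c, n_c, x_e, n_e, d_min, alpha=0.05):
    """Three-way verdict based on a pooled two-proportion z-interval."""
    d_hat = x_e / n_e - x_c / n_c               # observed CTP difference
    p_pool = (x_c + x_e) / (n_c + n_e)          # pooled conversion
    se = (p_pool * (1 - p_pool) * (1 / n_c + 1 / n_e)) ** 0.5
    m = norm.ppf(1 - alpha / 2) * se            # margin of error
    if d_hat - m > d_min:
        return "statistically and practically significant: ship the new version"
    if d_hat - m > 0:
        return "statistically significant only: run longer or make a judgement call"
    return "not significant: keep the original and generate new hypotheses"

# Placeholder conversion counts, not the exact figures from the example.
print(ab_test_verdict(x_c=83, n_c=1030, x_e=113, n_e=1030, d_min=0.02))
```

With these placeholder numbers, d̂ - m is positive but smaller than dmin, landing in the middle case: the same "statistical but no practical significance" outcome as our example.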

6. Communicate your results.

Whatever the outcome, always communicate and document.

I recommend that you signal when the test starts and when it ends, using these key moments as an opportunity for stakeholders to voice their ideas and be up-to-date with the latest changes.

Also, always document and archive. You want the testing process to outlive you so that anyone taking over has a library of experiments to learn from.

Here is a Google Slides template you can use to communicate and document your results.

7. Bring it all together.

Here is a recap of the steps you ought to take to go from "I think" to "I know how much":

  1. Select an important conversion step from your product’s funnel
  2. Find out the current Click-Through-Probability (CTP) from this step to a desired action
  3. Construct a hypothesis
  4. Determine the sample size
  5. Gather and analyse results
  6. Communicate and document your results


Editor's note: This was originally posted on Robin Rampaer's blog, and has been reposted with permission.

About the author

Robin Rampaer

Robin is a business engineer with a passion for data-powered problem solving. Intrapreneur during the week, entrepreneur on the weekends, he co-created Pengo to make payments contextual & social.
