# Getting Started with A/B Testing

The science behind a simple but not-so-easy practice

As part of the series of posts to review the Conversion Rate Optimization course at CXL, we have reached the exciting phase of testing and all that goes with it.

Every marketer worth his salt talks about the need to do testing, or more specifically A/B testing of your marketing ideas. What is often missed or intentionally left out in conversations is the science behind this practice, which on the face of it looks pretty simple and easy. The concepts are simple, but it is not that easy. It takes a lot of rigour, discipline, a sharp eye for detail and persistence to master this skill.

# Introduction to A/B Testing

The thing with A/B testing is that it is very easy to get wrong and to draw the wrong conclusions if you do not follow the right process. Let’s get to the basics first.

A/B testing is the validation method of your optimization program. You can optimize without it, but what will be missing is validation of those efforts in terms of cold metrics. That makes a valid case for including it in your optimization program, right? The idea that successful businesses are run on intuition and gut feeling rather than data and metrics is quite outdated.

As a business, it makes very little sense not to test. Not testing carries far more risk than running unsuccessful tests: change is inevitable, and the more prepared you are, the better the chances of your business surviving and thriving. To cut it short, if you are a serious business, you need to do organised and structured A/B testing.

# What to Test?

Testing is very expensive and unproductive if not done the right way. There are all kinds of costs associated with it — resources, tools, taking time away from other tasks which are potentially critical and so on. So how do you decide what needs to be tested?

The best way is to look at it as an exercise to find solutions to business problems which can be fully measured and demonstrated.

Digital analytics is a good place to start identifying problems. The numbers in your analytics will give you signals on where the potential problems are: are users spending enough time on your site to receive your message, is there a leak in your checkout process, are there specific segments in your audience that are more promising and need to be looked at more closely? The list goes on.

Next comes the qualitative part, where you analyze issues from a more heuristic point of view than mere numbers: lack of clarity, the tone of your language, an unconvincing value proposition, or elements that create friction and keep users from taking the intended action. All of these are valuable inputs into analyzing the problems that need solving.

In essence, this research phase takes the majority of the time in the whole A/B testing exercise. The split is close to 80/20, with research taking the larger share and the experimentation itself the smaller one.

Once you identify the problems and the possible solutions, it is time to formulate them as hypotheses that need to be validated. The list of hypotheses should be discussed within a focus group to identify the priority of each experiment.

# Test Prioritization

There are many frameworks available to prioritize the testing. Here are a few of them.

**PIE:** This looks at the **Potential** (how likely the test is to win), **Importance** (is this an area that will have business impact) and **Ease** (how easy or difficult it is to implement). All 3 are rated on a scale of 10 and the hypothesis with the highest score takes priority.
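To make the scoring concrete, here is a minimal sketch of PIE prioritization in Python. The hypotheses and their ratings are made up for illustration, and combining the three ratings into a simple average is an assumption (a common way to do it, not something the course notes above spell out).

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """A test idea with its three PIE ratings (each on a 1-10 scale)."""
    name: str
    potential: int
    importance: int
    ease: int

    @property
    def pie_score(self) -> float:
        # Assumption: the PIE score is the plain average of the three ratings.
        return (self.potential + self.importance + self.ease) / 3


def prioritize(hypotheses):
    """Return the hypotheses sorted from highest to lowest PIE score."""
    return sorted(hypotheses, key=lambda h: h.pie_score, reverse=True)


# Hypothetical backlog, invented for this example.
backlog = [
    Hypothesis("Shorter checkout form", potential=8, importance=9, ease=4),
    Hypothesis("New homepage headline", potential=6, importance=5, ease=9),
    Hypothesis("Add trust badges to cart", potential=7, importance=8, ease=8),
]

ranked = prioritize(backlog)
for h in ranked:
    print(f"{h.name}: {h.pie_score:.1f}")
```

The ratings themselves are still judgment calls, so the code only makes the ranking repeatable, not objective.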

**ICE:** Quite similar to the PIE model, here the factors are **Impact**, **Cost** and **Effort**, and they are rated on 2 scales, High (0 to 2) or Low (0 to 1), to determine the priority.

**5 Star Rating:** In this model, the hypotheses are rated on a scale of 5 stars based on the volume of the users affected and the severity of the problem you’re trying to solve.

All of the above involve a great deal of subjectivity, and it is difficult to arrive at a consensus if you’re part of the core team tasked with prioritization.

**PXL:** CXL has defined this custom model, which looks at many more factors to decide the priority: specific things like where the change is, what effect it will have on the user’s motivation, whether it is based on data gathered during the research phase, and so on. The objectivity of the method and the increased detail with which it analyzes the factors make it a very effective framework to follow.

**A/B Testing Statistics:** It is important to base your testing on proven statistics. For this you don’t need to be a professional statistician or mathematical genius; a bare minimum foundation of simple statistics will suffice.

So what are these statistical factors?

1. Sample size: Does your site have enough traffic to test a hypothesis with any degree of confidence? You can find out using a sample size calculator such as the AB Test Sample Size Calculators from CXL.

The calculator will tell you the sample size you need (traffic and conversions) to conduct an effective test, the MDE (Minimum Detectable Effect, the smallest improvement of the variant over your existing or control version that the test can reliably detect), and the duration to run the test.

It is also important to know when the test can be marked as completed. Though there is no magic number, 4 weeks is a good duration to run, and you need to ensure that the test covers multiple business cycles (in terms of seasons, periods or days) so that the sample is representative.
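For a feel of what such a calculator does under the hood, here is a rough sketch using the standard normal-approximation formula for comparing two conversion rates. The baseline rate, MDE and weekly traffic are hypothetical numbers, and real calculators may use slightly different formulas, so treat the output as a ballpark figure rather than a match for any particular tool.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant for a two-sided test.

    baseline_rate: conversion rate of the control, e.g. 0.03 for 3%
    relative_mde:  smallest relative lift to detect, e.g. 0.10 for +10%
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% significance
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


def weeks_needed(n_per_variant, weekly_visitors, variants=2):
    """Full weeks required to collect the sample across all variants."""
    return math.ceil(n_per_variant * variants / weekly_visitors)


# Hypothetical inputs: 3% baseline, +10% relative MDE, 30,000 visitors a week.
n = sample_size_per_variant(baseline_rate=0.03, relative_mde=0.10)
weeks = weeks_needed(n, weekly_visitors=30_000)
print(f"Visitors per variant: {n}, weeks to run: {weeks}")
```

Rounding up to whole weeks keeps every day of the week equally represented, which fits the advice to cover complete business cycles.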

2. Statistical significance: This measure tells you when you can consider your test result statistically valid, that is, unlikely to be down to chance alone. Again there is no silver bullet here, but a widely accepted threshold is a statistical significance of 95%. This is the same as requiring a p-value below 0.05, since the significance level is 1 minus the confidence level.

This is tied to your sample size: for a real difference of a given size, the larger the sample, the more likely the test is to reach statistical significance.
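As an illustration, the p-value for a simple A/B test can be sketched with a two-proportion z-test. This is one common approach, not necessarily what your testing tool uses, and the visitor and conversion counts below are invented.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for control A versus variant B."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the assumption that A and B truly convert the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 3.0% vs 3.6% conversion with 10,000 visitors each.
z, p_value = two_proportion_z_test(conv_a=300, n_a=10_000, conv_b=360, n_b=10_000)
print(f"Large sample: z = {z:.2f}, p = {p_value:.3f}")

# Same conversion rates, but only 1,000 visitors each.
z_small, p_small = two_proportion_z_test(conv_a=30, n_a=1_000, conv_b=36, n_b=1_000)
print(f"Small sample: z = {z_small:.2f}, p = {p_small:.3f}")
```

With these made-up numbers, the same 3.0% vs 3.6% difference clears the 0.05 threshold at 10,000 visitors per variant but not at 1,000, which is the sample size effect in action.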

3. Statistical power: This is the third part of the holy trinity of A/B testing statistics. It tells you how likely your test is to detect a difference when one really exists. It is key to understand a couple of terms in this context:

**False Positive** or Type I error is the case where you conclude that the variant you tested is better than the existing version, but in reality it is not.

**False Negative** or Type II error, on the other hand, is when you conclude that the variant is no better than the existing version, but in reality it is winning.

So a statistical power of 80% means that you are willing to accept a 20% risk of a false negative outcome when the variant really is better.
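The link between power and the false negative risk can be sketched with the same normal approximation. The baseline rate, lift and sample sizes here are hypothetical, and the calculation ignores the tiny chance of a significant result in the wrong direction.

```python
import math
from statistics import NormalDist


def approximate_power(baseline_rate, relative_lift, n_per_variant, alpha=0.05):
    """Approximate chance of detecting a true lift with a two-sided test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    se = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_variant)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_effect = abs(p2 - p1) / se
    # Probability that the observed difference clears the significance bar.
    return NormalDist().cdf(z_effect - z_alpha)


# Hypothetical test: 3% baseline, true lift of +10%.
power_large = approximate_power(0.03, 0.10, n_per_variant=53_208)
power_small = approximate_power(0.03, 0.10, n_per_variant=10_000)

for label, power in [("53,208 per variant", power_large),
                     ("10,000 per variant", power_small)]:
    print(f"{label}: power = {power:.0%}, false negative risk = {1 - power:.0%}")
```

In this sketch the larger sample lands at roughly 80% power, while the smaller one would miss a real 10% lift most of the time.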

# Testing Strategies

Finally let’s look at some of the testing strategies that you should consider.

*What kind of tests?*

It is always good to start with easy wins, or low-hanging fruit as they say. These are obvious problems with obvious solutions, though of course they still need to be tested.

Next, you could look at creative ideas or persuasion tactics that need validation: things like a better value proposition, added social proof and the like.

The last category of tests covers big, sweeping changes like a complete redesign, fresh messaging or new features that offer a totally different experience. These are innovative tests and should be considered carefully, as the results could go either way, and in a big way.

*Where to test?*

A good rule of thumb is to start close to the money. So if it is an e-commerce site, look at the checkout process and work your way backwards. The advantage here is that the conversion rate is normally higher on these pages, so you need a smaller sample to detect a given lift.

*How many to test?*

The count depends on the traffic to your site. Unless you have a heavily trafficked site, you should bundle multiple changes into one test so that the combined effect is large enough to reach the required MDE (Minimum Detectable Effect). Good advice here is to test a single hypothesis, even though that may involve multiple changes, e.g. social proof or trust factors.

*A/B Testing and MVT (Multivariate Testing)*

The difference here is that in MVT you’re testing the interaction between different elements like your headline, body, form, buttons and so on. Here too, traffic to the site is the key factor in deciding what to go for. You should always start with A/B testing, and move to MVT only once your site has more than enough traffic.
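A quick sketch shows why traffic matters so much for MVT: every extra element multiplies the number of combinations that have to share your visitors. The elements, their variations and the weekly traffic figure below are all hypothetical.

```python
from itertools import product

# Hypothetical page elements and the variations of each to be tested.
elements = {
    "headline": ["control", "benefit-led"],
    "form": ["long", "short"],
    "button": ["Buy now", "Get started", "Try it free"],
}

# A full-factorial MVT tests every possible combination of variations.
combinations = list(product(*elements.values()))


def visitors_per_combination(weekly_visitors, n_combinations):
    """Weekly visitors each combination receives with an even split."""
    return weekly_visitors // n_combinations


weekly_visitors = 30_000
mvt_share = visitors_per_combination(weekly_visitors, len(combinations))
ab_share = visitors_per_combination(weekly_visitors, 2)

print(f"MVT: {len(combinations)} combinations, {mvt_share} visitors each per week")
print(f"A/B: 2 versions, {ab_share} visitors each per week")
```

With the same weekly traffic, each MVT combination here gets a sixth of what an A/B variant would, so reaching the required sample size takes correspondingly longer.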

To conclude, what should you do if your tests lose? You should validate the data that you’re basing your tests on. Were there errors or technical issues on the page? Were you testing on traffic that is not representative (tested on social traffic and released on search traffic, for instance)? Or were there history effects, like your competition outsmarting you on a price or offer tactic? Keep digging into your data to a reasonable extent before you wrap up.

The instructor, Peep Laja, is in his element in the lectures, and the way he delivers them makes it all look so effortless and simple. Loved it. Can’t get enough of testing yet.

Catch you all soon. Ciao!