A basic guide to SEO A/B testing
LondonSEO XL speaker guest post: Gus Pelogia
This blog post is a summary of my talk at LondonSEO XL in June 2024.
Back in December 2022, I was giving an end-of-year presentation to an R&D (Research and Development) higher-up at Indeed. Organic traffic and conversions were up 70% and 50% year over year, respectively, for the specific product I was working on, and needless to say, I was quite proud of those results.
After a long presentation and a lot of talking, this leader looked at me and asked: “How do we know these results are happening because of the things you did?”
That’s a simple and sensible question to ask, but a hard one for SEOs to give a straight answer to. I was used to measuring SEO as a whole – all these things we did, together, are the cause of the positive results you’re seeing. However, R&D is used to measuring specific things, building hypotheses and rolling out what works, not “the whole”.
I was crushed after that meeting. I realised I had to prove actual impact to retain resources (like engineers) to keep working on SEO. It was disappointing, and with the year almost over, I only got the chance to talk to her again a few weeks later, in the new year.
While looking for solutions in the interim, I found the missing piece: SEO A/B testing. It changed my approach to SEO. In this post, I want to walk you through the steps I take to run A/B tests and how you can run your own.
What is SEO A/B testing?
An SEO A/B test splits pages that have been performing the same way (e.g. same daily organic traffic) into two groups: control and variant. Here’s a visual example:
An example of control vs variant groups before an intervention. Source: SEO A/B test Splitter
You can go back weeks, months or longer to find two equivalent groups. We know we can’t use traditional A/B testing for SEO, because Google would only see one version of the page. Instead, we compare two different sets of pages that have historically performed the same way.
Once you have control and variant groups, you set an intervention date, which is when your feature is released. The impact is the difference between the groups, and hopefully your variant (the group you made changes to) grows more than the control group.
Keep in mind that you’re comparing the groups against each other, so if both grew by the same amount (e.g. the same traffic increase), your feature had no impact, despite traffic going up.
An example of control vs variant group traffic. The vertical dotted line marks when the intervention (e.g. adding more internal links) happened. Can you see how the black line grew further? That’s the impact!
If we go back to my meeting with the R&D leader and her question, “how do we know these results are happening because of the things you did?”, I can now give her exact numbers: compared with the control group, the pages that were part of this feature did grow more.
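To make that concrete, here’s a toy calculation (the numbers are made up, not from my tests) showing that it’s the variant’s extra growth over the control, not the raw traffic increase, that counts as impact:
# Toy numbers, purely for illustration
control_before <- 1000; control_after <- 1100    # control grew 10% on its own
variant_before <- 1000; variant_after <- 1160    # variant grew 16% after the change
relative_lift <- (variant_after / variant_before) / (control_after / control_before) - 1
round(relative_lift * 100, 1)                     # ~5.5% growth attributable to the change
CausalImpact, covered below, does a far more rigorous version of this comparison, but the intuition is the same.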
How to run SEO A/B tests?
This is all done using a package called CausalImpact, created, believe it or not, by Google. You simply install the package in R, using the free RStudio software, and it’ll do the calculations for you.
Here’s an explanation in their own words:
Given a response time series (e.g., clicks) and a set of control time series (e.g., clicks in non-affected markets or clicks on other sites), the package constructs a Bayesian structural time series model. This model is then used to try and predict the counterfactual, i.e., how the response metric would have evolved after the intervention if the intervention had never occurred.
While I’m very lucky to have a team that can do this analysis for me, I was also thrilled when I read this article on Women in Tech SEO explaining how to do a Causal Impact analysis. It’s exciting to be hands-on, and I tend to send only the winners to the “official” team to double-check my work.
This is a great guide explaining, step by step, how to prepare your data to measure impact. While using RStudio might seem intimidating at first, let me tell you a secret: after you have the software installed, these are all the commands you need to run.
# Test_Example is your prepared data: first column = variant (response) series, remaining column(s) = control series, one row per day
pre.period <- c(1, 28)    # rows covering the pre-intervention period
post.period <- c(29, 42)  # rows covering the post-intervention period
impact <- CausalImpact::CausalImpact(Test_Example, pre.period, post.period)
plot(impact)
You’re just changing the rows where the test period starts and ends, and the name of your data file.
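Once the model has run, two more commands from the same package are worth knowing: summary() prints the estimated average and cumulative effect with credible intervals and a posterior probability, and the “report” option turns that into a plain-English write-up.
summary(impact)             # average and cumulative effect, credible intervals, posterior probability
summary(impact, "report")   # the same result written out as a plain-English paragraph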
This guide doesn’t show how to find a control group; however, it does explain how to use one. You can easily split your pages (before starting your intervention) using this Python tool I created with LLMs and Google Colab. All you need to do is import a spreadsheet with three headers: Date, URL, and Traffic, breaking down the daily traffic each URL received over a given period (e.g. the last 4 weeks).
| Date | URL | Traffic |
|---|---|---|
| 12/08/2024 | domain.com/page-a | 30 |
| 13/08/2024 | domain.com/page-a | 25 |
| 12/08/2024 | domain.com/another-page | 67 |
| 13/08/2024 | domain.com/another-page | 53 |
Don’t want to go to Google Colab or look at code? Save a bit more time and use this free tool from Orange Valley that I mentioned in my talk. It’ll do this breakdown for you in seconds, without any hassle or registration.
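If you’re curious what the splitting logic looks like, here’s a minimal sketch in R. It’s my own simplification, not the code behind either tool, and the file name is just a placeholder: it averages each URL’s daily traffic and alternates assignment down the ranking so both groups start with a similar traffic profile.
# Minimal sketch of a control/variant split, not the Colab or Orange Valley tool
traffic <- read.csv("traffic_export.csv")    # expects columns: Date, URL, Traffic
per_url <- aggregate(Traffic ~ URL, data = traffic, FUN = mean)    # average daily traffic per URL
per_url <- per_url[order(-per_url$Traffic), ]                      # rank URLs, highest traffic first
per_url$Group <- ifelse(seq_len(nrow(per_url)) %% 2 == 1, "control", "variant")    # alternate down the ranking
head(per_url)
However you build the groups, the important check is that their traffic lines overlap closely in the weeks before the intervention, like in the chart at the top of this post.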
Real examples
While I can’t share too much detail about each test, I can share the rationale behind them. Each test tends to include between hundreds and thousands of pages, covering one domain at a time.
These are two examples I mentioned in my talk:
| Test | Impact | Period |
|---|---|---|
| Internal linking | +6% | 40 days |
| Content gating | -8% traffic, +95% conversions | 90 days |
Beyond the testing tools, the foundation of a good A/B test is your hypothesis. Here’s an example for an internal linking test:
Select 30 pages with low traffic but high potential and add at least 50 internal links to each, using unique, anchor-text-rich links.
For a page to be included in the variant group, it must receive at least 50 additional internal links. In this case, the hypothesis is that link volume is essential to success: adding fewer links may not have an impact and would decrease the chance of success.
An example of cumulative impact after adding a high number of internal links
On content gating, the results show that organic traffic decreased over time but conversions went up. Remember that a “conversion” is not just a sale; it can be any action we want our visitors to take. Do you accept a conversion increase at the cost of losing organic traffic? That will depend on your company’s goals, but having the actual numbers means you can argue from awareness rather than guesses when the conversation comes up.
More tips for SEO testing
I can’t stop testing these days. Everything I do must have control and variant groups, so I can prove impact, or the lack of it. Here are a few more tips for good SEO A/B testing:
Isolate testing groups to avoid noise. If some pages are in the control group, don’t make any changes to them or artificially inflate their traffic (e.g. with PPC ads), or they might grow more than they would naturally and shrink the measurable impact of your variant group.
Pages must have historical traffic, because you need a baseline for comparison. If both groups start from zero, you can’t be sure they’re growing because of your changes, and you’re far more susceptible to external factors such as indexing and search volume.
Not every test will work, and that’s OK. You need a confidence level of 95% or higher to trust the results (CausalImpact reports this as the posterior probability of a causal effect). Lower numbers mean lower confidence that the impact was due to the changes you made.
The same test can have different results across different groups of pages, domains or countries, depending on your SEO maturity. If you already rank well, there’s less to gain. Run separate tests to find the “local” impact.