Using Multivariate Testing to get to what matters
A lot of people I talk to about A/B Testing are confused about multivariate tests (MVTs).
There are two big misconceptions floating around. One is that it’s just testing lots of variations at the same time. The other is that it’s about testing lots of little changes at the same time to see which combination works best.
MVTs are much more subtle and powerful than that. They aren’t about testing variants: they’re about testing variables.
It’s pretty confusing because “variate” sounds more like “variant” than “variable”. Let’s illustrate the difference using an example.
Simple example: a button
(I’m gritting my teeth all through this example. I know everyone uses buttons as examples, but please promise me you won’t focus your testing on buttons — unless of course your MVTs have shown it’s worth testing them. We’ll get to that later…)
With a multivariate test (MVT), we’re testing variables against one another. We might test two very specific variables:
Size (smaller vs. bigger)
Colour (blue vs. not-blue — red in this case)
At the end of this MVT, we’ll find out whether size or colour had a bigger influence on the result. As a bonus, we’ll also find out which combination of variables happened to perform the best, but that wasn’t the point of the experiment.
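To make the idea concrete, here's a minimal sketch of reading out a 2×2 factorial MVT like the one above. All the conversion numbers are invented for illustration; a real test would pull these from your experiment platform and check statistical significance before drawing conclusions.

```python
# Hypothetical results of a 2x2 MVT: two variables (size, colour),
# each cell holding (conversions, visitors) for one combination.
results = {
    ("small", "blue"): (120, 4000),
    ("small", "red"):  (130, 4000),
    ("big",   "blue"): (150, 4000),
    ("big",   "red"):  (165, 4000),
}

def rate(cells):
    """Pooled conversion rate across a set of cells."""
    conversions = sum(results[c][0] for c in cells)
    visitors = sum(results[c][1] for c in cells)
    return conversions / visitors

# Main effect of size: average over both colours, big minus small.
size_effect = (rate([("big", "blue"), ("big", "red")])
               - rate([("small", "blue"), ("small", "red")]))

# Main effect of colour: average over both sizes, red minus blue.
colour_effect = (rate([("small", "red"), ("big", "red")])
                 - rate([("small", "blue"), ("big", "blue")]))

print(f"size main effect:   {size_effect:+.4f}")
print(f"colour main effect: {colour_effect:+.4f}")

# The bonus finding: which single combination happened to win.
best = max(results, key=lambda c: results[c][0] / results[c][1])
print("best combination:", best)
```

With these made-up numbers, size moves the conversion rate more than colour does, which is exactly the kind of "which variable matters?" answer the MVT is for.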
Compare this to a multiple variant (A/B/C/D) test. Here we’re testing lots of variants – lots of expressions of a single variable. For instance, we might test lots of completely different button sizes:
At the end of the experiment we’re going to find out which button size performed best. In a good multiple variant test, we’ll have tested across the whole range of possible sizes, so we’ll have found one that’s close to optimum.
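A multiple variant test along one quantitative variable can be sketched like this. The sizes and numbers are made up; the point is that the variants span the plausible range, so the winner is likely near the true optimum rather than merely "better than control".

```python
# Hypothetical multiple variant test along one variable: button size.
sizes_px = [24, 32, 40, 48, 56, 64]

# Observed (conversions, visitors) for each size variant -- invented data.
observed = dict(zip(sizes_px, [(110, 4000), (125, 4000), (140, 4000),
                               (150, 4000), (146, 4000), (118, 4000)]))

# Pick the best-performing expression of the variable.
best_size = max(observed, key=lambda s: observed[s][0] / observed[s][1])
print(f"best tested size: {best_size}px")
```

Because performance rises and then falls across the tested range, we can be reasonably confident the optimum lies near the winner, not outside the range we tested.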
So we’ve used an MVT to identify that button size is a variable worth testing more, then we’ve tested multiple variants across a range of possible button sizes. Chances are pretty good that we’ve found a near-optimal size and colour for the button.
So two types of experiment work hand-in-hand:
MVTs: which experimental variables are valuable to test along?
Multiple variant tests: what’s the best possible expression of a variable?
So far, so good.
It gets more complicated in the real world
Let’s move to real world testing, where even a simple button MVT can have many more dimensions. Why not test along variables for colour saturation, shadow depth, shape, texture, typeface, font weight, … ?
This is where we can consider testing multiple variants of “how a button could be”, instead of limiting our variables:
And that’s before we’ve even considered what the words on the button are, or where the button is relative to other elements… Even for one button, the possibilities get overwhelming fast.
And any button is just one of hundreds of elements on a web site, one of thousands in a customer’s broader experience with your company. It’s unlikely that any randomly selected button is the most important place to start testing.
So where the heck are we going to start?
We can step back a level and test at the level of a page.
Note: this is where it’s easy to fall into the trap of crappy MVTs.
If you’ve run an MVT before, chances are it was something like this: “what if we change the headline, CTA, picture, and the proof? We’ll test a different option for all of those and find out which combination works best!”
NO. With that sort of MVT, we’re making the mistake we talked about in the last few weeks. We’re trying to validate our ideas. The only difference is that we’re doing it in several places at the same time, hoping that something will stick. That doesn’t make it any better.
What do we do instead?
MVTs at the scale of a page are slightly different
While we can test along the variable “what the headline is”, the differences between the headlines are qualitative – it’s not the same as a quantitative variable like button size.
If we test the headline "Get Started" against the headline "Create Account", we're testing two expressions of "what the headline is". But if we find out that the change didn't make any difference, we still don't know whether it's because the headline isn't an important variable or because we didn't change the headline variable in a meaningful way.
What’s more, there’s an underlying problem with using MVTs this way. They’re built on the premise that each variable is independent of the others.
Button size might have the same effect on a customer’s behaviour regardless of other features of the button (shape, colour, shadow, font, etc.) but at the page level it’s different. Headline, image, button, and copy all have strong interplay:
Headline and hero image – not independent variables
Even when the interplay isn’t this obvious, we can’t assume independence. Messy humans interacting with janky websites… this is not a recipe for a nice, predictable machine.
I suspect the premise of independent variables holds true for far fewer variables than we might like to think, which makes page-level MVTs less helpful than we'd like.
Inclusion/exclusion testing
However, there is a powerful way to use the MVT at the level of a page with something called an inclusion/exclusion test. Here, we test completely removing elements of the experience. What if the headline / photo / testimonial / Twitter-feed / carousel just wasn’t there at all?
(Note: the only thing you definitely have to keep on the page is one link that takes the customer to the next page.)
Through this type of experiment, we can find out which elements are doing their job (h/t Nick D) and which are just there. This sounds crude but can deliver powerful results. Sometimes, merely taking away some confusing noise from a page boosts conversion.
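An inclusion/exclusion test is just a factorial MVT where every element is a binary variable: present or absent. Here's a hedged sketch of reading one out. The four elements and the simulated "ground truth" below are entirely made up, standing in for real experiment data, so we can see the readout recover each element's contribution.

```python
from itertools import product

elements = ["headline", "photo", "testimonial", "carousel"]

def simulate(combo):
    # Made-up ground truth, standing in for real observations:
    # the headline helps a lot, the carousel actually hurts.
    conversion_rate = (0.03 + 0.010 * combo[0] + 0.002 * combo[1]
                       + 0.001 * combo[2] - 0.004 * combo[3])
    visitors = 4000
    return (round(conversion_rate * visitors), visitors)

# One variant per include(1)/exclude(0) combination of the four elements.
observed = {combo: simulate(combo)
            for combo in product([1, 0], repeat=len(elements))}

def element_effect(i):
    """Average conversion rate with element i included minus excluded."""
    def avg(present):
        cells = [c for c in observed if c[i] == present]
        conversions = sum(observed[c][0] for c in cells)
        visitors = sum(observed[c][1] for c in cells)
        return conversions / visitors
    return avg(1) - avg(0)

for i, name in enumerate(elements):
    print(f"{name:12s} effect on conversion rate: {element_effect(i):+.4f}")
```

A negative effect (the carousel, in this fabricated example) is precisely the "confusing noise" case: the page converts better when the element just isn't there.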
But there’s a level beyond this. What if the page we’re testing isn’t the most important place to test?
MVT principles in broad beta multiple variant experiments
Now let’s scale up to the level of pages in a funnel. We’re going to treat “what this page could be like” as our qualitative variable. The eagle-eyed will note that this variable could suffer from the same problem as using “what the headline is” as an MVT variable: how do we know that we’re making a change that affects the variable?
That’s where broad beta testing comes in.
The point of a broad beta test is to explore as widely as we can across the whole expanse of a variable.
So instead of testing one new page design, we’ll test 5–10 variants of the page. We’ll make them all very different from one another: some stripped bare, some packed full, some we hope will win, and some we rather dislike.
This way, we reduce the chance that we’re not changing anything that matters. Now if we find out that none of the variants made any real difference to the bottom line, we do have a good indication that this particular page isn’t critical for our customers.
This way, every single broad beta test helps us update our knowledge about what’s important to our customers. (Remember: if we go for the standard “one challenger vs. control” test to validate our ideas, we don’t get this clarity.)
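The broad beta readout described above can be sketched as follows. The variant names and numbers are invented, and the significance check here is a simple two-proportion z-test, one reasonable choice among several; the point is the interpretation when nothing wins.

```python
import math

# Hypothetical broad beta test: one control, several wildly different
# page variants. Each entry is (conversions, visitors) -- invented data.
control = (300, 10000)
variants = {
    "stripped_bare":   (310, 10000),
    "packed_full":     (295, 10000),
    "new_hero":        (305, 10000),
    "ugly_on_purpose": (298, 10000),
    "long_copy":       (302, 10000),
}

def z_test(a, b):
    """Two-sided p-value from a two-proportion z-test."""
    (ca, na), (cb, nb) = a, b
    p_pool = (ca + cb) / (na + nb)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / na + 1 / nb))
    z = (ca / na - cb / nb) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Which variants, if any, differ meaningfully from control?
significant = {name: p for name, cell in variants.items()
               if (p := z_test(cell, control)) < 0.05}

if not significant:
    print("No variant moved the needle: this page may not be critical.")
```

When even the extreme variants fail to shift the result, that null is itself the finding: our customers' decisions probably aren't being made on this page.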
Ultimately, we’re looking for what works better. But by using MVT principles to show us what really matters, we can get better at looking in the right places.
This was a meaty one to write. I’d love to hear what you think.
Cheers,
Tom