Debunking Myth #1
AKA putting the hype in hypothesis
The very first A/B testing myth in last week's email was this: A/B testing is for validating hypotheses.
What could be wrong with that? If you've ever read anything about product design or optimisation, you've seen people saying that we must form hypotheses and validate them.
They're nearly right.
My beef is with the framing "validate". (OK, OK, it's also with the limiting way that most organisations treat hypotheses, but we'll get to that later.)
What's the problem with validation?
When you look to validate your hypotheses, you’re looking to prove yourself right.
This is called motivated reasoning. It's how bad science works. It's how bad gamblers lose all their money. It's what we do when we're so invested in how we think the world should work that we stop seeing how it actually works. And the sad truth is that you can validate almost anything if you're desperate enough to prove yourself right. The nature of randomness means that any big enough pile of data will contain a pattern that supports what you want.
This is when people start reading the tea leaves in misleading metrics like clicks, "engagement" or time-on-site.
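You don't have to take that on faith. Here's a minimal sketch (hypothetical numbers, standard library only) of why tea-leaf reading works so reliably: simulate twenty metrics where the A and B arms are *identical* by construction, run an ordinary two-proportion z-test on each, and watch "significant" results appear out of pure noise.

```python
import random
import math

random.seed(42)

def z_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test; returns the two-sided p-value."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    # Two-sided p-value from the normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# 20 metrics, identical 5% "conversion" rate in both arms: no real effect anywhere
n = 2000
false_positives = 0
for metric in range(20):
    a = sum(random.random() < 0.05 for _ in range(n))
    b = sum(random.random() < 0.05 for _ in range(n))
    if z_test(a, n, b, n) < 0.05:
        false_positives += 1

print(f"'Significant' metrics found: {false_positives} of 20")
```

At a 5% significance threshold you should expect roughly one spurious winner per twenty metrics checked; anyone hunting for validation across clicks, engagement, scroll depth and time-on-site will usually find one.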
But the worst thing about validating is that it closes you off to the very real possibility that you're wrong about what's going to work. If what works is in any way surprising, you'll never find it, because you'll never include it in your experiments.
When you look to validate, the word “hypothesis” becomes a mere proxy for “my opinion that I think is true that I don’t want you to challenge.”
Good science is about the opposite. It's about trying to disprove your hypothesis. It's about looking to find out how you’re wrong.
So here's the first mindshift: if A/B tests are about hypotheses, they're about INVALIDATING those hypotheses.
There's yet another mindshift though.
When we frame a hypothesis based on one “brilliant” idea, we close off a world of possibilities.
Here's a grossly oversimplified example:
Badly framed hypothesis: "if we make the button green, more people will click on it."
In this experiment, we'd have one version with the original button and one with the same button in green. And whatever happens, we've learned almost nothing. What next? Is blue better? Or orange? ...
Many testers adopt this approach, scattergunning tiny tweak after tiny tweak, like a blind monkey with a musket.
Contrast this with a well-framed hypothesis: "the design of the button doesn't make any difference to our revenue."
Oh the difference with this framing! Now we're challenged to try a wide range of button designs at the same time, all different from one another, some we believe in, some challenging everything we believe.
And with that, we've massively increased our chances of finding a button design that increases sales.
Even better, if none of our buttons make any difference to the bottom line, we truly learn something. We can now be fairly confident in what we suspected all along: the design of the button probably isn't the most important thing to our customers. In a single test, we've freed up our valuable resources to concentrate on other changes.
Here's another example.
Bad: "testimonials will reduce our prospects' anxiety and so they'll buy more."
Good: "showing prospects social proof on the product page won't make any difference to our revenue."
Do you see the difference?
Instead of looking to prove we’re right, we're approaching things with curiosity.
This is the second mindshift.
Instead of “tell me how great I am,” we’re saying, “I wonder what's going to happen if..."