
A/B Testing — A Blessing or a Curse?

How being data-driven is killing high-tech innovation

I recently hosted my first webinar. The topic was “impact metrics”. I attempted to make a case for metrics that measure the value you intend to provide your users: things like reducing stress in your users’ lives, or giving them more free time. The punchline is, these mostly aren’t quantitative, instrumentable metrics. You have to do the (dreaded) qualitative research to find out how well you are doing.

My hypothesis is that because most people in high-tech are engineers at heart, with strongly developed logic and rationality, we are drawn to things that are quantitative. Qualitative feels squishy and hard. The truth is that both are hard. But quant metrics look “right” because they are numbers.

As you gain experience, you find more things to measure and increasingly complex interactions between the various metrics. At Bing, there were at least 100 metrics we could measure for every A/B test. Each test we ran had some negative and some positive impact, and we spent hours analyzing results to determine whether to ship a given experiment.

And in the land of search ads, you do have to do this. I wrote about this in a recent story “Effectively Using A/B testing”. But in so many other arenas, I think A/B testing is a curse, not a blessing.

I talk to PMs around the world every week as a mentor on Plato. And there is a theme coming through in these conversations (qual!) that is echoed by the various PM surveys I have seen (quant!).

PMs are frustrated by the lack of focus on the future, and by the fact that they don’t have a vision, strategy, or roadmap that goes beyond increasing market share, usage, or revenue. And when I dive into why this is what they are experiencing, A/B testing invariably comes up.

Wait, you are saying don’t A/B test?

For the sake of argument, let’s say I am saying that 🤯. Take a minute to imagine your life without it. Your current life probably looks like this:

  • Experiment review meeting. The group gathers to look at active experiments and talk about which to take to 100%. Someone argues that an experiment that isn’t looking positive just needs more time for the cohort to get used to it. Someone else points out that another experiment needs to be turned up to get enough data to make a call… And so on.
  • OKRs/KPI progress review meeting. The group gathers to review how well you are accomplishing your goal of increasing metric x by y percent in timeframe z. You’re falling behind, so you talk about other experiments to run. Or you are ahead, and you wonder if you should have set the target higher.
  • Feature team meeting. You argue with your designer about a design. He wants it to be more aesthetically pleasing, and you think the button isn’t going to be noticeable enough. The engineers look bored and tell you to let them know when you decide.

Not the most fun days IMHO.

OK, time to own up. I have lived a good portion of my life as a PM without A/B. I shipped at least six 1.0 products in my career, where A/B (there is no A!) isn’t really an option. And I’m also pretty old, which means I started PM’ing well before being data-driven became a thing. I dealt with long release cycles and software that sold on retail shelves…

Here is what my life looked like:

  • Focus groups and customer panels. Office for Mac had a customer council that we gathered quarterly to run ideas by and hear about their pain. When I was delivering a B2B2C TV platform we met frequently with customers like AT&T and Comcast, visiting their data centers and talking to operators. We also did user research with the end users that would be using the software and advocated for their needs with the operators that were paying for the software.
  • Data immersion. Office had a research team that would spend time with users (end users, SMBs, enterprise) and customers (those in charge of buying decisions), and do competitive research. They would prepare presentations for the engineering teams to help us build empathy.
  • Survey panel. In Office, if we had a burning question for which we wanted an answer, we had a group of 1000 users we could ask questions of on a monthly basis. Stuff like “do you have this problem” or “rank these features” or “would you be upset if we took this feature away” or “what competitive products are you using”.
  • Brainstorming sessions. We would brainstorm regularly on the impact we’d like to have. We had code weeks where engineers could dream something up and show it to everyone. We had product review boards that people could pitch their ideas to. And no one had to estimate how much revenue would be generated or usage increased — only how much impact the idea would have on our users’ lives.
  • Design reviews. We would spend a lot of time making sure we had a design that was solid before coding. Designers and PMs attended these meetings. No one could say “let’s just test it and see” (which is designer for “I’m right and I’m tired of arguing”).
  • Prioritization discussions using customer-focused criteria. There was always more work to do than people or time. But no one ever talked about impact to revenue or usage. Office was at the forefront of productivity; the conversation was about staying at the forefront, not trying to stay alive. Mediaroom was the very first platform of its kind, so our customers were all begging for features that we had to prioritize. We talked about importance, broke importance down into objectives, and then measured features against those objectives.
  • Vision, strategy, and roadmap discussions. The leadership teams I was on spent time talking about vision, strategy, and roadmap. Where was the industry headed? Does our strategy need updating? Where do we want to be in 2–5 years? These conversations evolved our thinking more than they produced answers that were “right”.

Overall, it was really, really fun. All of it. All of these conversations gave us a ton of shared context, respect for the work everyone was doing, and inspired us to innovate everywhere — features, process, data.

Innovation requires inspiration

This is the crux of the problem. Talking about revenue and activations is not as inspiring as talking about helping people communicate and collaborate (Office), helping people relax and be entertained (Mediaroom), or helping people navigate and explore (Apple Maps).

Why do we insist on skipping over the impact we want to have and going straight to the trailing indicators of that impact?

I think it’s because we are scared not to.

It feels scary as hell not to A/B test. I get it. What if you ship something that has a huge negative impact?

So no, I am not saying you don’t A/B test. If you have the capability, you should definitely use it as final validation. If you don’t have the capability, you should get it.

But first, be so convinced that your feature is a great idea that when the results come in negative, your first instinct is to figure out what’s wrong with the test. If you’ve been doing this for any significant amount of time, you have a story about this: something wasn’t instrumented correctly, some assumption about how metrics interact was incorrect, the test ended up going to the wrong users… something. This is still code, and code is never perfect.

You will still miss every now and again. That’s a great use of A/B — a safety net.
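To make the safety-net idea concrete, here is a minimal sketch of the kind of check such a test boils down to: a two-proportion z-test comparing a guardrail metric between control and treatment cohorts. The metric name, cohort sizes, and numbers are all hypothetical, chosen purely for illustration.

```python
# A minimal, hypothetical sketch of the safety-net check: a two-proportion
# z-test on a guardrail metric (say, task completion rate) between the
# control (A) and treatment (B) cohorts. All names and numbers are made up.
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z statistic, two-sided p-value) for the difference
    between two observed proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis of "no difference".
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical cohorts of 10,000 users each; the treatment completes
# the task slightly less often than control.
z, p = two_proportion_z_test(successes_a=4200, n_a=10_000,
                             successes_b=4050, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
# A significant negative z on a guardrail metric is the cue to investigate
# (instrumentation, cohort assignment, metric interactions) before shipping.
```

The point isn’t the statistics. It’s that the check runs after you are already convinced the feature is right — a net under the wire, not a compass.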

How do you get to this point? User empathy. Vision. Strategy. Impact goals. Talk about these things as often as you can. If you are in a world where it is possible to develop real empathy for your users, and you achieve it, you will find that the ideas for how to help them come fast and furious, and they will be great. You will find that agreement on those ideas becomes easier the more empathy the organization has as a whole. You will find that everyone works hard to deliver because they are so excited for their users to experience these new features.

And isn’t that what we all want work to be like? I certainly do.

We are the only industry that uses A/B to innovate

I also want to point out the somewhat obvious fact that innovators outside of high tech don’t have A/B to rely on. Maybe they do market research or prototype testing. But ultimately, pretty much no other industry gets to ship its innovations to a small percentage of its users and see what happens.

If you are designing a chair, formulating a new energy drink, writing a book, or creating art… you don’t get to A/B test. And yet, all of these fields innovate.

Summary Thoughts

I know I probably sound entitled. When I shipped 1.0s, it was mostly under the cover of a large company like Apple or Microsoft. I realize it is a luxury not to have to worry about revenue to keep your company alive or MAUs to keep your investors happy.

But don’t forget what got you to the 1.0 or what got you the funding. That inspiration doesn’t have to go away, completely replaced by a fixation on your metrics and the output of your A/B tests.

Your team wants to talk about impact. They want to be inspired to innovate on behalf of users they have come to care about. They want meaning in their work.

Just because we can, doesn’t mean we should.

For more on this topic, check out my webinar.

Good luck and happy innovating!

