As teams go deeper into analyzing their product, they are bound to come across a question that might haunt their dreams: are we looking at causation or correlation?
All teams will discover patterns in how their users engage with their products, but they will be unsure whether a user doing a certain action causes something else or whether the two are merely correlated.
If you’re falling asleep, I don’t blame you. This likely feels like a throwback to college or university. However, I’ll do my best to break this down into simple terms and actions that your team can use right away.
Let’s start by understanding what these two words even mean!
Causation simply means that X was responsible for Y. Ice cream sales go up in the summer and we can safely say that hot weather causes this increase. On the other hand, we may have two actions that seem to have a relationship but aren’t necessarily caused by each other. These two actions are correlated but may lack causation.
We may see a decrease in ice cream sales around the start of the year and be tempted to blame New Year’s resolutions. Causation is unclear, but there seems to be a relationship. Perhaps people are tired of sweets from the holidays, or perhaps the weather simply turns cold around January.
You’ll also notice that the relationship could be positive (both variables increase or decrease together), negative (one variable increases while the other decreases) or there could be no relationship at all (increases or decreases are random).
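To make these three kinds of relationships concrete, here is a minimal sketch that computes the Pearson correlation coefficient, one common way to measure how strongly two variables move together. Values near +1 suggest a positive relationship, values near -1 a negative one, and values near 0 no linear relationship. All of the figures below are made up for illustration, and the variable names are my own.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical weekly figures, invented for this example only.
temperature = [10, 14, 18, 22, 26, 30]
ice_cream_sales = [20, 26, 35, 41, 52, 60]      # rises with temperature
hot_chocolate_sales = [58, 50, 41, 33, 24, 15]  # falls as temperature rises
umbrella_sales = [30, 12, 27, 27, 12, 30]       # no clear trend

print("ice cream:    ", round(pearson_r(temperature, ice_cream_sales), 2))
print("hot chocolate:", round(pearson_r(temperature, hot_chocolate_sales), 2))
print("umbrellas:    ", round(pearson_r(temperature, umbrella_sales), 2))
```

Keep in mind that a coefficient close to +1 or -1 only tells you that the two series move together. It says nothing about which one, if either, is the cause.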
For advanced product teams, it’s important to understand these two terms and how they might play out within their product. You may realize that users who sign up using their Facebook account have higher retention rates than those who simply use their email address. Or you may notice a significant change to your north star metric.
This insight could then lead you to force every user to sign up via Facebook which could lead to plummeting retention rates over the long term. Assuming that something is the cause can be disastrous without double-checking your assumptions.
There are no easy answers when it comes to separating actions that seem to be correlated from those that are truly connected to each other (causation). For every hypothesis that you have, you’ll need to run it through a battery of tests to prove or disprove it.
Besides the ice cream example, let’s use a few other examples that companies are likely to run into.
Situation #1: Product redesign
Let’s imagine that your team has decided to redesign your product or website. After you launch the redesign, you notice a sharp increase in user engagement or traffic to your website. Is the redesign the cause of this increase?
They are likely correlated, but causation will be hard to prove. The traffic increase could have come from a different source, and even the jump in user engagement could be driven by something else, such as communication journeys.
This is also a great example of something that might be hard to prove because redesigns don’t happen often. This means that it will be hard to duplicate the scenario in an A/B test, but you could test the other elements, such as new onboarding flows or new marketing campaigns.
Situation #2: New onboarding flow
In this example, your team is getting ready to release a new onboarding flow that should make it easier for users to start using your product. The new onboarding flow converts at a higher rate than the old one.
In this case, causation is likely to be found because we are looking at a very specific part of the product. Better yet, we can easily test this by A/B testing the new flow vs the old one on random groups of users.
Situation #3: New cultural values
Let’s now look at a trickier example. Let’s imagine that your company goes through a company retreat and ends up adopting new values for your organization. You then notice that all your core KPIs increase over the following 90 days.
Causation vs correlation will be hard to prove in this scenario because of how separated the actions are (new cultural values and KPIs increasing). However, over time, you could disqualify other factors and if the increases stick, you could at least attribute correlation.
The common themes through these scenarios and many others are as follows:
As mentioned in the previous section, there are 3 different ways to test for causation vs correlation in the real world. Let’s look at each one and where you would use them.
1. A/B Tests
The best option here is to run properly designed A/B tests. The keyword here is “properly”. The test should randomize who sees specific variations (or flows), you should get enough volume to reach statistical significance, and you should run the tests for long enough to cover short-term business cycles.
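As a rough illustration of the "randomize" and "statistical significance" parts, here is a minimal sketch that assigns users to a flow at random and then compares conversion rates with a two-proportion z-test. The z-test is one common choice for this, not the only one, and the function names and conversion numbers are hypothetical.

```python
import math
import random

def assign_variant(rng):
    """Randomly place a user in the old or new flow with equal probability."""
    return rng.choice(["old", "new"])

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the assumption that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Random assignment: each user gets a variant independent of who they are.
rng = random.Random(42)  # fixed seed only so the sketch is repeatable
assignments = [assign_variant(rng) for _ in range(4000)]
print("old:", assignments.count("old"), "new:", assignments.count("new"))

# Hypothetical results: old flow converts 200 of 2000, new flow 260 of 2000.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be noise, but it doesn’t tell you whether you ran the test long enough or whether the effect will hold in a repeat test.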
It isn’t enough to just run the test and make a decision with whatever data you have. You will also need to run multiple tests confirming the same hypothesis. In the onboarding example above, this might include the following tests:
2. Further Analysis
A second option is to dive deeper into the data to prove or disprove a hypothesis. Assuming you have the data, you would be focused on finding the correct user segments and behaviors.
Let’s imagine that you’re trying to understand how a specific feature is affecting your overall retention rate. Using your existing data, you would run the following analysis:
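One possible shape for this kind of analysis, sketched under my own assumptions, is to compare retention for users who did and didn’t use the feature, first overall and then within each user segment. The segments ("power" and "casual") and all counts below are hypothetical. They are chosen to show how a gap that looks large overall can disappear once you compare like with like.

```python
from collections import defaultdict

# Hypothetical aggregate counts: (segment, used_feature, users, retained).
rows = [
    ("power", True, 80, 64),
    ("power", False, 20, 16),
    ("casual", True, 20, 6),
    ("casual", False, 80, 24),
]

def retention_rates(rows, by_segment):
    """Retention rate per group, optionally split by user segment."""
    totals = defaultdict(lambda: [0, 0])  # key -> [users, retained]
    for segment, used_feature, users, retained in rows:
        key = (segment, used_feature) if by_segment else used_feature
        totals[key][0] += users
        totals[key][1] += retained
    return {key: retained / users for key, (users, retained) in totals.items()}

overall = retention_rates(rows, by_segment=False)
per_segment = retention_rates(rows, by_segment=True)

print("Overall:", overall)
for key, rate in sorted(per_segment.items()):
    print(key, rate)
```

In this made-up data, feature users retain far better overall, yet inside each segment the feature makes no difference. The gap comes from power users being both more likely to use the feature and more likely to stick around, which is exactly the kind of hidden factor this analysis is meant to surface.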
3. Ignoring It
The third option is to ignore it altogether. This may sound crazy (or lazy), but there are situations that aren’t worth the effort to analyze. A perfect example is a product redesign. This is something that doesn’t happen very often and isn’t easy to test using A/B tests. You could use “Further Analysis”, but the question becomes: what’s the point?
If you do realize that the redesign helped, are you going to run another redesign? Unlikely. If you realize that the redesign didn’t help, are you going to roll back? Maybe, depending on how much a team likes a new redesign and the impact.
This isn’t to say that you shouldn’t analyze the performance of a redesign, but that you shouldn’t worry too much about the causation vs correlation question.
Regardless of which option you choose, they take time and resources. This is like trying to put together a puzzle. Every test or analysis will add one more piece to the overall design but it will take multiple pieces before you can see the overall picture.
Now that we went through all the serious stuff, let’s take a moment and look at the humor behind this. If you look hard enough, you can find a correlation almost anywhere.
To prove this, I spent a few minutes doing this for public data sets. I used Tyler Vigen’s hilarious site, where he has created a tool to easily correlate a wide array of variables.
Take all the charts below with a grain of salt and enjoy the silliness.
The first example is people who literally worked themselves to death and the sales of General Mills cereals.
It seems that the more you eat cereal, the more likely you are to work yourself to death.
Next, we have the divorce rates in Alabama and the per capita consumption of whole milk.
Lucky for us, it seems that as people drink less milk, the divorce rate goes down. Or perhaps the consumption of milk goes down as people get divorced less. Either way, if you want to avoid getting divorced in Alabama, drink less milk.
Finally, we have the per capita consumption of chicken and the number of lawyers in American Samoa.
Just like before, the fewer lawyers in American Samoa, the less the US consumes chicken. The chicken industry should clearly be investing billions of dollars into law schools in American Samoa.
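You can reproduce the same effect with pure noise. The sketch below generates a pile of random, unrelated series and then searches every pair for the strongest correlation. The number of series, their length, and the seed are arbitrary choices of mine, and none of this is how the site above works.

```python
import itertools
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def strongest_pair(series_list):
    """Return (r, i, j) for the pair of series with the largest |r|."""
    best = (0.0, None, None)
    for (i, a), (j, b) in itertools.combinations(enumerate(series_list), 2):
        r = pearson_r(a, b)
        if abs(r) > abs(best[0]):
            best = (r, i, j)
    return best

# 200 unrelated "metrics", each with 10 yearly data points of random noise.
rng = random.Random(0)  # fixed seed only so the sketch is repeatable
noise = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(200)]

r, i, j = strongest_pair(noise)
print(f"Series {i} and {j} have correlation {r:.2f}")
```

With 200 series there are 19,900 pairs to compare, so some of them will line up nicely by chance alone. The more variables you compare, the more "relationships" you will find.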
Let’s come back to the serious stuff. Remember that correlation and causation are advanced topics when it comes to product analysis. In almost all cases, more data is better, and the more structured you are at running experiments, the more likely you are to find answers (either proving or disproving your hypothesis).