How to make a claim of causality

Posted on November 28, 2012 by Christopher Berry

Why are so many people so hesitant to make a claim of causality?

This symbol, the honking red arrow, is the most powerful and important one in analytics.

The arrow represents a claim that one variable causes another.

To say that X causes Y involves judgement. Statisticians are quite right to say that correlation doesn’t prove causality. Statisticians have tests that attempt to rule out causality. But to assert that a relationship is causal requires judgement.

Statisticians still have a few issues to work out with Father Time and Mother Positivism. To a certain extent, digital analysts have inherited a few of these issues.

Is there a deficit of judgement in digital analytics?

Probably not. Leading digital analysts generate loads of frameworks that, in their judgement, are great. And we have hordes of people who say that they’re good frameworks too. So that must be so.

Why so many frameworks?

There are a lot of frameworks out there. A framework enumerates metrics and then sorts them into themes or categories.

The bulk of the value-add is enumeration.

Some go one step further by combining some of these metrics, usually counts, and relabeling them. That’s good too.

Frameworks and conceptual bucketing have value. They get you published. And they’re tremendously popular.

And it’s safe. Nobody ever really argues about frameworks.

There’s nothing to really prove with a framework.

There’s no real way to marshal hard evidence that a given framework is inferior or superior to another way of looking at things.

It’s really tough to optimize a framework.

Something doesn’t make sense

So what’s the point of all of this material, of so many frameworks, of all of these counts of metrics, if no assertion of causality can be made?

Doesn’t making a decision require some expectation of a cause and an effect?

Doesn’t the entire case for optimization assume that nature itself is known well enough for decisions in the present to be linked to outcomes in the future?

If that’s the core of digital analytics, shouldn’t we be seeing more causal models?

The beginning of a causal claim is a model. It’s a clear statement of what primarily causes what.

See the repost of a simple model below:

Primary decisions are on the left, the optimization objective to the right. Causal arrows flow from left to right. Causal arrows flow from one box to another box.

If I want more qualified leads, I need to spend more on my paid budget and my owned budget. Why? Because both budgets drive traffic to the website. Visits to the website are the primary cause of goal completions. And goal completions cause qualified leads.
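To make that chain concrete, here’s a minimal Python sketch that encodes each arrow as an edge and pushes budgets through it. The per-edge rates (visits per dollar, completion rate, qualification rate) are hypothetical placeholders, not estimates, and the sketch assumes each effect is a simple linear sum of its causes. Using `graphlib.TopologicalSorter` also enforces the left-to-right rule: if the arrows ever form a cycle, it raises a `CycleError`.

```python
from graphlib import TopologicalSorter

# Each (cause, effect) edge is one causal arrow. The rates are HYPOTHETICAL
# placeholders; real values would have to be estimated from your own data.
EDGES = {
    ("paid_budget", "visits"): 0.5,                # visits per paid dollar
    ("owned_budget", "visits"): 0.8,               # visits per owned dollar
    ("visits", "goal_completions"): 0.03,          # completions per visit
    ("goal_completions", "qualified_leads"): 0.4,  # leads per completion
}


def predict(inputs):
    """Propagate the decisions (root inputs) through the arrows, left to right."""
    # TopologicalSorter wants {node: set of predecessors}; a cycle raises CycleError.
    graph = {}
    for cause, effect in EDGES:
        graph.setdefault(cause, set())
        graph.setdefault(effect, set()).add(cause)

    values = dict(inputs)
    for node in TopologicalSorter(graph).static_order():
        incoming = [(cause, rate) for (cause, effect), rate in EDGES.items() if effect == node]
        if incoming:
            # Assumption: each effect is a linear sum of its causes.
            values[node] = sum(values[cause] * rate for cause, rate in incoming)
    return values
```

With $1,000 paid and $500 owned, `predict({"paid_budget": 1000, "owned_budget": 500})` gives 900 visits, 27 goal completions and about 10.8 qualified leads under these made-up rates.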

How much of each? What ratio between the two is optimal?

If budgets are constrained (they are) and the optimal ratio is known (it generally isn’t), the next question is how to maximize the efficiency of the budget/traffic ratios, and, if we’re feeling frisky, the relationship between usability and goal completion. (That would require another model – something we haven’t talked about much beyond a programme of A/B testing.)
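One way to ask “what ratio is optimal?” under a fixed budget is a brute-force search over splits. This sketch assumes, purely for illustration, that each channel’s visits grow with the square root of spend (diminishing returns), with hypothetical coefficients `a` and `b`. Real response curves would have to be estimated from your own data.

```python
import math


def visits(paid, owned, a=3.0, b=4.0):
    """HYPOTHETICAL response curves: visits grow with the square root of spend."""
    return a * math.sqrt(paid) + b * math.sqrt(owned)


def best_split(total, step=100):
    """Try every paid/owned split of a fixed budget in `step` increments."""
    best = None
    for paid in range(0, total + 1, step):
        owned = total - paid
        v = visits(paid, owned)
        if best is None or v > best[2]:
            best = (paid, owned, v)
    return best  # (paid, owned, predicted visits)
```

Under these made-up curves, `best_split(10000)` puts 3,600 into paid and 6,400 into owned, for 500 visits. With square-root curves the optimum has a closed form (total · a² / (a² + b²)), which the grid reproduces; the grid is used anyway because real curves rarely have one.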

A model is an abstraction of reality. It doesn’t, and can’t, explain absolutely everything. Leaving almost everything out is not, on its own, a firm basis for rejecting a model. A model is good if it makes accurate predictions about the future. As such, it’s better than gut feel or a statement of ‘no u’.

Any digital analyst reading this right now is capable of building a very simple sheet in Excel with six columns. They’re able to source a time series, perhaps with some effort, for all six variables. Any digital analyst knows the basic Excel functions for deriving a Pearson correlation. Some digital analysts might want to get into R and apply some good old-fashioned Granger causality tests to look for time lags. The point is that, beyond enumerating variables, the relationships among them can be quantified.
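The correlation step, plus a rough look for time lags, might look like this in Python. `pearson` computes the same coefficient as Excel’s `CORREL`/`PEARSON`; `lagged_correlations` (a hypothetical helper) shifts the effect series forward to see at which lag the relationship is strongest. This is a crude stand-in, not a Granger test: for an actual Granger test, R’s `lmtest::grangertest` or Python’s `statsmodels.tsa.stattools.grangercausalitytests` are the usual tools.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (what Excel's CORREL/PEARSON return)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def lagged_correlations(cause, effect, max_lag):
    """Correlate cause[t] with effect[t + lag] for lag = 0..max_lag."""
    return {
        lag: pearson(cause[: len(cause) - lag], effect[lag:])
        for lag in range(max_lag + 1)
    }
```

Trending series can correlate strongly with no causal link at all, so a high coefficient is a prompt for judgement, not proof.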

Does the model make good predictions about the future?

If so, there is enough reason to believe that the claims of causality in your model are useful. The model isn’t reality. It’s a useful representation that, in your judgement, is good enough.
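Checking whether the model predicts the future can be as simple as fitting on the earlier part of the time series and scoring on the later part. This sketch assumes a single causal link fit as a straight line (say, visits to goal completions), a time-ordered 75/25 split, and mean absolute percentage error as the score; all three are illustrative choices, not a prescribed method, and `holdout_mape` is a hypothetical helper name.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def holdout_mape(x, y, train_fraction=0.75):
    """Fit on the earlier periods, score on the later ones (no shuffling).

    Returns mean absolute percentage error on the holdout; assumes no zero actuals.
    """
    cut = int(len(x) * train_fraction)
    slope, intercept = fit_line(x[:cut], y[:cut])
    errors = [abs(slope * a + intercept - b) / abs(b) for a, b in zip(x[cut:], y[cut:])]
    return sum(errors) / len(errors)
```

The split is deliberately time-ordered: the point is to test the model against a future it hasn’t seen.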

And, you can make decisions as though the relationships are causal.

It’s defensible because you have the facts that back it up.

You really can infer causality and drive decisions from it.

Claims of causality, once you know how to make them, are far more effective than frameworks. And, because they’re far more effective, you’re not going to see a lot of them out in the wild, with actual factors publicly shared.

It’s the right direction though.

***

I’m Christopher Berry
I tweet, therefore I am: @cjpberry