Causality, Certainty, Bias, Truth, and Prediction

Posted on April 6, 2014 by Christopher Berry

There are varying concerns about what constitutes a causal model, the degree to which data is biased, the certainty that a model is predictive about the future, and whether the model itself is a truthful depiction of nature. Over the course of the past two weeks I’ve talked with many people about their perspectives: data scientists, developers, technologists, product managers, brand managers, statisticians, consultants, professors, executive producers, and founders.

We’ve talked about everything from why analysts and their customers won’t accept narrow models, to why it’s far easier to summarize data than it is to describe the relationships in it, to the intractable differences between performance reporting and what constitutes an insight.

The verdict is not in.

There are varying beliefs about the predictive power of statistical models. These range from absolute skepticism to strong optimism.

On this point, Tukey has something to say.

Tukey wrote:

“A scientist’s actions are guided, not determined, by what has been derived from theory or established by experiment, as is his advice to others. The judgment with which isolated results are put together to guide action or advice in the usual situation, which is too complex for guidance to be deduced from available knowledge, will often be a mixture of individual and collective judgments, but judgment will play a crucial role. Scientists know that they will sometimes be wrong; they try not to err too often, but they accept some insecurity as the price of wider scope. Data analysts must do the same.” (Tukey, 1962).

Models

Only in physics, and possibly chemistry, does a model become so perfectly predictive about nature that it collapses into becoming a theory. It’s why we can predict the effect of gravity to a dozen decimal places.

In complex systems, those caused by life and especially those created by sentience, models very rarely make precise predictions. It’s why we have a hard time predicting pageviews to within a few percentage points.

Modelling requires intuition about why nature is the way that it is. It requires judgement about what is likely true and what is likely not. That’s what it’s all about.

Again, Tukey has something to say:

“Data analysis, and the parts of statistics which adhere to it, must then take on the characteristics of a science rather than those of mathematics, specifically:

(b1) Data analysis must seek for scope and usefulness rather than security.

(b2) Data analysis must be willing to err moderately often in order that inadequate evidence shall more often suggest the right answer.

(b3) Data analysis must use mathematical argument and mathematical results as bases for judgment rather than as bases for proof or stamps of validity.” (Tukey, 1962, 6).

Judgement

There are tremendous collections of descriptive statistics. There are far fewer discussions that compare models. What’s embedded in those models are understandings about nature.

Progress comes from better judgement, which comes from a better understanding of the underlying phenomena that the data represents. The data sometimes goes to great lengths to hide the truth.

Perhaps the verdict will ultimately derive from who has the best judgement, informed by those who understand how to wield statistical methods to develop more robust predictive models.