How we understand data and how we explain reason
You may have recently clicked a link leading to this paper by Robert Ghrist on Barcodes. You may have also read a previous post about MINE. And finally, this month I talked about histograms and proceeded to subject you to the importance of seeing the data, again and again and again.
- Seeing the data helps analysts understand the data.
- Showing the data alone isn’t explaining the data.
The first question, in response to seeing a line on a chart, is “why?”
Sure, if the line is going up, I caused that. If it’s going down, that’s the weather’s fault. Fine. Those are great, convenient guesses. They aren’t reasoning.
It’s much harder to assert that a relationship between two things really exists. At the very least, you can plot two lines on a chart and visually look for a pattern. I’m not sure if that’s really analysis. The next level is to run a statistical test to understand the goodness of fit. The next level up is to apply a lag factor to the time series and evaluate the fit. And this is where technology like MINE becomes extremely useful. The machine finds patterns.
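Here's a rough sketch of that lag step, just to make it concrete. It is not MINE and it is not how MINE works: it's a plain Pearson correlation, computed again and again while one series is shifted against the other. The function names (`pearson`, `lagged_correlation`, `best_lag`) and the data are made up for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(xs, ys, lag):
    """Correlate xs[t] with ys[t + lag], for lag >= 0."""
    if lag < 0 or lag >= len(xs):
        raise ValueError("lag must be between 0 and len(xs) - 1")
    if lag == 0:
        return pearson(xs, ys)
    # Drop the last `lag` points of xs and the first `lag` points of ys
    # so that each remaining pair is (xs[t], ys[t + lag]).
    return pearson(xs[:-lag], ys[lag:])


def best_lag(xs, ys, max_lag):
    """Try every lag from 0 to max_lag; return the strongest one and all scores."""
    scores = {lag: lagged_correlation(xs, ys, lag) for lag in range(max_lag + 1)}
    best = max(scores, key=lambda lag: abs(scores[lag]))
    return best, scores


# Made-up data (an assumption, not real traffic or sales):
# ys echoes xs three periods later, plus a little noise.
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(200)]
ys = [0.0] * 3 + [x + rng.gauss(0, 0.1) for x in xs[:-3]]

lag, scores = best_lag(xs, ys, max_lag=10)
print(lag, round(scores[lag], 2))
```

Even when a lag fits very well, all this finds is a pattern. It says nothing about why the pattern is there.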
It’s much harder, still, to prove that one thing causes another. Bias and correlation aside, we communicate causality using models, understand the reasons for past performance, and use that reasoning to make predictions about the future.
Advanced data visualizations, like the ones imagined by Robert Ghrist, are awesome for helping the sensemakers make sense of the data. And I’m very happy to have discovered his work.
These advances are for us to use to create better models, faster.
They’re not as awesome for helping the sensemakers explain the data. They might be persuasive to analysts. They’re not necessarily persuasive to anybody else.
If you accept this position, that there is a difference between how somebody understands data and how somebody explains reason, I have two questions:
- Why is it common to send out reports purely in Excel format?
- Why is it common to present a chart, and give no reasons?
I know this is happening. Sometimes people ask for Excel files, because they have to put them into other formats. I get that. That’s an edge case in my practice. And I don’t like it. But I do it.
I know it’s the norm. I get your reports.
Here’s where I’m at:
New technologies, like R, PANDAS, and MINE, are enabling us to see the data way better than we ever could before. Datameer and Tableau enable us to move around large amounts of data and understand it.
What’s going on on the explanation side?
I’m Christopher Berry.
I tweet about analytics @cjpberry
I write at christopherberry.ca