This is a pretty good summary of the definition of data science. Some statisticians seem to be incensed. Some people say that this whole thing is invented as an O’Reilly buzzword. And there’s consternation, fear probably, over the devaluation of actual craft.

Sound familiar? Ah, the great Web Analytics debate of 2007. Yes. We’ve seen this.

Nothing like a fresh Gartner Hype Cycle in the morning, is there?

But let’s consider what technology is causing, and the role that data scientists will play in driving that cause.

Accessibility to data is expanding. What used to be jealously guarded by people who didn’t want to be educators is now liberally spread. It doesn’t really matter that most people don’t know what the figures mean, does it? At best, it’s making big parts of the world smarter. At worst, it’s merely reinforcing pre-existing ignorance about what people conveniently want to believe.

Pop-business literature is good. More people are aware of the potential. It’s awesome.

And there’s no resisting this market. Everybody wants data because they believe it will make them better. That it will make them smarter. That it will result in sustainable competitive advantage.

And it can. Data is a prerequisite. Understanding, well, that’s something else.

Data scientists will use the very best of computer science (computability, algorithms, scalability), the very best of usability (IA, UX, Infometrics) and the very best of statistical analysis (models, probability, learning). How they do so, and what the best patterns for success are, will fuel an entire generation of operations research. A sort of meta-meta study of the meta-meta of competing on analytics.

Too much meta?

Not enough. Not nearly enough.

While the fight for labels will begin, and then persist, for the better part of this decade, the proof will be in the experiences and, one degree of causality out, the results.

***

I’m Christopher Berry.
I tweet about analytics @cjpberry
I write at christopherberry.ca