John Rauser is a data scientist at Amazon. He put forward a perfectly good definition of Big Data at a conference last week. He said that big data is defined as:

“Any amount of data that’s too big to be handled by one computer.”

That’s a great definition. I like it because if data can’t be handled by one computer, it belongs to a special class of problems: the ones caused by data being distributed and fragmented.

That’s great, John. Thank you.

Bring on the Anti-Hypers

It’s great to use terms like ‘bandwagon’ and ‘data fetish’. If you cut through the negativity and really strip it down, you’ll see there’s a concern that storage of data is not the same as the extraction of value from data.

They’re worried that the big vendors are going to go in, firms are going to store masses of useless data, managers will say ‘haha, I’m so smrt’, and things will get worse.

It’s almost as though they’ve seen this before. Like in the eighties and nineties. Like, possibly… with Business Intelligence systems?

They’re going to overinvest in storage and underinvest in extraction.

I can’t say I blame them for being skeptical. There’s no reason to believe that it’s different this time. Unless we make it different this time around.


I’m Christopher Berry.
I tweet about analytics @cjpberry
I write at