Hadoop’s Successors
Mike Miller (@mlmilleratmit) wrote a piece entitled “Why the days are numbered for Hadoop as we know it.”
The key paragraph is:
“In summary, Hadoop is an incredible tool for large-scale data processing on clusters of commodity hardware. But if you’re trying to process dynamic data sets, ad-hoc analytics or graph data structures, Google’s own actions clearly demonstrate better alternatives to the MapReduce paradigm. Percolator, Dremel and Pregel make an impressive trio and comprise the new canon of big data. I would be shocked if they don’t have a similar impact on IT as Google’s original big three of GFS, GMR, and BigTable have had.”
To simplify, in my words:
- The software package known as Hadoop is incredible for processing data that is too big to be stored on a single computer (a minimal sketch of the MapReduce idea it implements follows this list).
- You will be frustrated with Hadoop if you’re trying to explore that data for relationships or to run statistical queries really quickly, especially on data that is growing really quickly.
- That dissatisfaction led Google to explore alternatives to its earlier approach, the MapReduce paradigm that Hadoop is based on.
- Three new tools, Percolator, Dremel, and Pregel, are solutions to that problem.
- He predicts that those three tools are going to have as big an impact as previous disruptive technologies from Google have had.
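To make the first two points concrete, here is a toy sketch of the MapReduce idea in plain Python. It is not Hadoop’s actual API, just the shape of the paradigm: map tasks can run independently on chunks of data spread across many machines, and a reduce step combines their output afterward. That shape is great for batch jobs over static data and awkward for ad-hoc, rapidly changing queries.

```python
from collections import defaultdict

# Toy illustration of the MapReduce paradigm (not Hadoop's actual API):
# a word count expressed as independent map tasks plus a reduce step.

def map_phase(document):
    """Emit (word, 1) pairs; each document could be mapped on a different machine."""
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(pairs):
    """Sum the counts for each word; runs after all map output is gathered together."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

if __name__ == "__main__":
    documents = ["Hadoop processes big data", "big data needs big clusters"]
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    print(reduce_phase(pairs))
    # {'hadoop': 1, 'processes': 1, 'big': 3, 'data': 2, 'needs': 1, 'clusters': 1}
```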
Michael E. Driscoll replied:
“I would encourage you to keep an eye on Metamarkets’ Druid, which Curt Monash recently covered: http://www.dbms2.com/2012/06/16/metamarkets-druid-overview/
…
In the coming decade, the disruption to the BI market will be driven by those who deliver solutions, not tools. And those solutions won’t be delivered by a big box or confined to a Windows desktop; they will be cloud-backed, web-delivered services. (Disclosure: I am the CEO at Metamarkets, so I admittedly have a dog in this hunt).”
What’s next
This exchange is important. Driscoll is the CEO of an applied data science company, so he’s credible on the subject. So is Miller, who is chief scientist at Cloudant, a big data SaaS provider.
I believe that most people who use websites expect data to load quickly (under 7.5 seconds); otherwise, they’ll click away. That’s the issue, and it goes directly to the analytical workflow. In the future, many more people will rely on analytical workflows.
Solve the problem of rapid workflows on large, rapidly growing data sets, and win.
***
I’m Christopher Berry.
Follow me @cjpberry
I blog at christopherberry.ca