Why Analysts Are Picking Python

Posted on February 18, 2013 by Christopher Berry

There are (at least) four reasons why analysts are picking Python.

For a decade now, I’ve relied on Python for simulation and data ETL, and I’ve depended on SPSS or R for data analysis. The reason for the two-step (and sometimes three, if we include Excel) is that there were no good libraries that could really replace SPSS or R completely. SciPy and NumPy are excellent for operating on well-formed arrays of data, but are decidedly less efficient, from a user perspective, at handling data.

The first reason is data frames. Popularized by R, they are finally available in Python through a package called pandas. And it’s a nice library. SciPy and NumPy, two very popular libraries, are still widely used too, but pandas has data frames. So, naturally, analysts are going to go that way – it solves their data-handling problem.
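To make the data frame point concrete, here is a minimal sketch of the kind of handling that pandas makes easy. The table, its column names, and its numbers are all made up for illustration; the only assumption about the library is that pandas is installed and imported as pd.

```python
import pandas as pd

# Hypothetical web-traffic records; names and values are illustrative only.
visits = pd.DataFrame(
    {
        "channel": ["search", "email", "search", "social", "email"],
        "sessions": [120, 45, 98, 30, 60],
        "orders": [6, 3, 4, 1, 5],
    }
)

# Derive a new column from two existing ones, row by row.
visits["conversion"] = visits["orders"] / visits["sessions"]

# Group by a label and sum the numeric columns, much like a pivot table.
by_channel = visits.groupby("channel")[["sessions", "orders"]].sum()

print(by_channel)
```

Labelled columns, derived columns, and group-by summaries are exactly the handling chores that are awkward to do on bare arrays.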

The second reason is the availability of JSON data. A lot of queries can only be answered if you understand JSON and know how to parse out that material. This is an activity that can’t be done in Excel or SPSS. JSON support in R is a bit shocking at times. Python has far more libraries that handle far more issues with JSON.
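As a rough illustration of what parsing out that material looks like, the sketch below uses only the json module from Python’s standard library. The payload, its field names, and the flatten_events helper are hypothetical; real feeds will be shaped differently.

```python
import json

# A made-up nested payload, similar in shape to what many web APIs return.
raw = """
{"user": {"id": 17, "name": "Ada"},
 "events": [
   {"type": "view", "page": "/home", "ms": 320},
   {"type": "click", "page": "/pricing", "ms": 110},
   {"type": "view", "page": "/pricing", "ms": 540}
 ]}
"""

payload = json.loads(raw)


def flatten_events(doc):
    """Return one flat dict per event, carrying the user id along."""
    user_id = doc["user"]["id"]
    return [{"user_id": user_id, **event} for event in doc["events"]]


rows = flatten_events(payload)

# Answer a simple question: how much time was spent on page views?
views = [row for row in rows if row["type"] == "view"]
total_view_ms = sum(row["ms"] for row in views)

print(len(rows), total_view_ms)
```

Once the nested records are flattened into rows like this, they can be handed to whatever analysis tool comes next.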

The third reason is the increasing demand for predictive analytics. Prediction requires more powerful tools and better simulation facilities. Python offers those.

The fourth reason is that Python is the chosen workbench of many academics right now. Python is updated frequently and is really well supported. The ecosystem responds and evolves faster than a major enterprise can or will. In other words, Python is current.

People pick languages not so much for the elegance of the syntax (look at JavaScript – horrendous), but for the utility and advantages that they bring to the bench. Python is bringing some serious competitive advantages to those who choose it and choose to use it.