Excel continues to be THE major tool in analytics.

It shouldn’t be.

Excel:

  • Does not scale beyond a single computer, and frequently fails to load with very large data sets
  • Does not separate model, view, and controller unless you completely force it to
  • Allows human beings to make too many big mistakes; it invites error

On the other hand, Excel:

  • Is easy to use
  • Creates pretty charts that, with effort, can be dragged into PowerPoint presentations
  • Is shareable
  • Is cheap
  • Is fast (compared to building something accurate or scalable)

There are broader problems with Excel spreadsheets, namely that they:

  • Are prone to complexity creep
  • Engender disrespect (After all, it’s just pizza and spreadsheets, derp)
  • Are generally not easily importable into statistical software for analysis

The Excel / SPSS / Excel / PPT Shuffle:

  • Extract the data from disparate platforms, typically in Excel format.
  • Load into SPSS / R
  • Clean it
  • Analyze it (real analytics, hypothesis-based statistical tests)
  • Consult with others, looking for spurious variables and potential actionable angles
  • Write out your story
  • Export the data from SPSS / R into Excel
  • Create pretty charts
  • Load the charts into PPT
  • Frig around with the damned image settings for too long
  • Edit up your story; tuck and tails
  • Communicate
  • Action it
  • ?????
  • Profit!

This workflow is full-on, pants-on-head stupid. Concretely:

  • It’s slow
  • It’s prone to a large amount of error
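Both problems come from the hand-offs between tools. As a rough illustration only, the load, clean, analyze, and export steps of the shuffle can in principle live in one program, so numbers are never pasted from one tool into another. The sketch below is Python using only the standard library, not the SPSS / R setup described above. The column names (`group`, `value`), the sample data, and the function names are all hypothetical, and group means stand in for real hypothesis-based tests.

```python
import csv
import io
import statistics

# Hypothetical raw export; in practice this would be a file saved
# from whatever platform the data lives in.
RAW_EXPORT = """group,value
control,10
control,12
control,
test,15
test,17
test,n/a
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Keep only rows whose value parses as a number."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append((row["group"], float(row["value"])))
        except (TypeError, ValueError):
            continue  # drop blanks and non-numeric entries
    return cleaned

def summarize(pairs):
    """Return {group: (n, mean)} for the cleaned data."""
    groups = {}
    for group, value in pairs:
        groups.setdefault(group, []).append(value)
    return {g: (len(v), statistics.mean(v)) for g, v in groups.items()}

def export_summary(summary):
    """Write the summary as CSV text, ready to hand to a charting step."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["group", "n", "mean"])
    for group in sorted(summary):
        n, mean = summary[group]
        writer.writerow([group, n, mean])
    return out.getvalue()

summary = summarize(clean(load_rows(RAW_EXPORT)))
report = export_summary(summary)
print(report)
```

Every step is a function that can be rerun when the data changes, which is the part the copy-and-paste shuffle cannot offer.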

Previous attempts to change this script:

2008, SPSS viewer format output for your report.
Advantage: It’s fast, more accurate, and you see what we see.
Disadvantage: It’s ugly.
Result: Oh god, my eyes, they burn, they burn. Go back to Excel. Go back to Excel. (pained sobs).

2009, SPSS viewer format output for your report.
Learned a few new prettification techniques to make it a bit more attractive.
Advantage: It’s fast, more accurate, and you see what we see.
Disadvantage: It’s ugly.
Result: Oh god, my eyes, they burn, they burn. Stop it. Go back to PPT. Go back to PPT. (sobs).

2010, SaaS format for your report.
Advantage: It’s pretty, near real time, and has a memory.
Disadvantage: It’s taking too long.
Result: Oh god, my budget, kill it! Kill it with fire. (sobs).

2011, Advanced dashboard SaaS format for your report.
Advantage: It’s far more functional, is much faster, and has a data dictionary.

Result: The SaaS format is tractable.

I have a huge problem with Excel.

Do you see as many problems with it as I do?

***

I’m Christopher Berry.
I tweet about analytics @cjpberry
I write at christopherberry.ca