Here’s what you need to know about automated statistical analysis: 1. Automated statistical analysis is not a substitute for good judgement. Statistical tests are tools. They help us understand why nature is the way that it is. Nature resists being known. But she is knowable. Statistical tests themselves are part of nature. The tests were never meant to be substitutes for good judgement. The belief that tests could replace people has only led to the accumulation of some pretty outrageous assumptions over the years. Just because there is a significant correlation between Magnum Ice Cream sales and piracy in the Indian Ocean doesn’t mean that the relationship is causal. Statements of causality require judgement. Automated statistical analysis is not[…]

Let’s take a look at what 16-bit interfaces could do. A great simulation game begins with just a handful of degrees of freedom and explodes from there. Behold the grandeur that is SimCity for the Super Nintendo. If you’re familiar with SimCity (1991), skip ahead to Data Exploration, below. On a flat plane of pixels, you have the choice to: Bulldoze a feature. Build a road. Build a mass transit unit. Build a power line. Build a park. Build a residential zone. Build a commercial zone. Build an industrial zone. Build a police department. Build a fire department. Build a stadium. Build a port. Build a coal plant. Build a nuclear plant. Build an airport. Build a special reward building.[…]

Discovering truth in data always begins with you, and your judgement. Assume that you have some idea about the world. Something that you believe is true, and you want to discover if you’re right. Here’s how I draw that out. It becomes a matter of organizing a dataset along those thoughts. I call causal variables X1, X2, X3… I call the single variable that I’m trying to explain the Y variable. There can be only one Y variable. For your own sanity, there can only be one Y variable at a time. There are a large number of tasks involved in figuring out whether X1, X2, X3 cause Y. One of them is to run any one of the many[…]
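A minimal sketch of organizing a dataset that way, assuming a toy table with made-up numbers. The column names X1, X2, X3 and Y come from the text above; the values, the `split_xy` helper, and the choice of a Pearson correlation as a first look are illustrative assumptions only. A strong correlation here is a prompt for judgement, not a statement of causality.

```python
from math import sqrt

# Hypothetical toy dataset: one dict per observation.
# X1, X2, X3 are candidate causal variables; Y is the single variable to explain.
rows = [
    {"X1": 1.0, "X2": 5.0, "X3": 2.0, "Y": 2.1},
    {"X1": 2.0, "X2": 3.0, "X3": 2.0, "Y": 3.9},
    {"X1": 3.0, "X2": 4.0, "X3": 3.0, "Y": 6.2},
    {"X1": 4.0, "X2": 1.0, "X3": 2.0, "Y": 8.1},
    {"X1": 5.0, "X2": 2.0, "X3": 3.0, "Y": 9.8},
]


def split_xy(rows, y_name):
    """Separate the one Y column from every other (X) column."""
    y = [row[y_name] for row in rows]
    xs = {
        name: [row[name] for row in rows]
        for name in rows[0]
        if name != y_name
    }
    return xs, y


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (v - mean_b) for x, v in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


# Exactly one Y at a time; everything else is treated as an X.
xs, y = split_xy(rows, "Y")
for name, column in xs.items():
    print(name, round(pearson(column, y), 3))
```

To explain a different variable, call `split_xy` again with another column name, which keeps the one-Y-at-a-time discipline explicit.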

The full New York Times Innovation Report was leaked last week. It’s worth reading if only because it lets you look at a paradigm – an entire way of thinking, laden with its own explanations of culture, causal factors, jargon, assumptions, myths, systems, and heretics. It enumerates the preferences and aspirations of a small group of people (including their preferred org-chart re-org!) and highlights a long-standing tension between technologists and journalists. It may also serve as a wake-up call that continuous improvement and scientific management are already a reality at several disruptive media startups. Let’s begin. Summary if you didn’t read it (and won’t): The document contains 97 pages. The term “Competitor” is mentioned on 39 of those pages. Analytics[…]

This piece from McKinsey highlighted the inflated expectations of big data analytics – “…expectations of senior management are a real issue…but too often senior leaders’ hopes for benefits are divorced from the realities of frontline application. That leaves them ill prepared for the challenges that inevitably arise and quickly breed skepticism.” The listicle (et tu, McKinsey?) summarized below, is somewhat related to that concern: 1. Data and analytics aren’t overhyped—but they’re oversimplified 2. Privacy concerns must be addressed—and giving consumers control can help 3. Talent challenges are stimulating innovative approaches—but more is needed 4. You need a center of excellence—and it needs to evolve 5. Two paths to spur adoption—and both require investment (automation and training) In a fit of[…]

There are varying concerns about what constitutes a causal model, the degree to which data is biased, how certain we can be that the model predicts the future, and whether the model itself is a truthful depiction of nature. Over the course of the past two weeks I’ve talked with many people about their perspectives – data scientists, developers, technologists, product managers, brand managers, statisticians, consultants, professors, executive producers, and founders. We’ve talked about everything from why analysts and their customers won’t accept narrow models, to why it’s far easier to summarize data than it is to describe the relationships in it, to the intractable differences between performance reporting and what constitutes an insight. The verdict is not in. There are varying beliefs[…]

The listicle is an amazing communication device. A listicle is a schema for communication – always in the form of a list. Sometimes that list is random, but often it is ordered. I continue to be in awe of the ongoing effectiveness of the listicle. Lists are effective communication devices in analytics. Why not listicles? Lists Effective analytics dashboards are filled with lists. “The top 10 performing landing pages” “The top 5 posts” “The top 7 competitor ads…they don’t want you to know about!” Lists are visually compact and editorially appropriate. An executive might scan a list for the top performers and the bottom performers. An analytics executive might scan a list for the top 20% and verify that it accounts for 80% of[…]

“The End of Facebook” trumpeted the headline. 46 points in 46 minutes on Hacker News. “Facebook Screws Social Media Marketers!” trumpets Business Insider. “Facebook is losing teens” states Global Web Index. Here we go with the bandwagon. Hop on! Only this time isn’t going to be quite like the last time(s). Teens have fled to their smartphones. They’re computers they can control. They’re computers that aren’t tied to the family room, where parents can see them. Small screens offer a degree of privacy and intimacy that larger screens, even the tablet, just can’t replicate. Facebook saw that a long time ago and snapped up a few cool startups. Ditto Twitter. Ditto Google. And the rest of us are behind[…]

A strategy is a set of choices that, when combined, cause a sustainable competitive advantage. Conscious, reinforcing choices are powerful. That’s what you learned in B-school. I’m far more pessimistic that strategic choices are generally conscious. I’ll explain. A set of deliberate choices that constitute a strategy might be: Because we chose the same aircraft we save money on maintenance. Because we chose the same aircraft we save money on ticketing. Because we chose the same aircraft we compete exceptionally well on specific flight pairs. Because we chose a large set of direct point-to-point flights without going through hubs, we save money on baggage transfer. Because we simplify baggage, we can turn planes around more reliably. Because we turn planes around[…]

It’s the results, genius! It’s the results. The purpose of any sort of data analytics or data science is to get results. It isn’t about the spreadsheet that comes three weeks after the campaign. It isn’t about sandbagging numbers. It isn’t the few slides in the Quarterly Business Review. It isn’t even data entertainment. It’s the results. Great! So what’s the deal? Why is so much time expended on activities that don’t directly tie to getting results? Analytics Maturity It’s because of maturity, or the sum of experiences that an organization/culture chooses to remember. Very good models of analytics maturity exist. Stephane Hamel has a great one. Stances inform tools and tools cause experiences. Where you stand affects which, if[…]