Nate Silver, who ran forecasts for the FiveThirtyEight blog, called the 2012 Presidential Election right. The #datascience crew tweeted #mathwins. Nate mixed a lot of time-tested methods and ported a novel one over into political science. He talked about the future. He demonstrated the power of predictive analytics. He accomplished a lot. And he did a lot for predictive analytics. I thank him for that. He took a good risk with his career and with analytics more generally. He won. Pundits lost. And, he’s totally getting laid today. Nate’s work was a beautiful piece of predictive analytics, and, had more work been done in D3 and productization, it would have qualified as full-on #datascience. The[…]

Most discussions of statistical bias, in the world of sampling, revolve around the actual randomness of the sampling. Is there a systematic bias in the way the sample is collected, whether in terms of those who have been selected to participate, those who opt to participate, or those who choose to answer specific questions? It’s commonly argued that if the sample is biased, you have to throw away the whole data set, because the sample is not representative of the overall population. And, in general, we confine our discussions of bias to the nature of the sampling, or to how summary statistics vary against what is expected. There’s another type of bias that revolves around inferring causality. Statisticians generally don’t enjoy[…]

There appears to be a belief that facts have two sides to them. It makes the marketing scientist in me smile, the public policy quant in me rage, and the scientist in me flip the table. Jimmies status: rustled. Stories may have two sides. Models may have two sides. Ideologies have two sides. Facts do not have two sides. And yet, there have been a few folks coming out of the woodwork lately: Jack Welch and his tussle with the BLS, Unskewed Polls in the Presidential Race, and many, many others. A fact is defined as something that actually happened, exists, or is reality. We can experience facts first hand, or observe them through instrumentation. In analytics, there are multiple types of instrumentation that[…]

Has Pinterest topped out? You may be familiar with the Bass Diffusion Model. In short, there’s a predictable function that is very effective at forecasting the adoption curves of new products. The trickiest part of using the Bass Diffusion Model is estimating the point at which saturation occurs. Saturation just means the point where the number of adopters levels out. In the image below, growth started decelerating around 4 years in and had certainly flattened out by year 7. I don’t have access to Pinterest’s own analytics tools. So, like you, I have to rely on public, third-party estimates of what’s going on. Alexa is reporting a flattening reach curve. You can see that below. Most web traffic follows what digital analysts call ‘an[…]
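
For readers who want to poke at the curve themselves, here is a minimal sketch of the standard closed-form Bass model. The parameter values (`m`, `p`, `q`) are illustrative assumptions, not estimates for Pinterest or any real product, and the function names are made up for this example.

```python
import math


def bass_cumulative(t, m, p, q):
    """Cumulative adopters at time t under the Bass diffusion model.

    m: market potential (the saturation level)
    p: coefficient of innovation (external influence)
    q: coefficient of imitation (word of mouth)
    """
    decay = math.exp(-(p + q) * t)
    return m * (1.0 - decay) / (1.0 + (q / p) * decay)


def bass_new_adopters(t, m, p, q):
    """Adopters gained during the period ending at time t (from t-1 to t)."""
    return bass_cumulative(t, m, p, q) - bass_cumulative(t - 1, m, p, q)


def bass_peak_time(p, q):
    """Time at which the adoption rate peaks (meaningful when q > p)."""
    return math.log(q / p) / (p + q)


if __name__ == "__main__":
    # Hypothetical parameters chosen only to show the S-curve shape.
    m, p, q = 1_000_000, 0.03, 0.38
    print(f"adoption rate peaks around t = {bass_peak_time(p, q):.1f}")
    for year in range(1, 13):
        total = bass_cumulative(year, m, p, q)
        gained = bass_new_adopters(year, m, p, q)
        print(f"year {year:2d}: cumulative {total:12,.0f}  new {gained:10,.0f}")
```

The hard part in practice is the one the post names: `m` has to be estimated, and a curve that looks flat can mean either real saturation or a wrong guess at `m`.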

We sat in a nice pub as Sandy rolled in to Toronto last night, just a few data scientists and I. “It’s very difficult to express, succinctly, how dashboards aren’t what people have been told they are.” “Well, you know what the problem with dashboards is... that they’re not presented in 1080p High Definition. We need to get our dashboards in High Definition.” “Or retina display!” “No. In 3D!” “Pie charts will COME, RIGHT, AT, YOU!” And after laughing for about five minutes, our shoulders bowed upon realizing that somewhere out there, regardless of how ridiculous the concept is, some group of finger-clickers is discussing the same idea right now. Only they’re serious about it. And that retina high[…]

Two pieces of information and one editorial to share coming out of NY Strata. The next release of ggplot2 will be done in D3 and called r2d3 (thanks to Hadley Wickham and team). A few data scientists argued about things that didn’t matter to anybody. Privacy Analytics won big at Strata. Check them out. The ggplot news is great. It’ll help us produce nicer graphics, faster, instead of uglier graphics. Privacy Analytics is a great concept. As for the debates – some were intensely important and very relevant. Some of them were just fights about words and unimportant. There was much more good than bad at this one. Thanks to the organizers for doing such a great job.[…]

There’s going to be a lot more rhetoric next year about decisions, big data decisions, and decision automation. I think we’re going to hear a lot, and it’ll fall into three big categories of rhetoric. The first is to argue ‘this-changes-everything!’ Panderers are gonna pander. And, it’s a really great line to say that it’s a paradigm shift. You know, in ten years, farmers aren’t going to farm anymore, they’ll just have fields and fields of big data servers and fields and fields of crops. A farmer isn’t really going to be a farmer as we think of them today, but they’ll be much more like a data manager. Every technological disruption brings in a crop of these folk, and that’s[…]

Have you read about Gartner’s 232 billion dollar Big Data prediction? Wow. Equally amazing – the 4.4 billion dollar social media analytics prediction, by 2016. Wow. Gartner is projecting a 45% per year average growth rate for social media, social network analysis and content analysis from 2011 to 2016. Hype? Unlikely. This is revenue growth in spite of a massive trough of disillusionment coming up. A majority of this growth is in the post-trough slope of enlightenment. That’s big growth. You have a choice. You can stand on the sidelines barking that there’s no such thing as a contagion effect. You can choose to ignore the most recent findings in the August edition of ‘Science’ on network homophily. That’s your choice. Or you can put on[…]
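
To get a feel for what 45% a year means, here is a rough back-of-the-envelope sketch. It assumes the 45% is a compound annual rate over the five years from 2011 to 2016, and that the 4.4 billion figure is the 2016 end point. Both are my reading of the numbers above, not something Gartner's wording confirms, and the function names are made up for illustration.

```python
def compound(base, rate, years):
    """Value after growing `base` at `rate` per year for `years` years."""
    return base * (1.0 + rate) ** years


def implied_base(target, rate, years):
    """Starting value that would reach `target` after compounding."""
    return target / (1.0 + rate) ** years


if __name__ == "__main__":
    rate, years = 0.45, 5        # assumed: 45% compounded, 2011 to 2016
    target_2016 = 4.4            # billions of dollars, assumed end point
    base_2011 = implied_base(target_2016, rate, years)
    print(f"growth multiple over {years} years: {(1 + rate) ** years:.2f}x")
    print(f"implied 2011 base: {base_2011:.2f} billion")
    for n in range(years + 1):
        print(f"{2011 + n}: {compound(base_2011, rate, n):.2f} billion")
```

Under those assumptions the market grows a bit more than sixfold in five years, which is the scale of the claim being made.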

Sarah Lacy wrote a paragraph that resonated: “One of our most popular stories all week has been David Holmes’s report about how Tumblr wants to pay for journalism. And not just cat pictures, re-written press releases, or 300 word snark-fests by junior reporters paid $12 a post. This isn’t another content farm. They want real, actual New Yorker-style long form journalism.” Then she says, that’s great. Who’s going to write it? She describes a talent cliff coming up in a few years. Our society isn’t generating people capable of long-form journalism or storytelling. It’s a great article, and really worth reading. There’s tremendous incentive, during this era, to communicate in extremely tiny units. If a thought can’t be expressed in[…]

Regular readers of this space know about the BLS and all the neat nuances that go along with the data. I wrote a five-part series in July on How Americans Live using Bureau of Labor Statistics data. (It wasn’t a very popular series because it wasn’t a popular topic. A few of you liked it.) And then Jack Welch stepped in it this week. He made news. Whether you’re a Tastycrat or a Fingerlican, as analytics folks, you have to be intrigued by what Jack Welch is saying and how he’s thinking. His second tweet on the topic appeared to suggest that somehow Obama had something to do with modifying the BLS Unemployment report, so as to make the[…]