Deriving structure from the subjectively unstructured isn’t easy, but the results can be extremely useful and rewarding. The trouble with us, as people, is that gaps in our own knowledge can produce particularly bad biases. Traditional content analysis corrects for these biases by having a small group of coders train each other until they agree. That doesn’t happen in larger groups. Yesterday, you heard about “Unskilled and Unaware of It: How Difficulties in Recognizing One’s Own Incompetence Lead to Inflated Self-Assessments” (Kruger and Dunning, Journal of Personality and Social Psychology, 1999, 77(6), pp. 1121-1134), which you should absolutely read. They took unstructured jokes and converted them into a dependent variable. Why?[…]
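Content analysis usually checks that agreement with an inter-coder reliability statistic. Here is a minimal sketch, assuming two coders label the same items. It uses Cohen's kappa, which is one common choice; the post does not name a statistic. The labels in the usage note are made up.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each labeled the same items, in order.

    Kappa is an assumed choice of reliability statistic here; other measures
    (e.g. Krippendorff's alpha) are common when there are more than two coders.
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same, non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items where the two codes match.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement, from each coder's own label frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        # Both coders used one identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)
```

For example, `cohens_kappa(["funny", "funny", "not", "not"], ["funny", "not", "not", "not"])` gives 0.5. The coders match on 3 of 4 items (0.75), chance agreement from their label frequencies is 0.5, and kappa rescales the difference to (0.75 - 0.5) / (1 - 0.5).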

It is possible to derive quantitative measures of subjective concepts from large volumes of unstructured data. Some in political science use content analysis to quantify media or message bias. Some in media studies or public relations use a variant of content analysis to measure bias. Some in data science use a crowdsourced variant of content analysis to add features to unstructured data. Let’s have some fun. In what has become a staple paper, “Unskilled and Unaware of It: How Difficulties in Recognizing One’s Own Incompetence Lead to Inflated Self-Assessments” (Kruger and Dunning, Journal of Personality and Social Psychology, 1999, 77(6), pp. 1121-1134), which you should absolutely read, the authors use a type of content analysis to quantify[…]
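In its crowdsourced form, the idea can be as simple as collecting many raters' scores for each item and collapsing them into one numeric column. Here is a rough sketch, assuming the input is (item, rater, score) tuples on some numeric scale. `crowd_feature` and its `min_raters` cutoff are hypothetical names and do not come from the post or from Kruger and Dunning.

```python
from collections import defaultdict
from statistics import mean


def crowd_feature(ratings, min_raters=3):
    """Turn crowdsourced (item_id, rater_id, score) judgments into one feature per item.

    Items rated by fewer than `min_raters` people are dropped. The cutoff is a
    hypothetical choice, because the mean of one or two subjective scores is mostly noise.
    """
    scores = defaultdict(list)
    for item_id, _rater_id, score in ratings:
        scores[item_id].append(score)
    # Mean score per item, kept only where enough raters weighed in.
    return {
        item_id: mean(values)
        for item_id, values in scores.items()
        if len(values) >= min_raters
    }
```

With three raters scoring "joke1" as 6, 7, and 5 and only two raters scoring "joke2", the result is `{"joke1": 6}`. You could swap the mean for a median if outlying raters are a concern.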

Yesterday, you met SnowPlow, a new open source stack for web analytics. Hopes for SnowPlow: that it will expose more web analysts to machine learning with WEKA, Mahout, R, or Python; that it will reduce both reporting effort and data latency; and that it will drive better predictions and stronger optimization routines. Hopes for the social stack: remember this? (You can read the full post about Personal Knowledge Systems if you’re curious.) Stacks have powerful effects on the experiences that teams have. Tools alone don’t guarantee amazing experiences and incredible results. The real determinant is how people with great stances use those tools to realize those results. Tools are an intervening variable. There’s a fairly heated debate[…]

Reddit banned links to The Atlantic and Physorg, among others, in an effort to fight spam. (You can read more about the specifics here.) Is banning an admission that the voting algorithm is broken? This is a huge issue in social media right now. Let me explain: if you publish niche content, in particular content that appeals to men aged 18 to 34, you can ill afford to ignore Reddit. Reddit directs a lot of traffic to other websites, so there’s a monetary incentive to rig the system. Everybody gets an equal vote on Reddit. It’s a very democratic voting system. Democracies spawn interest groups. On Reddit, groups of people can form voting rings. If you want to get your link to the[…]

There are two great stacks in modern analytics: the technology stack and the social stack. The technology stack includes all the tools you’re going to invest in to shape the experiences you (and your team) will have. The social stack includes all the tools and people you’re going to invest in to shape the experiences you (and your team) will have. Constructing either stack has benefits and drawbacks. Both always have consequences and always carry diseases with them. Both are subject to their own peculiar form of lock-in, which is far more resistant to change than you might suspect. The good people at Keplar LLC have put together a neat take on the web analytics stack. Quoting from them: “What[…]

The design pattern phrase “__________ for people like you” is powerful and elegant. Behold: “The Unemployment Rate For People Like You.” Clicky clicky. It’s worth it.
***
I’m Christopher Berry.
I tweet about analytics @cjpberry
I write at christopherberry.ca
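Underneath, the pattern narrows a population to the reader's own segment before it computes the statistic. Here is a minimal sketch that uses made-up records. The attribute names (`age_band`, `education`, `unemployed`) and the function name are hypothetical, and the interactive mentioned above may compute its figures differently.

```python
def rate_for_people_like_you(records, person, keys, outcome="unemployed"):
    """Share of peers with `outcome` set, where a peer is any record that
    matches `person` on every attribute in `keys`.

    Attribute and outcome names are hypothetical placeholders.
    """
    peers = [r for r in records if all(r[k] == person[k] for k in keys)]
    if not peers:
        return None  # no comparable group in the data
    return sum(1 for r in peers if r[outcome]) / len(peers)


# Made-up records, for illustration only.
records = [
    {"age_band": "25-34", "education": "college", "unemployed": False},
    {"age_band": "25-34", "education": "college", "unemployed": True},
    {"age_band": "25-34", "education": "high school", "unemployed": True},
    {"age_band": "35-44", "education": "college", "unemployed": False},
]
me = {"age_band": "25-34", "education": "college"}
```

Here `rate_for_people_like_you(records, me, ["age_band", "education"])` returns 0.5, because one of the two matching records is unemployed. The fewer `keys` you match on, the closer the figure gets to the overall rate.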

What influences your customers to buy? Last week we watched the attribution debate come back to the surface, again. (You can read a five-part series on the subject from April 2012.) First-click attribution to time-decay attribution to last-click attribution to algorithmic smoothing, OH MY! All of them are techniques to assign value to something that happened, and that has uses. All models have their uses, and all models, by definition, are too small to fit all the complexity. Understand the uses, and use them to set expectations that make sense in that context. That aside, a set of independent variables that may be more predictive are the actual influences on a customer’s purchase. In your[…]
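The rule-based models named above differ only in how they spread one conversion's credit across the touchpoints that came before it. Here is a minimal sketch of first-click, last-click, and time-decay attribution. The path format and the 7-day half-life are assumptions. Algorithmic smoothing is left out because it depends on a fitted model rather than a fixed rule.

```python
def attribute(touchpoints, model="last_click", half_life_days=7.0):
    """Split one conversion's credit (1.0) across its touchpoints.

    touchpoints: list of (channel, days_before_conversion), oldest first.
    half_life_days: assumed time-decay parameter, not taken from the post.
    Returns {channel: credit}; credits sum to 1.0.
    """
    if not touchpoints:
        return {}
    if model == "first_click":
        weights = [1.0] + [0.0] * (len(touchpoints) - 1)
    elif model == "last_click":
        weights = [0.0] * (len(touchpoints) - 1) + [1.0]
    elif model == "time_decay":
        # A touchpoint's weight halves for every half_life_days before conversion.
        weights = [0.5 ** (days / half_life_days) for _, days in touchpoints]
    else:
        raise ValueError(f"unknown model: {model}")
    total = sum(weights)
    credit = {}
    for (channel, _), w in zip(touchpoints, weights):
        # A channel that appears more than once accumulates its shares.
        credit[channel] = credit.get(channel, 0.0) + w / total
    return credit
```

For the path `[("display", 14), ("email", 7), ("search", 0)]`, first-click gives display all the credit and last-click gives it all to search. Time-decay assigns weights 0.25, 0.5, and 1.0, which normalize to 1/7, 2/7, and 4/7. Each model is a different assumption about the same event, and none of them measures what actually influenced the purchase.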

Microsoft was going to ship Internet Explorer 10 with Do Not Track turned on by default. The industry reacted negatively. As a result, IE 10 will not ship with Do Not Track turned on by default. The key principle is: “An ordinary user agent MUST NOT send a Tracking Preference signal without a user’s explicit consent.” The argument: if a browser sends a Do Not Track signal by default, the signal becomes meaningless and will be disregarded, because the user didn’t actually opt out of tracking. They didn’t make that choice. So why is Do Not Track not turned on by default? The HTTP cookie has been turned on by default since 1994, when Lou, an engineer[…]
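On the wire, the signal is a single request header. `DNT: 1` expresses an opt-out, `DNT: 0` expresses consent to tracking, and a missing header means no preference was given. Here is a minimal server-side sketch that assumes request headers arrive as a plain dict. The function name, and the policy of treating "no preference" as allowed, are hypothetical choices a site would have to make for itself.

```python
def tracking_allowed(headers):
    """Decide whether to set a tracking cookie, based on the DNT request header.

    headers: plain dict of request headers (an assumption; frameworks differ).
    """
    # HTTP header names are case-insensitive, so match "DNT" in any casing.
    dnt = next((v for k, v in headers.items() if k.lower() == "dnt"), None)
    if dnt is None:
        # No preference expressed. Treating this as "allowed" is a
        # hypothetical site policy, not something the post prescribes.
        return True
    # "1" is an opt-out; "0" signals consent to tracking.
    return dnt.strip() != "1"
```

The sketch also shows why a default-on signal weakens the header. The server sees only `DNT: 1` and cannot tell whether a person chose it or a browser shipped with it.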

The Data Journalism Handbook from O’Reilly is good stuff. The rise of data journalism is really great news. I’m seeing great things in Toronto, like this piece, and this piece, and this piece. Granted, some of it is good old-fashioned access-to-information investigative journalism. But the sophistication of the analysis has increased, and so has the sophistication of their mediums. There’s a lot that digital analytics practitioners can learn from journalism. Journalists have several hundred years on all of us in communicating with the public. They have to write popular things to stay in business. They’re master framers of public opinion. There’s a lot to be learned there. Journalists don’t communicate in Excel or PowerPoint. Their mediums are[…]

Fisher and Pry put forth “A Simple Substitution Model of Technological Change” in 1971 (Technological Forecasting and Social Change, 3, 75-88). (At the time I wrote this post, the paper was not freely available.) It could be useful in explaining the gap left behind by Bass, and the downsides of all the charts I showed on Monday. I like this paper. Here’s the first paragraph: “For people who attempt to forecast the future, there is a continuing need for simple models that describe the course of unfolding events. Each such model should be based upon easily understood assumptions that are not available for unconscious or invisible tampering by the forecaster in his efforts to make the future what he[…]
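In the form it is usually quoted, the Fisher-Pry model says the new technology's fraction of the market, f, follows f / (1 - f) = exp(2α(t - t0)), where t0 is the time at which f reaches one half. That makes ln(f / (1 - f)) a straight line in t. Here is a minimal sketch that evaluates the curve and fits α and t0 by least squares on that line. The log-odds regression is one common fitting approach and not necessarily the procedure in the paper. The function names are hypothetical.

```python
import math


def fisher_pry(t, alpha, t0):
    """Substitution fraction at time t: f / (1 - f) = exp(2 * alpha * (t - t0))."""
    return 1.0 / (1.0 + math.exp(-2.0 * alpha * (t - t0)))


def fit_fisher_pry(times, shares):
    """Estimate (alpha, t0) by ordinary least squares on ln(f / (1 - f)),
    which the model says is linear in t with slope 2 * alpha."""
    if any(not 0.0 < f < 1.0 for f in shares):
        raise ValueError("shares must lie strictly between 0 and 1")
    ys = [math.log(f / (1.0 - f)) for f in shares]
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(ys) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, ys))
    slope = sxy / sxx                    # estimates 2 * alpha
    intercept = y_bar - slope * t_bar    # estimates -2 * alpha * t0
    return slope / 2.0, -intercept / slope
```

Shares generated from `fisher_pry(t, 0.3, 1975.0)` for t = 1960, 1965, ..., 1990 fit back to alpha ≈ 0.3 and t0 ≈ 1975. The model's appeal is the one Fisher and Pry describe: two easily understood parameters, a rate and a midpoint, leave little room for tampering.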