One of the amazing things about R as a programming language is just how many libraries there are. The ggplot2 library is one of them. And, so far, I like it. You can download the package here. (You should download the latest version, as version 0.9.1 is quite upgraded from 0.8.9*). I’m using data pulled from the EIU Global Growth Forecasts from Buzzdata. I used Hadley Wickham’s Had.co.nz site as a reference. (Thank you!) This code: Reliably generates this chart: Just a subtle point to make about one-off speed versus repeatability. If the underlying data changes, as forecasts frequently do, this chart can be replicated as quickly as a file can be downloaded and re-run. A PNG is generated rapidly.[…]
Category: Uncategorized
There is a difference in perception between what is commercial spam and what is relevant to a community of interest. There is a difference in perception between the good that a brand can do for a community, what a person can do for a community, and how both can do harm. Competition amongst branded accounts is a root cause of some persistent commercial spam. There is a difference between relevant commercial messaging, persistently repeated commercial messaging (spam), and community contribution. Reasonable people can disagree about what constitutes persistence, and about whether content milling and syndication really constitute link spam. Newcomers to analytics will indeed find great utility in content-mill posts. Established professionals derive less utility. What’s spam and what isn’t depends on[…]
Deriving structure from subjectively unstructured data using straight content analysis with crowdsourcing methods may suffer from unintended methodological bias owing to competency (or even polarity) within crowds. Question: Which late-night comedian’s monologue is funnier, and why? Assume a corpus of 5,000 timestamped audio* quotes drawn from episodes of Craig Ferguson, Conan, Jay Leno, Jimmy Kimmel, and Letterman. (Assumption: the quotes are drawn randomly from the most recent year, and a random selection is representative of the total performance.) Assume the dependent variable is ‘funny’. Assume the independent variables are show, day, time-into-show, duration-of-clip, sight-gag, and pause duration. Some machines are excellent at analyzing audio files, and all those independent variables are coded without methodological bias. The machines work.[…]
Deriving structure from the subjectively unstructured isn’t easy, but the results can be extremely useful and rewarding. In particular, the thing about ourselves, as people, is that deficits in our own knowledge can bring about particularly bad biases. These biases are corrected for in traditional content analysis, in that a small group of people train each other to come to an agreement. That’s not the case in larger groups. Yesterday, you heard about “Unskilled and Unaware of It: How Difficulties in Recognizing One’s Own Incompetence Lead to Inflated Self-Assessments” (Kruger and Dunning, Journal of Personality and Social Psychology, 1999 (77), 6, pp. 1121-1134.), which you should absolutely read. They took unstructured jokes and converted them into a dependent variable. Why?[…]
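To make "coming to an agreement" a little more concrete: one common way to put a number on agreement between two coders is Cohen's kappa, which compares the agreement you observe against the agreement you would expect by chance. The post doesn't name a specific statistic, so treat this as a minimal sketch under that assumption. The function name `cohens_kappa` and the 'funny' / 'not' labels are made up for illustration.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders who labelled the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same, non-empty set of items")
    n = len(coder_a)

    # Share of items on which the two coders gave the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Agreement expected by chance, from each coder's own label frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both coders used a single identical label throughout; kappa is
        # undefined here, so report perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for six jokes (illustrative, not data from the post).
coder_a = ["funny", "funny", "not", "not", "funny", "not"]
coder_b = ["funny", "not", "not", "not", "funny", "not"]
kappa = cohens_kappa(coder_a, coder_b)
```

A kappa near 1 means the coders have trained each other into agreement; a kappa near 0 means they agree about as often as chance would predict.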
It is possible to derive quantitative measures of subjective concepts from volumes of unstructured data. Some in political science use content analysis to quantify media or message bias. Some in media studies or public relations use a variant of content analysis to measure bias. Some in data science use a crowdsourced variant of content analysis to increase the number of features on unstructured data. Let’s have some fun. In what has become a staple paper, “Unskilled and Unaware of It: How Difficulties in Recognizing One’s Own Incompetence Lead to Inflated Self-Assessments” (Kruger and Dunning, Journal of Personality and Social Psychology, 1999 (77), 6, pp. 1121-1134.), which you should absolutely read, the authors use a type of content analysis to quantify[…]
Yesterday, you met SnowPlow, a new open source stack for web analytics. Hopes for SnowPlow: that it will expose more web analysts to machine learning experiences with WEKA, Mahout, R, or Python; that it will reduce reporting effort and data latency; and that it will drive better predictions and stronger optimization routines. Hopes for the Social Stack: Remember this? (You can read the full post about Personal Knowledge Systems if you’re curious). Stacks have powerful effects on the experiences that teams have. Tools alone aren’t guaranteed to generate amazing experiences and incredible results. The real determinant is how those with great stances use those tools to realize those results. Tools are an intervening variable. There’s a fairly heated debate[…]
Reddit banned linking to The Atlantic and Physorg, among others, in an effort to fight spam. (You can read more about the specifics here.) Is banning an admission that the voting algorithm is broken? This is a huge issue in social right now. Let me explain: If you’re a publisher of niche content, in particular, content that appeals to males 18-34, you can ill afford to ignore Reddit. Reddit directs a lot of traffic to other websites. There’s a monetary incentive to rig the system. Everybody gets an equal vote on Reddit. It’s a very democratic voting system. Democracies spawn interest groups. On Reddit, groups of people can form voting rings. If you want to get your link to the[…]
There are two great stacks in modern analytics: the technology stack and the social stack. The technology stack includes all the tools you’re going to invest in to shape the experiences you (and your team) will have. The social stack includes all the tools and people you’re going to invest in to shape the experiences you (and your team) will have. Constructing both stacks has benefits and drawbacks. They always have consequences and always carry diseases with them. They are always subject to their own peculiar form of lock-in that’s far more resistant to change than you might suspect. The good people at Keplar LLC have put together a neat take on the web analytics stack. Quoting from them: “What[…]
The design pattern phrase “__________ for people like you” is powerful and elegant. Behold: “The Unemployment Rate For People Like You”. Clicky Clicky. It’s worth it. *** I’m Christopher Berry. I tweet about analytics @cjpberry. I write at christopherberry.ca
What influences your customers to buy? We watched last week as, again, the attribution debate came back to the surface. (You can read a five-part series on the subject from April 2012.) First-click attribution to time-decay attribution to last-click attribution to algorithmic smoothing, OH MY! All of them are techniques to assign value to something that happened, and that has uses. All models have their uses, and all models, by definition, are too small to fit all the complexity. Understand the uses and use them to set expectations that make sense in that context. That aside, the actual influences on customer purchase may be a more predictive set of independent variables in the attribution debate. In your[…]
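For readers who haven't seen the attribution models side by side, here is a minimal sketch of how first-click, last-click, and time-decay attribution each split the credit for a single conversion. Everything in it is an assumption for illustration: the function name `attribute`, the `(channel, days_before_conversion)` touchpoint format, the 7-day half-life, and the example path are not from the post or from any particular vendor's implementation. Algorithmic smoothing is left out.

```python
def attribute(touches, model="last_click", half_life_days=7.0):
    """Split one conversion's credit across channels.

    touches: list of (channel, days_before_conversion) tuples in
    chronological order, so the last entry is the final touch.
    Returns a dict mapping channel -> share of credit (shares sum to 1).
    """
    if not touches:
        return {}

    if model == "first_click":
        # All credit goes to the earliest touchpoint.
        weights = [1.0] + [0.0] * (len(touches) - 1)
    elif model == "last_click":
        # All credit goes to the final touchpoint before conversion.
        weights = [0.0] * (len(touches) - 1) + [1.0]
    elif model == "time_decay":
        # A touch loses half its weight for every half_life_days of age.
        weights = [0.5 ** (days / half_life_days) for _, days in touches]
    else:
        raise ValueError("unknown attribution model: " + model)

    total = sum(weights)
    credit = {}
    for (channel, _), weight in zip(touches, weights):
        credit[channel] = credit.get(channel, 0.0) + weight / total
    return credit


# Hypothetical customer path: display ad 14 days out, email 7 days out,
# search click on the day of purchase.
path = [("display", 14), ("email", 7), ("search", 0)]
first = attribute(path, "first_click")
last = attribute(path, "last_click")
decay = attribute(path, "time_decay")
```

The same path gives three different answers about which channel "caused" the sale, which is the point: each model is a way of assigning value, useful in its own context, and none of them captures all the complexity.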