Do normative statements cause harm to analytics programs in the long run? This may be a bit meta, because I’m talking about the effect that something analysts do every day has on what gets selected for study. A normative statement expresses a value judgement. Consider the following three statements: (1) The strawberry campaign contributed to the acquisition of 1000 new customers out of the 10000 acquired last month. (2) The strawberry campaign only contributed to the acquisition of 1000 new customers out of the 10000 acquired last month. (3) The strawberry campaign failed, only contributing to 10% of new customers acquired last month. Which is the most normative? Consider the next three statements: The strawberry campaign contributed to the acquisition of[…]

Claude C. Hopkins wrote a book in 1923 entitled ‘Scientific Advertising’. It’s still in print. It’s worth paying attention to, as he already made all the mistakes we’re making. (So why pay twice?) “The competent advertis[er] must understand psychology. The more he knows about it the better. He must learn that certain effects lead to certain reactions, and use that knowledge to increase results and avoid mistakes… We learn, for instance, that curiosity is one of the strongest human incentives. We employ it whenever we can. Puffed Wheat and Puffed Rice were made successful largely through curiosity. “Grains puffed to 8 times normal size.” “Foods shot from guns.”… A department store advertised at one Easter time a $1,000 hat, and the[…]

Adblock Plus is a browser add-on. It allows users to block display ads. It comes with a pre-loaded block list, and it allows users to create white-lists and black-lists. It promises ‘annoyance-free web surfing’. Here are the usage statistics from the Mozilla install base: (Roughly 15,000,000 daily users, note the tell-tale elephant function, it’s mass. Also note instrumentation tracking loss on at least two days.) As you can see below, they’re not all from the anglosphere. There’s very high representation from German, French, Russian, and Polish Mozilla language settings. The vast majority of the install base uses Microsoft Windows systems. Why do people use Adblock Plus? They did a survey in late 2011 to find out. ~75% said distracting images[…]

With few exceptions, everybody has a personal knowledge system. I use the term as Roger Martin meant it in “The Opposable Mind” (p. 103). It’s all summed up in one nice little diagram. Behold: The text summary is: Your stance, “Who am I in the world and what am I trying to accomplish”, guides what tools you use. That is, “With what tools and models do I organize my thinking and understand the world?”, often guides experiences: “With what experiences can I build my repertoire of sensitivities and skills?”. Likewise, your experiences inform your tools which inform your stance. Tools I’ve railed against definition by tools. Again. Again. And Again. You are more than the sum of the tools and[…]

From Emanuel Derman’s book “Models. Behaving. Badly.” and the Modeler’s Hippocratic Oath: “I will remember that I didn’t make the world, and it doesn’t satisfy my equations.” “Though I will use the models I or others create to boldly estimate value, I will always look over my shoulder and never forget that the model is not the world.” “I will make the assumptions and oversights explicit to all who use them.” (p. 198) Models are useful, but they’re not the world. *** I’m Christopher Berry. I write at

There’s an interesting article from The Atlantic about a patent that Apple acquired earlier this year. As with all things in patents, the devil is buried very deep in the details. The idea of the patent is pretty interesting. The best way to protect your identity is to have many. The best place to conceal a fake identity is between two truths. The idea is: create multiple fake identities; read components from your real identity; meld the fake identities with your real identity; fake those identities, and make them do activity, across a network. This is novel, especially at the dawn of the mass-bot and recommendation engine era. It might really confuse systems and cause overall degradation of content. It’s[…]

One of the amazing things about R as a programming language is just how many libraries there are. The ggplot2 library is one of them. And, so far, I like it. You can download the package here. (You should download the latest version, as version 0.9.1 is quite upgraded from 0.8.9*). I’m using data pulled from the EIU Global Growth Forecasts from Buzzdata. I used Hadley Wickham’s site as a reference. (Thank you!) This code: Reliably generates this chart: Just a subtle point to make about one-off speed versus repeatability. If the underlying data changes, as forecasts frequently do, this chart can be replicated as quickly as a file can be downloaded and re-run. A PNG is generated rapidly.[…]

There is a difference in perception between what is commercial spam and what is relevant to a community of interest. There is a difference in perception between the good that a brand can do for a community, what a person can do for a community, and how both can do harm. Competition amongst branded accounts is a root cause of some persistent commercial spam. There is a difference between relevant commercial messaging, persistently repeated commercial messaging (spam), and community contribution. Reasonable people can disagree about what constitutes persistence, and whether content milling and syndication really constitute link-spam. Newcomers to analytics will indeed find great utility in content-mill posts. Established professionals derive less utility. What’s spam and what isn’t depends on[…]

Deriving structure from subjectively unstructured data using straight content analysis with crowdsourcing methods may suffer from unintended methodological bias owing to competency (or even polarity) within crowds. Question: Which late-night comedian’s monologue is funniest, and why? Assume a corpus of 5,000 timestamped audio* quotes drawn from episodes of Craig Ferguson, Conan, Jay Leno, Jimmy Kimmel, and Letterman. (Assumption: the quotes are drawn randomly from the most recent year, and a random selection is representative of the total performance.) Assume the dependent variable is ‘funny’. Assume the independent variables are show, day, time-into-show, duration-of-clip, sight-gag, and pause duration. Some machines are excellent at analyzing audio files, and all those independent variables are coded without methodological bias. The machines work.[…]
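The setup above can be sketched as a small data structure. This is only an illustration in Python, not anything from the original post: the field names, the 1-to-5 rating scale, the sample clip, and the use of a crowd average as the ‘funny’ score are all assumptions. It shows one way polarity within a crowd can hide inside the dependent variable: two crowds can produce the same average while disagreeing completely.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Clip:
    """One coded quote. Fields mirror the independent variables in the text;
    the names and units are hypothetical."""
    show: str
    day: str
    time_into_show_s: float
    duration_s: float
    sight_gag: bool
    pause_duration_s: float


def funny_score(ratings: list[int]) -> float:
    """Dependent variable: mean crowd rating on an assumed 1-5 scale."""
    if not ratings:
        raise ValueError("need at least one rating")
    return mean(ratings)


def rating_spread(ratings: list[int]) -> float:
    """Population standard deviation of the ratings, as a rough polarity signal."""
    return pstdev(ratings)


# A made-up clip and two made-up crowds rating it.
clip = Clip(show="Conan", day="Tuesday", time_into_show_s=95.0,
            duration_s=12.5, sight_gag=False, pause_duration_s=1.2)
polarized = [1, 1, 5, 5]   # half the crowd loves it, half hates it
moderate = [3, 3, 3, 3]    # everyone finds it middling

# Same average, very different crowds.
print(funny_score(polarized), rating_spread(polarized))
print(funny_score(moderate), rating_spread(moderate))
```

Keeping the spread next to the average is a design choice for the sketch: the average alone would code both crowds identically.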

Deriving structure from the subjectively unstructured isn’t easy, but the results can be extremely useful and rewarding. In particular, the thing about ourselves, as people, is that deficits in our own knowledge can bring about particularly bad biases. These biases are corrected for in traditional content analysis, in that a small group of people train each other to come to an agreement. That’s not the case in larger groups. Yesterday, you heard about “Unskilled and Unaware of It: How Difficulties in Recognizing One’s Own Incompetence Lead to Inflated Self-Assessments” (Kruger and Dunning, Journal of Personality and Social Psychology, 1999, 77(6), pp. 1121-1134), which you should absolutely read. They took unstructured jokes and converted them into a dependent variable. Why?[…]