We’re doing a little experiment on Twitter around the #msure hashtag. To understand the reason for #msure, you need to understand our problems with #measure. Concretely: bi-hourly ‘reminders’ of an event that’s going to happen two weeks from now; bi-daily ‘I’m recruiting an analytics ninja with twelve years of Google Analytics’ messages; live-tweeted throwaway lines like “Leverage data to optimize your business”; traffic-generation spam in the form of top 10 tricks, top 7 ways to increase clickthrough rate, and top 8 derp for maximum herp lists; and random people using strange hashtags that are related to analytics. I don’t expect #measure to change. I’m not expecting #measure to change. I’m in no position to ask #measure to change. I think a lot of people like[…]

Yesterday, Ashlie asked what kinds of laws social media analytics is borrowing. One of the great tree trunks in Marketing Science is the Bass Diffusion Model. It was published in 1969 by Frank Bass, in a paper entitled “A new product growth model for consumer durables” (Management Science 15 (5): 215–227). Briefly: there are innovators and there are imitators. Each segment, innovators and imitators, adopts a new product at a different rate over time. The impact of word of mouth can be modeled by the coefficient q. The impact of advertising can be modeled by the coefficient p. Different values of the coefficients produce different product adoption curves. The model can be used to predict the impact of word of mouth on product diffusion[…]
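
The Bass model is compact enough to sketch in a few lines. This is a minimal Python version, assuming the standard closed-form solution in which the share of the not-yet-adopted market that adopts at time t is p + q·F(t), with F(t) the fraction that has already adopted. The market size m and the values of p and q are hypothetical, chosen only to show that a larger word-of-mouth coefficient q pulls the adoption peak earlier.

```python
import math


def bass_cumulative_adoption(t: float, p: float, q: float) -> float:
    """Fraction of the eventual market that has adopted by time t.

    p: coefficient of innovation (external influence, e.g. advertising)
    q: coefficient of imitation (internal influence, e.g. word of mouth)
    Closed-form solution of the Bass model with F(0) = 0.
    """
    decay = math.exp(-(p + q) * t)
    return (1 - decay) / (1 + (q / p) * decay)


def bass_adoptions_per_period(m: float, p: float, q: float, periods: int) -> list[float]:
    """New adopters in each period 1..periods for a market of size m."""
    cumulative = [m * bass_cumulative_adoption(t, p, q) for t in range(periods + 1)]
    return [cumulative[t] - cumulative[t - 1] for t in range(1, periods + 1)]


# Hypothetical parameters, for illustration only.
m = 1_000_000
low_wom = bass_adoptions_per_period(m, p=0.03, q=0.1, periods=20)
high_wom = bass_adoptions_per_period(m, p=0.03, q=0.5, periods=20)

# Periods are numbered from 1, hence the + 1.
print("peak period, weak word of mouth:", low_wom.index(max(low_wom)) + 1)
print("peak period, strong word of mouth:", high_wom.index(max(high_wom)) + 1)
```

Differencing the cumulative curve period by period, rather than using the instantaneous rate, keeps the output in "new adopters per period", which is the shape adoption data usually arrives in.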

Have you seen Beluga Analytics, the open insights platform made available by Grooveshark? It’s pretty sweet. Enter the name of a band and get a whole range of data about that band’s audience. This isn’t just interesting. It’s absolutely fascinating. (Statistics ahead! A Z-score measures how far a given group is from the average, in standard deviations. The further the Z-score is from zero, in either direction, the more unusual the group.) Let’s compare two very different bands: europop sensation Aqua and hipster favourite The Flaming Lips. Aqua listeners are disproportionately female and young. They’re more likely to listen to music while doing chores, hanging out, and surfing. And, hey, that’s something you didn’t know before. Next, check out the hipsters. Males, 25 to 34[…]
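
For readers who want the arithmetic behind a Z-score, here is the textbook calculation as a minimal Python sketch: subtract the mean, then divide by the standard deviation. The listener shares are made up for illustration, and this is the generic formula, not Beluga's documented method, which may instead compare a band's audience against the overall listener base.

```python
from statistics import mean, pstdev


def z_score(value: float, population: list[float]) -> float:
    """How many standard deviations `value` sits above (+) or below (-) the mean."""
    return (value - mean(population)) / pstdev(population)


# Hypothetical share of listeners aged 18-24 for six bands (not Beluga data).
share_18_24 = [0.20, 0.22, 0.18, 0.21, 0.19, 0.35]

# The last band stands out: its share is more than two standard deviations above the mean.
print(round(z_score(0.35, share_18_24), 2))
```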

There’s a campaign going on for MiO in Toronto. I didn’t know much about it until Friday, when I saw a commercial for it. I went to the Loblaws today, and, right at eye level, I saw the MiO. I recognized it. I picked it up. So, the commercial ‘worked’, in that I paused in the powdered chemical section. Big moment of truth. I looked for the nutritional information. After all, the MiO is in a bad neighborhood. I started reading the French side first. Nope, not going to read that. Such tiny lettering. English side, great. Water…Citric Acid…propylene…. Suspicion confirmed. I put it back. I headed off to the gym and wondered what had gone wrong. Here we had a[…]

Is hexagon binning really better than scatterplots? It depends. Why this topic, why now? You may have missed Chris Stucchio’s excellent post entitled “Don’t use Scatterplots” on Saturday. It’s causing quite the ruckus. Naturally. Chris used a provocative title and backed it up with a logical foundation. He showed how hexagon binning generates a more accurate view of reality. What’s the difference? A scatterplot has an X axis and a Y axis (and sometimes a Z). For every case in the data set, a symbol marks where that case falls. Chris’ main point is that dots overlap when multiple cases share the same point, so a single visible dot can hide many cases. Some other commenters have stated that this can be adjusted for[…]
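
The overlap problem is easy to demonstrate without drawing anything. This minimal Python sketch uses made-up data on two hypothetical 1-to-5 survey scales, where overlap is extreme: ten thousand cases collapse onto at most 25 distinguishable dots. Counting cases per position (for integer data, the same as square bins of width 1, not the hexagons Chris advocates) recovers what the dots hide. For an actual hexagon-binned plot, matplotlib provides `hexbin`.

```python
import random
from collections import Counter

random.seed(42)

# Hypothetical data: 10,000 cases scored on two 1-to-5 scales,
# so many cases land on exactly the same (x, y) position.
cases = [(random.randint(1, 5), random.randint(1, 5)) for _ in range(10_000)]

# A scatterplot draws one symbol per case, but identical positions stack on top of each other.
visible_dots = len(set(cases))
print(f"{len(cases)} cases, but only {visible_dots} distinguishable dots")

# Counting cases per position keeps the information the stacked dots throw away.
counts = Counter(cases)
for (x, y), n in counts.most_common(3):
    print(f"position ({x}, {y}) holds {n} cases")
```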

Excel continues to be THE major tool in analytics. It shouldn’t be. Excel does not scale beyond a single computer, and frequently fails to load very large data sets; does not separate model, view, and controller unless completely forced to; and allows human beings to make too many big mistakes, which introduces too much error. On the other hand, Excel is easy to use; creates pretty charts that, with effort, can be dragged into PowerPoint presentations; is shareable; is cheap; and is fast (compared to building something accurate or scalable). There are broader problems with Excel spreadsheets, namely that they are prone to complexity creep; engender disrespect (after all, it’s just pizza and spreadsheets, derp); are generally not easily importable into statistical software for analysis[…]

Wolfram announced SystemModeler yesterday. You can read the post here. You can see the costs here. Pros: it’s integrated right into the predictive stack; it looks a hell of a lot prettier than all the other system modelers on the market; and it’s priced right for students. Cons: 99% of the potential users won’t use it because they’d have to take a course; it’s priced well outside of the innovator-technologist range; and it looks complicated. I predict a whole sequence of ‘game changer’ posts is about to overtake us all. It isn’t a game changer. But that’ll make for some pretty good title-muppeting. The product is a great extension for a pretty good stack. It looks cool. It may be of interest to many[…]

Can I ask what you think of #measure, which is the main analytics channel on Twitter? Are you happy with what that channel has become? *** I’m Christopher Berry. I’m taking refuge over at #msure. I tweet about analytics @cjpberry. I write at christopherberry.ca

On occasion, I use data generated from surveys to develop some empathy for groups of people whom I don’t frequently encounter or interview about their realities. A beautiful, publicly available data set is the Pew Internet and American Life Project’s August 2011 Apps and Adult SNS Climate data set. You can access the dataset here. Thank you, Pew. You’ll see the tables as I see them. I’m using column percentages. Note that each of those percentages has a confidence interval on either side of it. If you don’t understand what a confidence interval is, don’t tweet about the figures you’re seeing. Don’t quote them. We’ll get to that in a bit. (In general, you don’t use tables to communicate with general audiences. Since[…]
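
To see why a column percentage shouldn't travel without its confidence interval, here is a minimal Python sketch assuming the simple normal-approximation (Wald) interval for a proportion. The 40% figure and the sample sizes are hypothetical, not taken from the Pew tables, and the sketch ignores survey weighting and design effects, which generally make real intervals somewhat wider.

```python
import math


def proportion_ci(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a survey proportion.

    p_hat: observed proportion, e.g. 0.40 for a 40% column percentage
    n: number of respondents behind that column
    z: 1.96 gives roughly 95% coverage
    """
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    # Clip to the valid range of a proportion.
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)


# Hypothetical: the same 40% from a 200-person subgroup and from a 2,000-person sample.
for n in (200, 2000):
    low, high = proportion_ci(0.40, n)
    print(f"n={n}: 40% could plausibly be anywhere from {low:.1%} to {high:.1%}")
```

The smaller the group behind a column, the wider the interval, which is why a percentage from a thin slice of the sample is the riskiest one to quote.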

It’s awesome to watch Pinterest grow. A post at High Scalability reveals just how much they’ve grown. TL;DR: 80 million objects stored in S3 with 410 terabytes of user data, 10x what they had in August. EC2 instances have grown by 3x. There are 150 EC2 instances in the web tier, and 90 instances for in-memory caching, which removes database load. And a few notes about technology that caused a smile: it’s written in Python and Django, and Hadoop-based Elastic MapReduce is used for data analysis and costs only a few hundred dollars a month. It’s one of the fastest-growing sites in history. Pinterest cites AWS for making it possible to handle 18 million visitors in March, a 50% increase from the previous month, with[…]
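
Working backwards from those growth figures gives a sense of scale. This small Python sketch uses only the numbers quoted above, and assumes the "10x since August" applies to the 410 TB of user data.

```python
# Figures quoted from the High Scalability summary.
march_visitors = 18_000_000
growth_over_previous_month = 0.50  # "a 50% increase from the previous month"
user_data_tb_now = 410
growth_since_august = 10  # assumption: the 10x refers to the 410 TB figure

# A 50% increase means March = previous month * 1.5, so divide to go back.
previous_month_visitors = march_visitors / (1 + growth_over_previous_month)
august_user_data_tb = user_data_tb_now / growth_since_august

print(f"Implied previous-month visitors: {previous_month_visitors:,.0f}")
print(f"Implied August user data: {august_user_data_tb:.0f} TB")
```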