Two intersecting themes for you today – attribution and decision making. This paper from Google Analytics and eMarketer really got me started, and you can download it here. It’s a survey of marketers and agencies (n=179) gauging attitudes, expectations, and objectives in attribution. Which is so hot right now. Thank you, Google. Great stuff. There’s a big difference between satisficing decision making behavior and optimizing decision making behavior. Satisficing decision making behavior is characterized by: Good enough because it’s good enough. If it ain’t broke, don’t fix it. We only really have to do the minimum to satisfy our expectations. Optimizing decision making behavior is characterized by: Searching for better. Thinking forward and thinking backwards. Seeking to maximize an objective. People[…]
Many of my good friends and successor staff are off to build their own analytics or marketing science practices abroad. Some are going technology side. Two are going agency side. I’m happy for all of them. Those companies chose wisely because they’re all starting off with a solid keel. Here’s what I tell them: On Hiring: Your first hire is critical and most strategic – choose somebody with strengths that complement your weaknesses. You’re building an orchestra: each one’s strengths will have to incrementally complement the others’ weaknesses. When you get to a team of 4 or 5, hiring strengths to reinforce existing strengths really pays off. (Architect your orchestra.) If you can’t eat with them or have a beer with[…]
There are a few spurious variables to look for in digital analytics. There’s a whole world out there outside of a website. And it’s not just social. Macroeconomic factors impact marketing performance. Consumer trends impact marketing performance. Competitor marketing activity impacts marketing performance. More data than ever is open, and available to you: The best trove of US macroeconomic data can be found at FRED, which also has an excellent API. The best trove of US consumer trends (and seasonality) can be found at Google Trends. The best (free) trove of competitor display ad activity can be found at moat.com. To address some doubt: You don’t need a degree in economics to know how to read definitions and examine[…]
Obama pledged $200 million for big data R&D. Winners: Berkeley, which is getting $10 million for Machine Learning and crowdsourcing research. Earth Scientists, who are getting funds for advanced earthquake prediction. Biomedical researchers, who are getting funds for more genome storage space. Losers: Your grandchildren. “Meanwhile, the Department of Defense is investing $250 million annually on new ways to ‘make truly autonomous systems that can maneuver and make decisions on their own.’” We’re closer than ever to realizing our collective fantasy. *** I’m Christopher Berry. I tweet about analytics @cjpberry. I write at christopherberry.ca
A spurious relationship is when it appears that X causes Y, but in reality, there’s some other variable, W, that causes X and Y, or alternatively, X causes W which causes Y. Variable W is lurking out there, hidden, messing up your understanding. And, sometimes, Y is actually causing X. In that instance, it’s the modeler who has made a specification error. They mistakenly specified which one was the dependent variable. This gives rise to the meme “correlation isn’t causation!” iPad and Conversion: A curious fact emerged a few days ago. Somebody noted how the conversion rate from their iPad catalog was double the website average. The writer didn’t outright claim that the iPad device caused conversion to double. Rather, they[…]
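The lurking-variable case described above can be sketched with simulated data. This is a minimal illustration, not an analysis of the iPad example: the variables `w`, `x`, and `y`, the noise levels, the sample size, and the seed are all assumptions chosen for the demo. W drives both X and Y, X never touches Y, and yet the two correlate until W is controlled for.

```python
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


def residuals(target, control):
    """What is left of target after a straight-line fit on control is removed."""
    n = len(target)
    mean_t = sum(target) / n
    mean_c = sum(control) / n
    slope = sum((c - mean_c) * (t - mean_t) for c, t in zip(control, target)) / sum(
        (c - mean_c) ** 2 for c in control
    )
    return [(t - mean_t) - slope * (c - mean_c) for t, c in zip(target, control)]


def simulate(n=5000, seed=7):
    """Return (raw correlation of X and Y, correlation after controlling for W)."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(n)]   # hidden variable W (assumed)
    x = [wi + rng.gauss(0, 1) for wi in w]    # X depends on W only
    y = [wi + rng.gauss(0, 1) for wi in w]    # Y depends on W only, never on X
    raw = pearson(x, y)
    controlled = pearson(residuals(x, w), residuals(y, w))
    return raw, controlled


raw, controlled = simulate()
print(f"corr(X, Y) ignoring W:        {raw:.2f}")
print(f"corr(X, Y) controlling for W: {controlled:.2f}")
```

The raw correlation comes out clearly positive, while the controlled one sits near zero. Controlling for W only works if you know W exists and have measured it, which is exactly what a lurking variable makes difficult.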
The previous two parts explained what a Key Performance Indicator is, and the cause of KPI Creep. How do Data Scientists Cope? Data Scientists are frequently confronted with datasets that contain thousands of variables. If we tried to understand the relationship of everything against everything using the methods at our disposal, we’d fail. Data Scientists don’t say, “we want to understand everything”. We know we can’t. We would fail because: There’s too much complexity for a single human to understand. There’s no way to tell a coherent story. There’s no recommendation that would mean anything to anybody. The Data Scientist copes by optimizing for a single variable. In every step of their work, they focus on a single optimization[…]
Yesterday, I defined what a KPI is, and explained the existence of KPI creep. A List of KPI’s Is NOT a System or a Model: A list of KPI’s, 10 to 15 in a young programme, is not a system or a model. Take, for instance, the following list for a standard eCommerce shop: Net Revenue, Gross Revenue, Conversion Rate, Number of Conversions, Average Revenue per Checkout, Number of Checkouts Started, Checkout Abandonment Ratio, Number of Carts-Started, Number of Carts-Abandoned, Cart Abandonment Ratio, Average Items Viewed per Visit, Number of Visits. That list, unto itself, does not constitute a system of thought. If I turned this on its side, and drew arrows between the factors, then yes, it would constitute[…]
There’s a problem with Key Performance Indicators (KPI’s) as the general religion of Digital Analytics. Specifically: (1) On their own, a list of KPI’s does not constitute a system of thought or logic. (2) A list of KPI’s is extremely difficult to optimize against as a whole. Statement (1) compounds the problems in Statement (2). Definition of KPI: Let’s go back to the 2007 Web Analytics Association Standards document for the definition: KPI (Key Performance Indicator) — while a KPI can be either a count or a ratio, it is frequently a ratio. While basic counts and ratios can be used by all Website types, a KPI is infused with business strategy — hence the term, “Key” — and therefore the[…]
You should read the full post, entitled “How Reddit algorithms work”. It’s a great read. TL;DR: Reddit harnesses the power of the recency logarithm in its ranking algorithm. Reddit weighs the first ten upvotes equally to the next 90 upvotes. Reddit values Recency above all else. The logarithm in the Reddit algorithm favors early pick up. Getting 10 upvotes is no easy task. Getting the next 90 is just as hard. It’s the same math used to express the strength of earthquakes. Reddit’s bias/variance towards recency has a cost. High frequency/low recency power users are prone to scream ‘repost’ on a four hour old article. Casual redditors, who are slowpokes, are more likely to be annoyed with power users, who cause[…]
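To see why the first ten upvotes count as much as the next 90, here is a minimal sketch of a log-plus-recency ranking score. It follows the commonly cited form of Reddit's open-sourced "hot" ranking, but the function name `hot` and both constants (`REDDIT_EPOCH_SECONDS`, `SECONDS_PER_POINT`) are assumptions for illustration; check them against the linked post before relying on them.

```python
from datetime import datetime, timedelta, timezone
from math import log10

# Both constants are assumptions taken from the commonly cited version
# of Reddit's open-sourced ranking code; treat them as illustrative.
REDDIT_EPOCH_SECONDS = 1134028003  # reference point for the age term
SECONDS_PER_POINT = 45000          # 12.5 hours of recency is worth one point


def hot(ups, downs, posted_at):
    """Ranking score: base-10 log of the net votes plus a recency bonus."""
    score = ups - downs
    order = log10(max(abs(score), 1))   # 1 -> 0, 10 -> 1, 100 -> 2
    sign = (score > 0) - (score < 0)    # -1, 0, or 1
    age_term = (posted_at.timestamp() - REDDIT_EPOCH_SECONDS) / SECONDS_PER_POINT
    return sign * order + age_term


posted = datetime(2012, 4, 1, tzinfo=timezone.utc)
first_ten = hot(10, 0, posted) - hot(1, 0, posted)
next_ninety = hot(100, 0, posted) - hot(10, 0, posted)
print(f"gain from votes 1 to 10:   {first_ten:.3f}")
print(f"gain from votes 10 to 100: {next_ninety:.3f}")

# A newer post with a single vote versus an older post with ten votes.
newer = posted + timedelta(seconds=SECONDS_PER_POINT)
print(f"newer, 1 vote:   {hot(1, 0, newer):.3f}")
print(f"older, 10 votes: {hot(10, 0, posted):.3f}")
```

Under these assumed constants, a post submitted 12.5 hours later needs ten times fewer votes to reach the same score, which is one way to read "Reddit values Recency above all else".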
If you don’t know about Octave, it’s new to you. Octave is a high-level language and interactive interpreter, all nicely contained in a single package. It’s free and well documented. I use it to: Run functions against relatively small datasets. Rapidly visualize complex functions fitted against that data. Answer questions that I’m not going to have to answer repeatedly. I use Octave to prototype and check functions quickly, and move on to the heavier languages for the repeatable bits. If you don’t have SPSS or the upfront patience for R, Octave is a pretty good quick start language. Check out Octave.