On Monday we set up a model relating foot traffic to patio attendance and beer revenue for a pub on Toronto’s Peter Street. On Tuesday, we expanded the model to include weather. All equations are fake and are for illustration purposes only.
A Concrete Example
- X1 is the number of people walking past a patio on Peter Street.
- X2 is the number of people who are sitting on the patio, drinking a beer, on Peter Street.
- Y is beer sales for that pub operating the patio on Peter Street.
- Y = 1250 + 0.05 * X1 + 18.22 * X2
- W1 = ((c0 (temperature) + c1 (humidex) + c2 (sky) + c3 (precipitation)) / (clout denominator))*100
- Y = 1115 + 7.22*W1 – 0.14*W1^2
There’s no shortage of factors that might make a difference. In fact, each one of those variable can be further subdivided into ever more specific, granular, variables. The number of features isn’t so much of a problem inasmuch as our ability to identify the most relevant variables and focus on them.
Other factors include:
X3 is the presence of a Jays playoff game at home
X4 is the presences of a Jays game
X5 is the number of weekly return patrons
X6 is the number of weekly first-time patrons
X7 is the presence of a drink special
X8 is average weighted price
X9 is the amount of money spent on a radio advert
X10…X16 is the day of the week
X17 is the leading day before a long weekend
X18 signifies closed
X19 is number of servers who showed up that day
X20 is the bar next door’s drink special…
…and so on.
Some of these are controllable. Some of these are not. And, the existence of collinearity creates a massive issue in terms of stating, for certain, that a given marketing treatment is really driving it.
If drink special and Jays games always happen in tandem, the analyst has a very real problem. How much credit is assigned to the Toronto Blue Jays, who have a stadium just down the street, and how much credit assigned to the drink special? If the marketer never tried having a drink special without a Jays game, how can the real impact be truly assessed?
The choices that people make themselves can be a huge cause of collinearity.
And other times, when dealing with very complex systems, like marketing, we have to resort to ‘all other things being equal’ type reasoning.
How Some Bridge Tournaments Are Played
Every round of bridge is unique because there is some crazy huge total number of games possible. Bridge is played with a deck of 52 cards split amongst 4 people. There are several thousands of variables imaginable, and several trillion variables possible. With so many variables (trillions), it would be hard to find a champion, somebody who’s skill determined their victory, not blind luck.
The tournament organizers create a whole bunch of hands and make those the standard ones at each table. After each game, the participants move onto the next table. The organizers re-order the hands to represent the starting state at that table. Everybody stays in the tournament until the end, all having the exact same starting experience. The person with the highest spread against everyone else’s score wins. This is the ultimate way of saying “all other things being equal”.
These guys use an extreme form of simulation to assert that the champion was the one who optimized the cards they were dealt the best. It’s the ultimate “all other things being equal” statement in the face of an extremely difficult attribution challenge.
And yet, with very few exceptions (the control group), it is exceptionally hard to replicate outside a pure laboratory environment. (Ie. The real world).
The Way We Understand Cars
Instead of breaking down a car and asking “what’s the return on investment of my tires”, maybe a better statement is “inflating your tires properly causes return on investment to increase 3%”.
I don’t even know how to answer the first statement at all. Tires are as essential to automobiles as a message is to marketing.
I sure know how to make the second statement.
I think we all understand, that keeping everything else constant, what is the likely effect of this one variable (Either W or X) on Y, may be the most understandable path forward.
The Cards You Were Dealt
That very nature of nature makes it extremely difficult to break down a complex system and apportion credit to individual components.
These three questions are essential if Marketers are ever to understand attribution:
- What was the state of the factors you couldn’t control?
- What lever did you pull?
- What was the outcome?
Real attribution takes some time, in part because how the marketer works their levers may not vary sufficiently over time to really understand what’s going on. And, in part, in some cases, the underlining uncontrollable factors may be causing massive changes in how the marketer works the levers.
It’s a living system because the marketer is working it, and the individuals targeted are always working it.
Understanding The Problem and The Solution
Even when one focuses on a single optimization objective (Y), difficulties inherent in the way that statistical laws work (the independence assumption, VIF, and R^2) cause severe issues. These, unto themselves, aren’t insurmountable. They require a new type of storytelling.
Optimizing a living system might not only require focusing on a single Y, but also, in the analysis of single independent variables while making assumptions.
I’m Christopher Berry.
I tweet about analytics @cjpberry
I write at christopherberry.ca