This is part two in a five-part series on Analytics and GIS. Yesterday, we looked at a job posting for a road safety predictive analyst. Scoring Algorithms Collisions, injuries, serious injuries, and fatalities happen in a time and in a space. Both the attributes of the space, and the attributes of the space at the time, can be recorded and understood. It is always preferable, when executing any optimization program, to optimize for a single variable. Scoring is one way that we take a whole bunch of factors and derive a single figure. Road segments have attributes. Is it an intersection? Is it two lanes? Is there parking? Is there a bike lane? Does it twist? Does it have[…]
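One way to picture "a whole bunch of factors" becoming "a single figure" is a weighted sum. This is only a minimal sketch: the `RoadSegment` fields mirror the questions asked above, while the `curvature` scale, the weights, and the two example segments are hypothetical and not taken from any real scoring model.

```python
from dataclasses import dataclass


@dataclass
class RoadSegment:
    is_intersection: bool
    lanes: int
    has_parking: bool
    has_bike_lane: bool
    curvature: float  # hypothetical scale: 0.0 = straight, 1.0 = very twisty


# Hypothetical weights, chosen only for illustration.
WEIGHTS = {
    "is_intersection": 3.0,
    "lanes": 0.5,
    "has_parking": 1.0,
    "has_bike_lane": -0.5,
    "curvature": 2.0,
}


def score(segment: RoadSegment) -> float:
    """Collapse a segment's attributes into one number via a weighted sum."""
    return sum(w * float(getattr(segment, name)) for name, w in WEIGHTS.items())


# Two made-up segments to compare.
busy_corner = RoadSegment(True, 2, True, False, 0.5)
quiet_street = RoadSegment(False, 2, False, True, 0.0)
print(score(busy_corner), score(quiet_street))
```

The single figure is what makes segments comparable and rankable; how the weights should be chosen is a separate question that this sketch does not answer.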

The City of Edmonton posted a pretty interesting position last month. The description is so good that it bears repeating in this space. Bolding is my emphasis. Traffic Safety Predictive Analyst Put your superior analytical skills to work in North America’s first and only municipal Office of Traffic Safety. You will be joining the rapidly growing field of urban traffic safety where the application of statistics and predictive analysis is becoming a vital decision support tool in reducing motor vehicle collisions. Your responsibilities will be: Provide short, medium, and long-term predictions of collisions and/or speeding by considering current and historical traffic safety related data as well as other influential factors, including weather and demographic data Identify, generate and monitor[…]

Post frequency on the analytics-focused blog, Eyes on Analytics, has increased to daily. In part, this is to solidify the understanding of the frequency-reach curve in blogging, and in part, it’s an attempt to understand where the broader market is. I’m testing three themes: How to fight nature’s pesky way of inhibiting our ability to make clean causal statements. The importance of imagination in identifying independent variables. The role of evidence in decision making. Simplification of a message is not pandering. However, many pandering statements are deliberate simplifications. If your optimization objective is to gain followers: Post often. Post simply. Post what people want to hear. I’m choosing simplification while avoiding pandering. Let’s see how that unfolds over[…]

I learned quite a few things this week thanks to a lot of our Twitter exchanges. Thank you. Collectively, digital analysts do not: Have a standardized method to express causality. Have a standardized method to limit R^2 inflation as a result of collinearity. Have a standardized method to express either in a clear, simple, and concise way. A set of preferred solutions: We should use conceptual frameworks, causal diagrams, or Ishikawa diagrams, to express the relationship among variables. We should check VIF and communicate that figure when reporting R^2. I’m a long way from being able to be really brief about this problem set. What do you think? *** I’m Christopher Berry. I tweet about analytics @cjpberry. I write at
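For anyone who wants to see what "check VIF" involves, here is a minimal sketch for the two-predictor case. The variance inflation factor for a predictor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on the other predictors. With only one other predictor, that regression is a simple least-squares line. The function names and the sample data are hypothetical.

```python
def r_squared(target, predictor):
    """R^2 from a one-predictor least-squares fit of target on predictor."""
    n = len(target)
    mean_t = sum(target) / n
    mean_p = sum(predictor) / n
    sxx = sum((p - mean_p) ** 2 for p in predictor)
    sxy = sum((p - mean_p) * (t - mean_t) for p, t in zip(predictor, target))
    slope = sxy / sxx
    intercept = mean_t - slope * mean_p
    ss_res = sum((t - (intercept + slope * p)) ** 2
                 for p, t in zip(predictor, target))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    return 1 - ss_res / ss_tot


def vif(target, predictor):
    """Variance inflation factor of `target` given one other predictor."""
    r2 = r_squared(target, predictor)
    if 1 - r2 < 1e-12:  # perfectly collinear: VIF is unbounded
        return float("inf")
    return 1 / (1 - r2)


# Made-up predictor values, for illustration only.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
print(round(vif(x1, x2), 2))
```

A VIF of 1 means the predictor shares nothing with the other one, and the figure grows without bound as the two approach perfect collinearity. With more than two predictors the same definition applies, but the inner regression becomes a multiple regression, which this sketch does not cover.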

The objective of the series on marketing attribution was to demonstrate how constraints, caused by humans and nature themselves, generate enormous issues in the marketing sciences. Sometimes such issues are trivialized away. “After all, this isn’t exactly rocket science.” Indeed it’s not. Marketing science is harder. Konstantin Tsiolkovsky published the basic physics of how a rocket escapes Earth’s gravity in 1903. Those laws of physics applied in 1957 when Sputnik was launched. They’ll apply the same way again when/if, in 2012, 55 years later, North Korea gets something out into orbit. While the math looks intimidating, it’s Newton, some systems thinking and some calculus. There are engineering difficulties with respect to stress, force, and materials sciences that are not trivial.[…]

On Monday we set up a model relating foot traffic to patio attendance and beer revenue for a pub on Toronto’s Peter Street. On Tuesday, we expanded the model to include weather. All equations are fake and are for illustration purposes only. A Concrete Example Assume: X1 is the number of people walking past a patio on Peter Street. X2 is the number of people who are sitting on the patio, drinking a beer, on Peter Street. Y is beer sales for that pub operating the patio on Peter Street. Y = 1250 + 0.05 * X1 + 18.22 * X2 W1 = ((c0 (temperature) + c1 (humidex) + c2 (sky) + c3 (precipitation)) / (clout denominator))*100 Y = 1115[…]

Yesterday, we did some work on Peter Street. We related foot traffic to patio use, all to predict pub revenue. A Concrete Example Assume: X1 is the number of people walking past a patio on Peter Street. X2 is the number of people who are sitting on the patio, drinking a beer, on Peter Street. Y is beer sales for that pub operating the patio on Peter Street. Assume a dataset and a traditional linear regression – and get the equation (it’s for illustrative purposes only – it’s not real): Y = 1250 + 0.05 * X1 + 18.22 * X2 To which a good friend remarked: “Ha! I got you! I finally got you! What about weather?! You can’t[…]
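To make the illustrative equation concrete, here is a small sketch that simply evaluates it. The coefficients are the fake ones from the post; the function name and the input values are hypothetical.

```python
def predict_beer_sales(foot_traffic: float, patio_guests: float) -> float:
    """Evaluate Y = 1250 + 0.05 * X1 + 18.22 * X2 (illustrative coefficients)."""
    return 1250 + 0.05 * foot_traffic + 18.22 * patio_guests


# Hypothetical inputs: 2,000 passers-by and 40 people on the patio.
baseline = predict_beer_sales(2000, 40)
one_more_guest = predict_beer_sales(2000, 41)

print(round(baseline, 2))
# Holding X1 fixed, one extra patio guest moves Y by the X2 coefficient.
print(round(one_more_guest - baseline, 2))
```

Reading a coefficient as "the effect of one more patio guest" only works if X1 can really be held fixed while X2 changes, and that is exactly the assumption that collinearity undermines.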

Check out the Digital Analytics Association’s (DAA) industry compensation scan. The answer to the question “How much do digital analysts make?” is – “It depends”. Real data, provided by IQ Workforce, shows a fairly wide distribution in salaries across cities. The authors even took into account cost of living to derive a top ten list. The best average salary with cost of living factored in? Atlanta. Brian Thopsey and Casper Blicher Olsen worked very hard on this research committee project, with Amanda Watlington initiating the project and iterating upon it. I thank them for their effort. It looks great! *** I’m Christopher Berry. I tweet about analytics @cjpberry. I write at

A whole range of statistical methods, both traditional and those found in machine learning, assume independence among independent variables. That assumption is pretty important when interpreting the contribution of each variable to the dependent variable (which we call Y). To unpack: We say that there is a high degree of collinearity between X1 and X2 if X1 is highly correlated with X2. It doesn’t matter if X1 causes X2. Or if X2 causes X1. The fact would remain that a change in X1 would lead to a predictable lift in X2. And, that a change in X2 would lead to a predictable change in X1. A Concrete Example Assume: X1 is the number of people walking past a patio on Peter[…]
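A quick way to see whether X1 and X2 move together is to compute their Pearson correlation. The sketch below does that by hand; the hourly counts are made up for illustration and are not real Peter Street data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical hourly counts: people walking past (X1) and people seated (X2).
foot_traffic = [120, 150, 200, 260, 310, 400]
patio_guests = [10, 14, 19, 27, 30, 41]

r = pearson_r(foot_traffic, patio_guests)
print(round(r, 3))
```

A value close to 1 or -1 signals the kind of collinearity described above. The coefficient says nothing about which variable causes the other, which matches the point that the direction of causation doesn't matter here.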

How do ranking algorithms work? At the highest level: A machine accepts a series of independent variables. A machine interprets those variables. A machine produces an output that is, ideally, predictive of a dependent variable.  The usefulness of an algorithm is in just how predictive the output is of a dependent variable. For instance: The usefulness of the Google Search algorithm depends on how relevant the results are in relation to the query. The usefulness of the Facebook GraphRank algorithm depends on how relevant the results in the news feed are in relationship to the user. The usefulness of the Netflix algorithm(s) depends on how relevant and divergent the recommendations are in relationship to the household. All three companies use[…]