This is part three in a five-part series on Analytics and GIS. Part one focused on a road safety predictive analyst position, and part two focused on scoring algorithms.

Tool Time

ESRI produces a tool called ArcGIS. It uses open source mapping data, and has a fairly sophisticated set of functions.

ArcGIS has an optional tool called ModelBuilder. GIS analysts use it to score environments or, in this specific case, road segments.


Yesterday, I enumerated a sequence of attributes that a road has, including speed limit, neighborhood, size, intersections and crossings. There are also attributes like weather and time of day, which hide yet more variables, like the number of suspended drivers or drunk drivers on any given segment. Any of these may help explain why a segment is problematic. But what do we mean by problematic?

Modelling works best when there is a single dependent variable.

From a safety perspective, the term ‘hazard’ might be used: the gross amount of damage that a given road segment inflicts on society. How that concept, ‘hazard’, is constructed has a big effect on the resulting scores. Civil engineers who specialize in road construction have their own values for a human life. Policy analysts differ on the true cost of a human life, lost productivity due to serious injury, and how to account for the health care costs of injury.

It is tempting to express hazard as a single value or a single count.
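To make the single-value idea concrete, here is a minimal sketch in Python. It collapses several outcome counts for a road segment into one hazard number using cost weights. Everything in it is an assumption for illustration: the `SegmentRecord` fields, the `hazard_score` helper, and especially the weights, which are arbitrary placeholders rather than anyone's real valuation of a life or an injury.

```python
from dataclasses import dataclass

# Hypothetical cost weights in arbitrary units (assumptions, not real valuations).
COST_WEIGHTS = {"fatalities": 100.0, "serious_injuries": 10.0, "minor_injuries": 1.0}


@dataclass
class SegmentRecord:
    """Outcome counts observed on one road segment (hypothetical schema)."""
    segment_id: str
    fatalities: int
    serious_injuries: int
    minor_injuries: int


def hazard_score(record, weights=COST_WEIGHTS):
    """Collapse several outcome counts into a single dependent variable."""
    return sum(weight * getattr(record, name) for name, weight in weights.items())


def rank_segments(records, weights=COST_WEIGHTS):
    """Return segment ids ordered from most to least hazardous."""
    ordered = sorted(records, key=lambda r: hazard_score(r, weights), reverse=True)
    return [r.segment_id for r in ordered]


segments = [
    SegmentRecord("A", fatalities=0, serious_injuries=2, minor_injuries=5),
    SegmentRecord("B", fatalities=1, serious_injuries=0, minor_injuries=0),
]

# Same data, two different views on what a fatality costs.
alt_weights = {"fatalities": 20.0, "serious_injuries": 10.0, "minor_injuries": 1.0}

default_ranking = rank_segments(segments)
alt_ranking = rank_segments(segments, alt_weights)
```

With the default weights, segment B ranks as the more hazardous; with the alternative weights, segment A does. The data never changed, only the definition of hazard did.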

Fatalities, compared to injuries, compared to nothing happening, are anomalies. And that poses a very special challenge for the analyst. To a certain extent, resorting to machine learning algorithms may yield an advantage.
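A small sketch of why rare outcomes are awkward to model, using made-up numbers: suppose 5 out of 1,000 segment-years saw a fatality. A "model" that always predicts that nothing happens looks excellent on accuracy and is useless for finding the dangerous segments. The counts and variable names below are hypothetical.

```python
# Hypothetical data: 1,000 segment-years, 5 of which saw a fatality (label 1).
labels = [1] * 5 + [0] * 995

# A trivial baseline that always predicts "nothing happens".
predictions = [0] * len(labels)

# Share of predictions that match the label.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Share of actual fatalities that the baseline caught.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / sum(labels)

print(f"accuracy={accuracy:.3f} recall={recall:.3f}")
```

Accuracy alone is a poor guide when the interesting outcome is an anomaly, which is one reason the choice of evaluation measure matters as much as the choice of algorithm.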

Again, I’ll leave it up to the reader to decide if subject matter expertise is really required to model out a hazard score or its predictive components.

Modelling can happen on the GIS layer

It might not actually be news to anybody at all, but modelling is indeed happening on the geographic layer, and it looks pretty awesome. I’m in no way saying that ESRI produces the only technology to do that, as I’m confident that logistics companies have long since hyper-optimized their transport systems.

This is part three in a five-part series on Analytics and GIS. Tomorrow, we’ll look at discrimination.


I’m Christopher Berry.
I tweet about analytics @cjpberry
I write at