This is part two in a five-part series on Analytics and GIS. Yesterday, we looked at a job posting for a road safety predictive analyst.

Scoring Algorithms

Collisions, injuries, serious injuries, and fatalities happen at a particular time and in a particular place. Both the attributes of the place and the attributes of that place at that time can be recorded and understood.

When executing any optimization program, it is always preferable to optimize for a single variable. Scoring is one way to take a whole bunch of factors and derive a single figure.
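A minimal sketch of that idea, assuming the score is a weighted average of min-max normalized factors. This is one common way to build a composite score, not necessarily the one any given analyst or agency uses. The factor names, values, and weights are hypothetical, and every factor is assumed to be oriented so that a larger raw value means a riskier segment.

```python
def min_max(values):
    """Rescale numbers to the 0..1 range; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def composite_scores(factors, weights):
    """Collapse several factor columns into one score per road segment.

    factors: dict of factor name -> list of raw values, one per segment
    weights: dict of factor name -> importance weight chosen by the analyst
    Returns a weighted average of the normalized factors, so with
    non-negative weights every score falls in 0..1.
    """
    normalized = {name: min_max(factors[name]) for name in weights}
    total_weight = sum(weights.values())
    n_rows = len(next(iter(normalized.values())))
    return [
        sum(w * normalized[name][i] for name, w in weights.items()) / total_weight
        for i in range(n_rows)
    ]


# Hypothetical factors for three road segments.
factors = {
    "traffic_volume": [1200, 5400, 3000],
    "speed_limit_kmh": [40, 80, 60],
    "resident_complaints": [0, 2, 9],
}
weights = {"traffic_volume": 0.5, "speed_limit_kmh": 0.3, "resident_complaints": 0.2}
print(composite_scores(factors, weights))
```

Normalizing first keeps a factor measured in thousands (traffic volume) from swamping one measured in single digits (complaints); the weights then carry the analyst's judgement about relative importance.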

Road segments have attributes. Is it an intersection? Is it two lanes? Is there parking? Is there a bike lane? Does it twist? Does it have a crosswalk? Is it in a residential area? Industrial? What is the congestion? What is the age distribution of those who live near it? Work near it? Go to school near it? What is the speed limit? How many tickets are issued on it? How many residents have complained about it?
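One way to hold those static attributes is a record per segment, encoded as numbers so that a scoring function or a model can consume them. The field names, the land-use categories, the use of a median age as a stand-in for the age distribution, and the one-hot encoding are all illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical land-use categories, one-hot encoded below.
LAND_USES = ("residential", "industrial", "commercial")


@dataclass
class RoadSegment:
    """Static attributes of one road segment (field names are illustrative)."""
    segment_id: str
    is_intersection: bool
    lanes: int
    has_parking: bool
    has_bike_lane: bool
    has_crosswalk: bool
    curvature_deg_per_km: float   # "does it twist?"
    land_use: str                 # one of LAND_USES
    congestion_index: float       # e.g. peak volume divided by capacity
    median_resident_age: float    # crude stand-in for the nearby age distribution
    speed_limit_kmh: int
    tickets_per_year: int
    resident_complaints: int


def to_features(seg: RoadSegment) -> list[float]:
    """Encode a segment as floats: booleans become 0/1, land use becomes one-hot."""
    if seg.land_use not in LAND_USES:
        raise ValueError(f"unknown land use: {seg.land_use!r}")
    one_hot = [1.0 if seg.land_use == lu else 0.0 for lu in LAND_USES]
    return [
        float(seg.is_intersection),
        float(seg.lanes),
        float(seg.has_parking),
        float(seg.has_bike_lane),
        float(seg.has_crosswalk),
        float(seg.curvature_deg_per_km),
        *one_hot,
        float(seg.congestion_index),
        float(seg.median_resident_age),
        float(seg.speed_limit_kmh),
        float(seg.tickets_per_year),
        float(seg.resident_complaints),
    ]


segment = RoadSegment(
    segment_id="seg-001",
    is_intersection=True,
    lanes=2,
    has_parking=True,
    has_bike_lane=False,
    has_crosswalk=True,
    curvature_deg_per_km=0.0,
    land_use="residential",
    congestion_index=0.7,
    median_resident_age=41.0,
    speed_limit_kmh=50,
    tickets_per_year=12,
    resident_complaints=3,
)
print(to_features(segment))
```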

Some attributes of a road are fundamental to how it was built and the environment it sits in, including the sociodemographics of those who use the road and those who live near it. You’re looking at a whole system here.

Road segments also have attributes that vary by time. Weather conditions and time of day are the two most obvious. That last one, time of day, is particularly important when one considers just how many drivers with suspended licenses are on the roads, and when they drive.
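A sketch of how time-varying conditions might adjust a segment's static score, assuming a simple multiplicative model. The weather categories, the time-of-day bands, and every multiplier value are placeholders rather than estimates; in practice they would be fitted from collision history.

```python
from datetime import datetime

# Placeholder multipliers; real values would be estimated from collision history.
WEATHER_MULTIPLIER = {"clear": 1.0, "rain": 1.3, "snow": 1.6, "fog": 1.4}
TIME_MULTIPLIER = {"rush_hour": 1.2, "other": 1.0, "late_night": 1.5}


def time_band(ts: datetime) -> str:
    """Bucket a timestamp into a coarse, hypothetical time-of-day band."""
    hour = ts.hour
    if 7 <= hour < 10 or 16 <= hour < 19:
        return "rush_hour"
    if hour >= 22 or hour < 5:
        return "late_night"
    return "other"


def score_at(base_score: float, ts: datetime, weather: str) -> float:
    """Scale a segment's static score by the conditions at a given moment."""
    return base_score * WEATHER_MULTIPLIER[weather] * TIME_MULTIPLIER[time_band(ts)]


# The same segment at two different moments.
print(score_at(0.5, datetime(2024, 1, 12, 13, 0), "clear"))
print(score_at(0.5, datetime(2024, 1, 12, 23, 30), "snow"))
```

Keeping the static score and the time-varying adjustment separate means the road's built attributes are scored once, while conditions can be applied per hour.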

Collisions themselves have attributes. Was it car on car, car on bike, car on pedestrian, or bike on pedestrian? What were the ages of the drivers, the victims, and the victim drivers? Was anybody impaired? Were any of the drivers driving on a suspended license? What was the speed? What was the count of fatalities? Of serious injuries? Of injuries? What was the property damage? Was insurance fraud suspected? Was insurance fraud prosecuted? Was anybody found guilty of insurance fraud?
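Collision attributes can be rolled up into a single severity figure per segment. The sketch below weights each casualty by severity, loosely in the spirit of equivalent-property-damage-only (EPDO) schemes. The weights are placeholders, the field names are hypothetical, and real schemes count crashes differently and use jurisdiction-specific values.

```python
from collections import defaultdict
from dataclasses import dataclass

# Placeholder weights relative to one property-damage-only (PDO) collision.
SEVERITY_WEIGHTS = {"fatality": 500.0, "serious_injury": 50.0, "injury": 10.0, "pdo": 1.0}


@dataclass
class Collision:
    """One collision record; field names are hypothetical."""
    segment_id: str
    mode: str                # e.g. "car-car", "car-bike", "car-pedestrian", "bike-pedestrian"
    fatalities: int
    serious_injuries: int
    injuries: int
    impaired: bool           # recorded for other analyses; unused in the weight below
    suspended_driver: bool   # likewise unused here


def severity(c: Collision) -> float:
    """Weight casualties by severity; a collision with none counts as one PDO."""
    casualty_weight = (
        SEVERITY_WEIGHTS["fatality"] * c.fatalities
        + SEVERITY_WEIGHTS["serious_injury"] * c.serious_injuries
        + SEVERITY_WEIGHTS["injury"] * c.injuries
    )
    return casualty_weight if casualty_weight > 0 else SEVERITY_WEIGHTS["pdo"]


def severity_by_segment(collisions):
    """Sum collision severity per road segment."""
    totals = defaultdict(float)
    for c in collisions:
        totals[c.segment_id] += severity(c)
    return dict(totals)


collisions = [
    Collision("seg-A", "car-car", 0, 0, 0, False, False),
    Collision("seg-A", "car-pedestrian", 0, 1, 0, True, False),
    Collision("seg-B", "car-bike", 0, 0, 2, False, True),
]
print(severity_by_segment(collisions))  # {'seg-A': 51.0, 'seg-B': 20.0}
```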

In this way, you can generate a score that is predictive of aggregate collisions, injuries, serious injuries, and fatalities. If you can predict relative danger, evidence-based decisions can be made about where to focus limited resources. As things stand, in Canada at least, a fairly large number of people have to die along a very specific stretch of highway, or at an intersection, before politicians are motivated to spend resources improving the engineering of that specific tract.
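Once each segment has a predicted score, focusing limited resources becomes a selection problem. A minimal sketch, assuming each candidate segment has a known (hypothetical) treatment cost: a greedy heuristic that funds the most predicted harm per dollar first. This is a common heuristic for knapsack-style budgeting; it is not guaranteed to find the best possible combination.

```python
def allocate(candidates, budget):
    """Greedily fund segments with the most predicted harm per dollar.

    candidates: list of (segment_id, predicted_score, treatment_cost) tuples,
                with treatment_cost > 0
    Returns the funded segment ids in the order they were chosen.
    """
    ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
    funded, spent = [], 0
    for segment_id, _score, cost in ranked:
        # Skip anything that would exceed the budget, but keep looking
        # for cheaper treatments further down the ranking.
        if spent + cost <= budget:
            funded.append(segment_id)
            spent += cost
    return funded


# Hypothetical candidates: predicted score and treatment cost in dollars.
candidates = [
    ("seg-A", 51.0, 400_000),
    ("seg-B", 20.0, 50_000),
    ("seg-C", 12.0, 40_000),
    ("seg-D", 8.0, 200_000),
]
print(allocate(candidates, budget=500_000))  # ['seg-B', 'seg-C', 'seg-A']
```

Ranking by score per dollar, rather than by raw score alone, lets several cheap fixes on moderately dangerous segments compete with one expensive fix on the worst segment.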

Such solutions focus only on physical engineering, not on the social context of a given road segment.

I’ll leave it up to the reader to decide whether subject matter expertise is required to generate such a predictive scoring mechanism, or whether a machine learning algorithm would be capable of producing much more accurate predictions.
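The difference between the two approaches can be made concrete: instead of an expert choosing the weights, they can be learned from historical outcomes. A minimal sketch, assuming ordinary least squares on a tiny synthetic dataset (made-up numbers, generated from known weights so the fit can be checked), written with the standard library only. Collision counts are more commonly modelled with count regressions such as Poisson or negative binomial, and a real machine learning approach would also hold out data to test predictive accuracy; OLS is used here only to keep the example short.

```python
def fit_weights(X, y):
    """Ordinary least squares: solve (X^T X) w = X^T y by Gauss-Jordan elimination.

    Adequate for a handful of features; not for large or ill-conditioned data.
    """
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(n)]
        + [sum(row[i] * t for row, t in zip(X, y))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        # Zero out this column in every other row.
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


# Synthetic, made-up data: one row per segment. Columns are an intercept,
# then is_intersection, speed_limit_kmh / 100, has_crosswalk.
X = [
    [1, 1, 0.5, 1],
    [1, 0, 0.8, 0],
    [1, 1, 0.6, 0],
    [1, 0, 0.4, 1],
    [1, 1, 0.9, 1],
    [1, 0, 0.5, 0],
]
# Target generated from known weights (0, 3, 10, -1) so the fit can be checked.
y = [3 * r[1] + 10 * r[2] - 1 * r[3] for r in X]

learned = fit_weights(X, y)
print([round(w, 3) for w in learned])
```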

This is part two in a five-part series on Analytics and GIS. Tomorrow, we’ll look at model builder.

***

I’m Christopher Berry.
I tweet about analytics @cjpberry
I write at christopherberry.ca