This series originally appeared in Eyes on Analytics on April 16, 2012
The City of Edmonton posted a pretty interesting position last month.
The description is so good that it bears repeating in this space.
Bolding is my emphasis.
- Provide short, medium, and long-term predictions of collisions and/or speeding by considering current and historical traffic safety related data as well as other influential factors, including weather and demographic data
- Identify, generate and monitor KPIs; review the related performance issues and recommend evidence-based resolutions
- Review the traffic safety strategic goals and provide evidence-based recommendations for the goals
- Provide leadership, supervision, training and direction to two technicians to ensure that collision data processing is done in a timely manner and meets the preset quality target
- Analyze traffic safety related data such as collision, traffic flow, and vehicle speed distribution to identify collision and speeding trends
- Provide advanced statistical support for road user behaviour analysis and research, as well as for other traffic safety initiatives like speed management and engineering improvement programs
- Make recommendations relating to the City of Edmonton traffic safety performance and predictions based on a comprehensive statistical analysis of various traffic safety-related data
- Work collaboratively with other analysts to conduct cross-initiative analysis for various Office of Traffic Safety initiatives such as engineering design improvement, education, roadway maintenance, road user behaviours, fleet safety, and Operation 24 hours
- Coordinate responses to collision data inquiries
- Represent the Office of Traffic Safety by presenting research findings in professional, public and educational settings
- Work collaboratively with our stakeholders and research associates supporting the four elements of traffic safety: Evaluation, Education, Engineering, and Enforcement
- Routinely examine and refine the skill sets to find ways of improving the work quality
What’s so special?
This is applied analytics for the purpose of producing much better societal outcomes. It's a practice that's pretty rare. And I rarely see the term 'evidence-based' used so often.
At the time of writing, there have been 11 fatalities so far this year in Edmonton.
I’m extremely excited to see a municipality take this on. And, because we know that analytics have a concrete effect on goals, priorities, and decision making, this position could contribute to saving dozens of lives, at least.
Whoever is hired for this position will have to make use of Geographic Information Systems and mash them up with several traditional analytics approaches, including modelling and scorecarding. While unlikely in Edmonton, they may encounter ethical issues around discrimination. We’ll take a look at the methods, tools, and considerations this new hire may encounter.
Collisions, injuries, serious injuries, and fatalities happen at a time and in a space. Both the attributes of the space, and the attributes of that space at that time, can be recorded and understood.
It is always preferable, when executing any optimization program, to optimize for a single variable. Scoring is one way that we take a whole bunch of factors and derive a single figure.
Road segments have attributes. Is it an intersection? Is it two lanes? Is there parking? Is there a bike lane? Does it twist? Does it have a crosswalk? Is it in a residential area? Industrial? What is the congestion? What is the age distribution of those who live around it? Work around? Go to school around? What is the speed limit? How many tickets are issued? How many residents have complained about it?
There are attributes about roads that are fundamental to how they have been built and the environments they are in, including the sociodemographics of those who use the roads and those who live near them. You’re looking at a whole system here.
Road segments have attributes that vary by time. Weather conditions and time of day are the two most obvious. That last one, time of day, is particularly important when one considers just how many of those with suspended licenses are on the roads.
Collisions themselves have attributes. Was it car on car, car on bike, car on pedestrian, or bike on pedestrian? What were the ages of the drivers, the victims, and the victim drivers? Was anybody impaired? Were any of the drivers suspended? What was the speed? What was the count of fatalities? Of serious injuries? Of injuries? What was the property damage? Was insurance fraud suspected? Was insurance fraud prosecuted? Was anybody found guilty of insurance fraud?
In this way, you can generate a score that is predictive of aggregate collisions, injuries, serious injuries, and fatalities. If you can predict the relative danger, evidence-based decisions can be made on where to focus limited resources. As it stands, in Canada at least, a fairly large number of people have to die along a very specific stretch of highway, or at a particular intersection, before politicians are motivated to expend resources to improve the engineering of a specific tract.
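To make the scoring idea concrete, here is a minimal sketch of how segment attributes might be collapsed into a single figure. The attribute names and the weights are entirely hypothetical; in practice they would come from a fitted model or from subject matter experts.

```python
# A minimal sketch of road segment scoring: combine per-segment
# attribute counts into one figure using weights. All names and
# weights below are invented for illustration.

def hazard_score(segment, weights):
    """Weighted sum of a road segment's attributes."""
    return sum(weights[k] * segment.get(k, 0) for k in weights)

weights = {
    "collisions_3yr": 1.0,    # historical collisions, last three years
    "injuries_3yr": 5.0,      # injuries weigh more than fender-benders
    "fatalities_3yr": 50.0,   # fatalities dominate the score
    "speed_over_limit": 0.5,  # average km/h over the posted limit
}

segment = {"collisions_3yr": 12, "injuries_3yr": 3,
           "fatalities_3yr": 0, "speed_over_limit": 8.0}

print(hazard_score(segment, weights))  # 12 + 15 + 0 + 4 = 31.0
```

Ranking every segment in the network by such a score is what turns "where should we spend?" into an evidence-based question.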
Such solutions are focused on physical engineering only, not the social context of a given road segment.
I’ll leave it up to the reader to decide if subject matter expertise is required to generate such a predictive scoring mechanism, or if a machine learning algorithm would be capable of producing much more accurate predictions.
ESRI produces a tool called ArcGIS. It uses open source mapping data, and has a fairly sophisticated set of functions.
ArcGIS has an optional tool called ModelBuilder. GIS analysts use it to score environments. Or, in this specific case, road segments.
Yesterday, I enumerated a sequence of attributes that a road has, ranging from speed limit and neighborhood to size, intersections, and crossings. There are also attributes like weather and time of day, which hide yet more lurking variables, like the number of suspended or drunk drivers on any given segment. All of these may, possibly, explain why a segment is problematic. But what do we mean by problematic?
Modelling works best when there is a single dependent variable.
From a safety perspective, the term ‘hazard’ might be used – the gross amount of damage that a given road segment causes on society. How that concept, ‘hazard’, is built, has a big effect. Civil engineers who specialize in road construction have their own values for a human life. Policy analysts differ on the true cost of a human life, lost productivity due to serious injury, and how to account for health care costs of injury.
It is tempting to express hazard as a single dollar value or a single count.
Fatalities are rare compared to injuries, and injuries are rare compared to nothing happening at all. They are anomalies, and that poses a very special challenge for the analyst. To a certain extent, resorting to machine learning algorithms may yield an advantage.
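A toy illustration of why rare events are such a special challenge, with entirely invented numbers: when fatalities occur in, say, 1 of every 1,000 observations, a model that always predicts "nothing happened" looks excellent by accuracy and is completely useless. The analyst has to judge a model by recall on the rare class (or rebalance the data) instead.

```python
# Invented data: 1 fatality among 1000 segment-day observations.
records = [1] * 1 + [0] * 999

# The do-nothing "model": always predict that nothing happened.
always_safe = [0] * len(records)

accuracy = sum(p == y for p, y in zip(always_safe, records)) / len(records)
recall = sum(p == 1 and y == 1
             for p, y in zip(always_safe, records)) / sum(records)

print(f"accuracy: {accuracy:.3f}")  # 0.999 -- looks great
print(f"recall:   {recall:.3f}")    # 0.000 -- misses every fatality
```

Accuracy rewards the model for the 999 boring days; recall exposes that it never catches the one event we actually care about.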
Again, I’ll leave it up to the reader to decide if subject matter expertise is really required to model out a hazard score or its predictive components.
Modelling can happen on the GIS layer
It might not actually be news to anybody at all, but modelling is indeed happening on the geographic layer, and it looks pretty awesome. I’m in no way saying that ESRI produces the only technology to do that; I’m confident that logistics companies have long since hyper-optimized their transport systems. But it’s impressive all the same.
You may have heard about Microsoft’s GPS Patent, the ghetto avoidance algorithm. It caused a bit of a furor. Concretely:
- Microsoft filed a patent for a GPS guided walking app.
- It has an algorithm allowing the user to avoid a certain neighborhood if the crime threshold is too high.
- Some people say that it’s racist and prejudicial.
- Other people say that it has nothing to do with race or prejudice.
Like People Clump Alike
People who are alike tend to live in alike places. So, it’s no surprise that wealth, education, religion, ethnicity and professions tend to clump up together. And sometimes they clump in ways that planners don’t foresee or forecast.
There’s an effect in municipal politics that nice areas get nice things while poor areas don’t. It’s why some neighborhoods get traffic calming while others get one-way roads stabbed through their hearts. Everybody has somewhere to go, they want to get there fast, but they don’t want that traffic going through their neighborhood.
Indeed, we’ve had a recent outbreak of tension between commuters and dwellers in Toronto:
“When I went door to door in Ward 16, I had a woman who almost jumped through the screen at me. She told me … ‘The one thing that the city has done that has changed my life is the bike lanes on Jarvis. I bought this house, I have four children, and I can’t get home to them for dinner.’ – Source.
Speed, Safety, Efficiency, and Political Interests
In many ways, certain cities may explicitly choose to tolerate a higher incidence of car on pedestrian fatalities and serious injury in depressed communities just so that others can get to their children sooner. Some people say that in some cities, like Berkeley, pervasive traffic calming has made the road network unusable.
In many ways, the derivation of road hazard rates may lay bare the reinforcing relationships between traffic, real estate prices, and political influence in a given area.
Discrimination by accident
If a machine learning algorithm is left alone, with no subject matter expertise guiding it, it is very possible for it to generate discriminatory policies against neighborhoods that are unsafe from a road safety perspective and that are coincidentally concentrated along some socio-economic line. The machine isn’t a bigot. It’s the complete opposite. It can’t see those factors.
If a subject matter expert knows of these reinforcing variables, and attempts to control for them, and then fails, it could be argued that they’re a bigot. And, good luck trying to explain something so nuanced in the newspapers. You just can’t do it.
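The proxy problem can at least be diagnosed. Even if a model never sees income or ethnicity, its output can still track them through variables like road width, congestion, or traffic calming. One basic after-the-fact check is to correlate the model's scores with the protected attribute. The data below is invented for illustration.

```python
# A sketch of a proxy-discrimination check: correlate model output
# with a socio-economic variable the model never saw. All numbers
# are hypothetical.

def pearson(xs, ys):
    """Pearson correlation coefficient, stdlib-only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-neighbourhood hazard scores and median incomes.
hazard = [80, 72, 65, 40, 35, 20]
income = [31, 38, 45, 60, 72, 85]  # thousands of dollars

r = pearson(hazard, income)
print(f"score/income correlation: {r:.2f}")  # strongly negative
```

A strongly negative coefficient here would tell the analyst that the hazard score is, in effect, also an income score, and that any policy built on it will fall along that socio-economic line whether anyone intended it or not.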
This is a major problem in the science of policy analytics.
It’s an important one to acknowledge.
I’m excited for Edmonton and Edmontonians. The decision to hire a predictive analyst for road safety is an awesome one, and one that ought to generate real results well into the future.
There will be no real way for any individual Edmontonian to know whether their life was saved by a recommendation realized through this program. In the aggregate, over time, however, fewer fatalities and serious injuries should accrue. I’d like to see a long-term goal of zero fatalities in Edmonton. Why should people die just because they made a silly mistake? Do we really believe that anybody making a genuine mistake deserves death?
The other way of thinking about it is that the odds of something truly horrible happening to each individual will go down just a little bit. Ultimately, health care costs will come down – and that’s something that nearly every Canadian is in favor of.
Analytics and GIS: Other Applications
Whereas I believe, rightly or wrongly from my experience in the Canadian Transportation Research Forum (CTRF), that most solutions in logistics and transportation economics are pretty much optimized (hollowed out), other areas are just now emerging.
- Uber is using analytics and GIS to bust up one of the last remaining FDR-era policies in transportation.
- Canada’s competition bureau ruling against CREA promises to open up more innovation in the scoring of neighborhoods and the relationship to real estate.
- Maybe our elected officials would set better policy if they felt comfortable engaging with the evidence?
So now that you’re aware of potential solution sets, is there something else you can think of doing with them?