I grabbed this data from the Toronto Open Data site. I loaded it into Google Refine. I used SPSS to understand just what was going on. I’ve stripped this post of political editorial, so if you’re here for that, this post will disappoint.

The story:

Always read the data dictionary and description. In this instance, I have a file containing a sample of a sample of all the service calls to 311. Toronto has a single call centre routing system called 311. It’s pretty efficient: it’s a single department, and any citizen can dial in, report something, and get routed to the right place. It’s an example of very good policy learning.

The disclaimer is that only 25% of the calls to 311 are service-oriented, and the data represents only about 25% of those service calls, so it isn’t comprehensive. It’s a pseudo-random sample, but the hand of manipulation appears to be quite heavy in this one. And it covers one month, from October 7 to November 7, 2010.

My experience with Google Refine was positive. I have a list of all the postal code prefixes in Toronto, so I was able to distinguish neighborhoods. The effort was much less error-prone than what I’m accustomed to with SPSS, and I was able to augment and clean at the same time. It’s a quality utility, and I thank the product team behind it.
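For anyone who wants to try the same kind of join without Refine, here is a minimal sketch of the idea in Python. The file name, column names, and the neighborhood lookup are all hypothetical, not the actual schema of the 311 extract; the underlying assumption is simply that the first three characters of a Toronto postal code (the FSA prefix) are enough to tag a neighborhood.

```python
# Minimal sketch (not the actual Refine workflow): tag 311 records with a
# neighborhood by matching the postal-code prefix (FSA). File and column
# names below are assumptions made for illustration.
import csv

# Hypothetical lookup of FSA prefix -> neighborhood name.
fsa_to_neighbourhood = {
    "M4C": "East York",
    "M5V": "Downtown",
    "M6H": "Dovercourt / Wallace Emerson",
}

with open("311_sample.csv", newline="") as src, \
     open("311_sample_tagged.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames + ["neighbourhood"])
    writer.writeheader()
    for row in reader:
        # Take the first three characters of the postal code as the prefix.
        fsa = (row.get("postal_code") or "").strip().upper()[:3]
        row["neighbourhood"] = fsa_to_neighbourhood.get(fsa, "Unknown")
        writer.writerow(row)
```

In Refine the equivalent was roughly a column transform plus a lookup against the prefix list; the sketch just makes the logic explicit.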

I loaded it into SPSS to test a few hypotheses.

Toronto’s residents like to call. A lot. The sample contains 21,000 calls.

They like to complain about garbage. A lot.
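To give a sense of the kind of tabulation behind those two observations, here is a rough sketch in pandas rather than SPSS. The file name and the "Service Request Type" column are assumptions for illustration, not the real field names in the extract.

```python
# Rough sketch of the frequency checks described above, using pandas instead
# of SPSS. The file name and column name are assumed, not the real schema.
import pandas as pd

calls = pd.read_csv("311_sample_tagged.csv")

# How many calls are in the sample? (About 21,000 in the sample described above.)
print(len(calls))

# Which request types dominate? Garbage-related categories sit at the top.
print(calls["Service Request Type"].value_counts().head(10))
```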

And that’s pretty much what I will say.

The whole experience was fun, and I recommend that others do the same.