By Ryan McGreal
Published October 27, 2010
This blog entry has been updated.
The City of Hamilton's election results page is human-readable (if inelegant) but not particularly machine-readable.
Because it's just stuff on a web page, the structure of the data is difficult to extract in such a way that it can be processed and analyzed. If you want the low-level poll-by-poll data, you need to open up 207 separate web pages and copy out the results. That's just crazy.
I asked Tony Fallis, the city's Manager of Elections, if he could provide the data in a more structured format. He explained that it is only available as HTML on the city's website or, alternatively, in hard copy.
I know some people in the community have been copying numbers off the city's web site and doing manual calculations. (See, for example, several comments by "Fred Street" in this article.)
That's admirable, but it entails tedious, manual busywork and I'm a lazy programmer. To save time, I wrote a script to do the manual work of crawling all 207 web pages, extracting the line-by-line results out of them and formatting it all into a database table.
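For readers curious about what that kind of scraping involves, here is a minimal sketch in Python using only the standard library. It is not the script described above: the page layout (a header row followed by "Candidate | Votes" rows), the `results` table schema and the function names are all assumptions for illustration. Downloading each of the 207 pages (for example with `urllib.request`) is left out.

```python
import sqlite3
from html.parser import HTMLParser


class ResultsTableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # list of rows, each a list of cell strings
        self._row = None  # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def parse_poll_page(html, ward, poll):
    """Turn one poll page into (ward, poll, candidate, votes) tuples.

    Hypothetical layout: a header row, then rows of 'Candidate | Votes'.
    Rows whose second cell is not a number are skipped.
    """
    parser = ResultsTableParser()
    parser.feed(html)
    parser.close()
    records = []
    for cells in parser.rows[1:]:  # skip the header row
        if len(cells) >= 2:
            votes = cells[1].replace(",", "")  # "1,234" -> "1234"
            if votes.isdigit():
                records.append((ward, poll, cells[0], int(votes)))
    return records


def store(records, conn):
    """Append records to a (hypothetical) results table, creating it if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results "
        "(ward INTEGER, poll TEXT, candidate TEXT, votes INTEGER)"
    )
    conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", records)
    conn.commit()
```

Calling `parse_poll_page` once per downloaded page and passing each batch of records to `store` would build up a poll-by-poll table.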
(Aside: the script is ugly and hacky even by my standards, and currently breaks a bit when it reaches Ward 14, in which the sole candidate was acclaimed. I intend to fix the bug and clean up the code as soon as I have a chance.)
Now that the data is in a database, we can start to do interesting things with it. As a first step, I've added the results to the RTH Election site. You can view a ward-by-ward summary, a more detailed poll-by-poll summary, a list of polls, a list of wards, or the raw data.
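Once poll-level rows are in a table, a ward-by-ward summary is essentially a GROUP BY query. The sketch below assumes a hypothetical SQLite table `results(ward, poll, candidate, votes)` with one row per candidate per poll. It is an illustration of the idea, not the query behind the RTH Election site.

```python
import sqlite3


def ward_summary(conn):
    """Roll poll-level rows up into per-ward, per-candidate totals.

    Assumes a hypothetical table results(ward, poll, candidate, votes).
    Returns (ward, candidate, total_votes, percent_of_ward_vote) tuples,
    ordered by ward and then by total votes, highest first.
    """
    query = """
        SELECT r.ward,
               r.candidate,
               SUM(r.votes) AS total,
               ROUND(100.0 * SUM(r.votes) / w.ward_total, 1) AS pct
        FROM results AS r
        JOIN (SELECT ward, SUM(votes) AS ward_total
              FROM results GROUP BY ward) AS w
          ON w.ward = r.ward
        GROUP BY r.ward, r.candidate, w.ward_total
        ORDER BY r.ward, total DESC
    """
    return conn.execute(query).fetchall()
```

The subquery computes each ward's total once, so every candidate's share can be expressed as a percentage of the votes cast in that ward.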
Even if you're not a programmer, the benefit of having tabular data in clean HTML tables is that you can copy it en masse, paste it into Microsoft Excel or OpenOffice.org Calc and start messing with it.
(Note: if you notice any errors or discrepancies in the data, please let us know so we can fix them.)
More important, the same data is also available in JSON format (a format for transferring structured data over a network) in the RTH Elections API. (You can jump straight to the Results section of the API documentation.)
Because you can access the data programmatically, anyone can easily grab the data from the RTH Election site and analyze it further or combine it with other data and services to produce more valuable insights into the results.
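As a rough illustration of what accessing the data programmatically can look like, the Python sketch below downloads a JSON document and totals the votes per candidate. The URL is supplied by the caller, and the record shape (a list of objects with `candidate` and `votes` fields) is a hypothetical assumption; the real structure is described in the Results section of the RTH Elections API documentation.

```python
import json
from collections import defaultdict
from urllib.request import urlopen


def fetch_json(url):
    """Download and decode a JSON document from a caller-supplied URL."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def votes_by_candidate(results):
    """Total the votes per candidate across poll-level records.

    Assumes each record is a dict with hypothetical keys 'candidate'
    and 'votes'; adjust these to the field names the API documents.
    """
    totals = defaultdict(int)
    for record in results:
        totals[record["candidate"]] += int(record["votes"])
    return dict(totals)
```

Keeping the download and the analysis in separate functions makes it easy to test the analysis on a saved copy of the data, or to combine it with data from other sources.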
All the data on the RTH Election site is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License. That means anyone can access, recombine, and redistribute the data as long as they a) attribute the source, and b) use an equivalent licence for any resulting works they distribute.
This is the direction in which public data is moving in progressive cities around the world.
It's the direction in which Hamilton needs to start moving if we are to encourage the kind of value-add citizen engagement with public data that cities like Ottawa, Vancouver and Edmonton are already enjoying.
Update: The City just changed the structure of their election results pages, and so the script I wrote to scrape the results has stopped working. I'll update this page when I get it working again.
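A scraper can't avoid breaking when the pages it reads are restructured, but it can at least fail loudly instead of quietly loading bad data. A minimal sketch of that idea checks a page's header row before parsing it; the expected column names and the exception name are hypothetical.

```python
# Hypothetical column names; a real scraper would use the page's actual headers.
EXPECTED_HEADER = ["Candidate", "Votes"]


class PageStructureChanged(Exception):
    """Raised when a results page no longer matches the expected layout."""


def check_header(rows, expected=EXPECTED_HEADER):
    """Raise PageStructureChanged unless the first row matches `expected`.

    `rows` is a list of table rows, each a list of cell strings.
    The comparison ignores case and surrounding whitespace.
    """
    if not rows:
        raise PageStructureChanged("no table rows found on page")
    header = [cell.strip().lower() for cell in rows[0]]
    wanted = [name.lower() for name in expected]
    if header != wanted:
        raise PageStructureChanged(f"expected header {expected!r}, got {rows[0]!r}")
```

Running this check on every page before extracting numbers turns a silent layout change into an immediate, explicit error.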