Earlier this week, some of us from Open Halton joined the Open Hamilton crew at their first hackfest.
The Monday-night meet-up was timed to coincide with the City of Hamilton's release of the municipal election candidates' financial disclosures. The objective: to lay the open data foundation for the 2010 candidate financial statements, which outline how much candidates spent on their campaigns and where that money came from.
Municipal candidate financial statement auditor's report
For about five hours, we hacked away to transform the PDF scans of the 9503P "Financial Statement - Auditor's Report" form pulled from the city's website into a more usable dataset.
I had my reservations about an "evening hackathon" type event, since we had just a few hours to put together something useful, but we managed!
In my humble opinion, the following three considerations made it a success:
Attendees at the Open Hamilton hackfest
The Open Hamilton team planned the hackfest at an easy-to-get-to location, right off the highway (and by Timmy's, where some of us fueled up :)). Not to mention it was a super cool place: Think|haus, Hamilton's local hackerspace and a home for its open data community.
Our host Richard Degelder, whose beard rivals that of Richard Stallman, got us access to an open, dedicated Wi-Fi network with plenty of bandwidth and all the necessary ports open for FTP, SSH, Remote Desktop, and so on.
The whiteboard and projector helped organize our thoughts, and a shared Google Doc helped aggregate links and coordinate our efforts. (In the past we've used wikis such as PBworks, which work well, but it's nice to be able to edit a document collaboratively. If you prefer the look and feel of Excel, Office Live also lets you edit simultaneously for free.)
Open Hamilton Election Financials - Total Contributions by Ward
It is tempting to aim really high and set lofty goals for a hackfest: "we want the most comprehensive dataset", or "we'll build the coolest app that does everything you can possibly imagine". Don't do it!
Set a modest goal to end up with a "basic dataset X" or a "basic functionality Y" in the app you're hacking. Then you'll be happy to wrap up your hackathon with something to show for it.
On Monday we decided to "scrape" only a small number of data fields from the PDF forms (thirteen, to be exact) and to load the data into the OGDI catalogue that stores data for DataDOTgc.ca, so that we could visualize it.
By the end of the hackfest, we had successfully loaded the finance datasets, exposed the data via APIs, and were even projecting some dynamic OGDI charts on the whiteboard. Mission accomplished!
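Since many of the scanned forms are handwritten, this sketch assumes the values were first transcribed by hand into Python dictionaries of strings, and shows only the clean-up step that turns them into a small, fixed-schema CSV. The post does not name the thirteen fields, so the five column names below (`candidate`, `ward_id`, `office`, `total_contributions`, `total_expenses`) are hypothetical stand-ins, and `validate_row` and `write_csv` are illustrative helpers, not the tools actually used on the night. Loading the CSV into OGDI is not shown.

```python
import csv
import io

# Hypothetical columns: the post says thirteen fields were captured but does
# not name them, so these five stand in for illustration only.
FIELDS = ["candidate", "ward_id", "office", "total_contributions", "total_expenses"]
MONEY_FIELDS = ("total_contributions", "total_expenses")


def validate_row(row):
    """Check one hand-transcribed record (a dict of strings) and normalise types."""
    missing = [f for f in FIELDS if not str(row.get(f, "")).strip()]
    if missing:
        raise ValueError(f"{row.get('candidate')!r} is missing {missing}")
    clean = {f: str(row[f]).strip() for f in FIELDS}
    # Integer ward IDs make a later join with the ward boundaries simple.
    clean["ward_id"] = int(clean["ward_id"])
    # Amounts may be written with a "$" sign and thousands separators.
    for f in MONEY_FIELDS:
        clean[f] = float(clean[f].replace("$", "").replace(",", ""))
    return clean


def write_csv(rows, out):
    """Validate every row, write the cleaned records as CSV, and return the count."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(validate_row(row))
        count += 1
    return count


if __name__ == "__main__":
    buf = io.StringIO()
    # Placeholder values for illustration, not real disclosure data.
    sample = [{"candidate": "Candidate A", "ward_id": "1", "office": "Councillor",
               "total_contributions": "$1,200.00", "total_expenses": "950"}]
    write_csv(sample, buf)
    print(buf.getvalue())
```

Rejecting incomplete rows up front keeps a half-transcribed form from silently reaching the catalogue, and a deliberately short column list matches the "modest goal" advice above.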
Hamilton ward boundaries
We not only knew what we wanted to do, but also had a pretty good idea of how we would do it. Before the hackfest, there was a planning thread to discuss the project and the tools we would need.
Going in, we knew we'd be extracting data into a simple format (CSV) and loading it into a data store. We also discussed the scope of the effort, and that we'd eventually want to mash up some of the data with ward maps.
This meant we needed Hamilton ward data in a format like KML, so Joey Coleman and Richard did some pre-work to prepare the ward boundaries data ahead of time.
We ended up with a clean version of the Hamilton Ward Boundaries KML. Because we had planned for it, we included information such as the ward ID in version 1 of the spending dataset, so it could easily be mashed up with the ward KML layer. A bit of planning up front saved us some aggravation in the end.
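Carrying a ward ID in the spending dataset turns the mash-up into a simple join: sum contributions per ward, then attach each total to the matching ward polygon in the KML. This sketch uses only Python's standard `xml.etree.ElementTree`. It assumes each `Placemark` in the boundaries file stores the bare ward number in its `<name>` element, which may not match the actual Hamilton file, and the `ward_id` and `total_contributions` column names are hypothetical. The functions `totals_by_ward` and `annotate_kml` are illustrative only; the post does not say how the mash-up was built.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

KML_NS = "http://www.opengis.net/kml/2.2"
# Serialise KML elements without an "ns0:" prefix.
ET.register_namespace("", KML_NS)

# Geometry elements that may appear inside a Placemark.
GEOMETRY_TAGS = {f"{{{KML_NS}}}{t}"
                 for t in ("Point", "LineString", "LinearRing", "Polygon", "MultiGeometry")}


def totals_by_ward(rows):
    """Sum the (hypothetical) total_contributions column per ward_id."""
    totals = defaultdict(float)
    for row in rows:
        totals[int(row["ward_id"])] += float(row["total_contributions"])
    return dict(totals)


def annotate_kml(kml_text, totals):
    """Attach each ward's contribution total to its Placemark as ExtendedData.

    Assumes each Placemark's <name> holds the bare ward number; placemarks
    whose name is not a number are left unchanged.
    """
    root = ET.fromstring(kml_text)
    for placemark in root.iter(f"{{{KML_NS}}}Placemark"):
        name = placemark.find(f"{{{KML_NS}}}name")
        if name is None or not (name.text or "").strip().isdigit():
            continue
        ward_id = int(name.text.strip())
        extended = placemark.find(f"{{{KML_NS}}}ExtendedData")
        if extended is None:
            extended = ET.Element(f"{{{KML_NS}}}ExtendedData")
            children = list(placemark)
            # KML expects ExtendedData before the geometry, so insert it there.
            pos = next((i for i, c in enumerate(children) if c.tag in GEOMETRY_TAGS),
                       len(children))
            placemark.insert(pos, extended)
        data = ET.SubElement(extended, f"{{{KML_NS}}}Data", name="total_contributions")
        value = ET.SubElement(data, f"{{{KML_NS}}}value")
        # Wards with no recorded contributions get 0.00.
        value.text = f"{totals.get(ward_id, 0.0):.2f}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values for illustration, not real disclosure data.
    rows = [{"ward_id": "1", "total_contributions": "100"}]
    kml = ('<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
           '<Placemark><name>1</name><Polygon/></Placemark></Document></kml>')
    print(annotate_kml(kml, totals_by_ward(rows)))
```

The `ExtendedData` element is inserted before the geometry rather than appended, because the KML 2.2 schema places feature metadata ahead of a Placemark's geometry; most viewers can then show the total in the ward's info balloon.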
Moving PDF data into a spreadsheet or database, and then into a cloud catalogue, doesn't seem like a big deal on its own.
But orchestrating half a dozen or more people with varied skills and technical backgrounds requires balancing the short-term objectives of the hackathon with the long-term goals of the local open data movement.
First published on Open Halton.
By nerds (anonymous) | Posted March 31, 2011 at 10:37:14
*insult spam deleted by site administrator*
Comment edited by administrator Ryan on 2011-03-31 10:47:49
By crhayes (registered) | Posted March 31, 2011 at 11:16:13 in reply to Comment 61781
Ryan, I like this proactive approach you've taken to comment moderation! ;) I wish I could have seen what the comment said in order to trigger this response.
By Pxtl (registered) | Posted March 31, 2011 at 11:45:04
Dammit, I wish I had more time in the evenings to participate in stuff like this. I wanted to cross-reference contributors against LinkedIn profiles to get professional affiliations.
By mrgrande (registered) | Posted March 31, 2011 at 13:36:48 in reply to Comment 61783
Dang, that is a good idea. Version two, perhaps!
By JoeyColeman (registered) | Posted March 31, 2011 at 13:39:05 in reply to Comment 61783
Great idea. When are you available? We can try to book Think|haus at that time.
By Pxtl (registered) | Posted March 31, 2011 at 15:28:35 in reply to Comment 61789
When my kids are old enough to watch themselves. So pencil me in for August 2019.
By Robert D (anonymous) | Posted March 31, 2011 at 17:15:45
Don't know if you guys noticed this comment from another thread (just posted today), but maybe it will help make the job easier?
*****
By Momoko (anonymous)
Posted March 31, 2011 at 12:18:06
Raise the Hammer is great :) Just came across it. I live/work in Toronto, but grew up outside of Westdale.
To both Joey and Ryan: have you heard of data journalist Pete Warden releasing an extremely easy-to-use data-scraping toolkit last week? Check it out:
http://www.datasciencetoolkit.org/
Fun!
Keep up the good work!
Momoko
@buzzdata
By highwater (registered) | Posted March 31, 2011 at 17:49:41 in reply to Comment 61792
I am a computer illiterate, but maybe I can help the cause by offering to babysit for computer literates! :)
(I'm only half joking. I live in Westdale too, Pxtl. We might even know each other already!)
By JoeyColeman (registered) | Posted March 31, 2011 at 18:36:17 in reply to Comment 61794
The challenge with disclosures is that many are handwritten. Some have scratched-out errors and other anomalies that need human eyes to sort out.
Ryan's been doing a great job of getting data off the City website using similar tools.
By adrian (registered) | Posted March 31, 2011 at 18:55:26 in reply to Comment 61782
How can you both like the approach and wish to have seen the original comment? ;)
By Whoeee (anonymous) | Posted April 01, 2011 at 08:16:46
The revenge of the nerds, displaying itself before our eyes. Go nuts, boys.
By Pxtl (registered) | Posted April 01, 2011 at 11:42:35 in reply to Comment 61804
What, did a nerd boink your girlfriend... in a haunted house... wearing your costume on Hallowe'en, so she thought it was you, which gave it a creepy not-really-consensual vibe, but she had a good time, so apparently that makes it okay...
Yeah, that scene was weird.
By Undustrial (registered) | Posted April 01, 2011 at 12:12:16
How are you folks dealing with the issue of numbered corporations? I have a lot of friends who've done this kind of work before, and at the time, that was the biggest barrier.
If there's another one of these Hackfests, I'd be really into lending a hand. Warmed my heart to see y'all in the paper the next morning.
By Myrcurial (registered) | Posted April 01, 2011 at 14:30:08 in reply to Comment 61792
FWIW, there are usually kids crawling around think|haus -- I don't know how old yours are, but the youngest full think|haus member was 5 when we started.
There's fun stuff to do, even for little fingers... at our open|haus, I was showing a 3 year old girl how to build her very own flashlight.
Lots of movies, both geeky and kid-friendly, are available to watch on the big TV in the lounge. And Joey brought in an Xbox. That helps too :)
Edited to add: full disclosure - I'm the founder of think|haus.