Richard Demarco is one of the long-standing figures involved in the Edinburgh Festival, and over the years used his camera to document the event and the people involved. He had unique access to a wide variety of interesting people, and managed to build a body of work of tens of thousands of photographs.
In 2008, fifteen thousand of those images were meticulously scanned, added to a database with associated metadata and then put online.
Looking through the images ahead of Culture Hack it was clear that there is some beautiful stuff in this archive. And the folks in charge of the archive were releasing it for us culture hackers to play with during our 24-hour hack.
The launch press release for the project was all about enabling the public to get unique access to the archive.
Trouble is, when I went to the Demarco Archive website I was disappointed because all I found was a Flash site with some text. No images!
It was only on my third visit to the site that I noticed a tiny little button (ad blindness maybe?) that when clicked on the home page does nothing, but if you navigate about a bit, and then click it, leads to a pretty complicated set of screens with lists of artists (most I’d not heard of) in tiny text, and little thumbnails of images and so on.
So the content was there, it was just locked away behind a Flash site. No URLs to individual items, no way of bookmarking or copying and pasting stuff. It felt like they’d forgotten about the main purpose of their project - letting people see the images.
It was late in the hack, we had a few hours left, and I’d been saying that if there was enough time I’d like to take all of the images, import them into a database and then write a script that would load them into a Facebook page as a timeline stretching back to the sixties. I think two teams had the same idea here - I saw some conversation on Twitter about it.
That proved impossible because of licensing, but I was sufficiently motivated by this (“It’s a man’s life’s work!”) that I thought I would see what could be done in the dying hours of the hack day.
I teamed up with Katy Beale, who I’ve worked with via Caper, and while she was going through the archive manually and seeing what she could find to pull out as a “story”, I started coding.
First, I had a look at what the site would give me. Using “Inspect element” in Chrome, and turning on “Network”, I found that I was able to see the requests that the Flash movie was making to the back-end server (is this new? I’m not sure it used to do that). I’m not kidding, the entire thing is based on reasonably well-structured XML data. So the content was there, it’s just they’d stuffed it behind an inaccessible SWF.
So I thought I’d poke it a bit, and wrote a little script that would increment from 00001 to 15000 and fire a bunch of ‘wget’ calls at the server to see if I could scrape all that XML down in some way. Turns out that after 100 requests, Apache Tomcat kicks in and stops that.
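For illustration, the incrementing script could look something like the sketch below. The URL pattern, the `record_urls` and `fetch_all` names and the one-second pause are all assumptions made up for the example; the post doesn't give the real endpoint, and this is not the original script.

```ruby
# Hypothetical URL pattern; the real endpoint is not given in the post.
URL_PATTERN = "http://archive.example.org/records/%05d.xml"

# Build one URL per record id, zero-padded to five digits (00001..15000).
def record_urls(first: 1, last: 15_000)
  (first..last).map { |id| format(URL_PATTERN, id) }
end

# Fetch each URL with wget, pausing between requests.
# Not called here, so the sketch runs without touching the network.
def fetch_all(urls, pause: 1.0)
  urls.each do |url|
    system("wget", "--quiet", url)
    sleep(pause)
  end
end

urls = record_urls
puts urls.first
puts urls.length
```

Pausing between requests is only basic politeness; there is no guarantee it would have kept the script under the server's limit.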
Next I tried the JSON API that I discovered that Sync were providing, but that didn’t have any of the images in it, and would have required me to download a multi-gig file, sort and make sense of the files and upload them back to S3. So that was out because of time constraints.
But then, I noticed that the CSV files provided did have filename fields. Hacking a few URLs that I found via Chrome’s inspect element feature, I found that I could construct a URL for an image hosted on the main site using values from the CSV file. Bingo.
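The URL construction is roughly the following idea. The base path, the `filename` column name and the sample rows are invented for the example (the real ones aren't stated here), so treat it as a sketch of the technique rather than the actual mapping.

```ruby
require "csv"
require "erb"

# Hypothetical base path; the real image location is not given in the post.
IMAGE_BASE = "http://archive.example.org/images"

# Build an image URL from a CSV row, or return nil when the row has no filename.
# The "filename" column name is an assumption.
def image_url(row, column: "filename")
  name = row[column].to_s.strip
  return nil if name.empty?
  "#{IMAGE_BASE}/#{ERB::Util.url_encode(name)}"
end

# Made-up sample rows, only to show the shape of the data.
sample = <<~ROWS
  id,title,filename
  1,Sample photograph,photo 0001.jpg
  2,Row without an image,
ROWS

CSV.parse(sample, headers: true).each do |row|
  puts image_url(row).inspect
end
```

Percent-encoding the filename keeps spaces and other awkward characters from breaking the URL, and returning `nil` for blank fields makes rows without an image easy to skip.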
So we were go. I knocked up my usual Ruby / MongoDB / Twitter Bootstrap / Padrino / jQuery setup as a new app and set about creating a little app that pulled in the CSV file as MongoDB documents. Surprisingly quickly (after a little bit of UTF-8 conversion using iconv) I had a little app that showed some images on a screen.
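As a rough sketch of that import step: convert the raw bytes to UTF-8, then turn each CSV row into a plain hash ready to be stored as a document. The source encoding (ISO-8859-1) is a guess, the helper names and sample data are made up, and it uses Ruby's built-in `String#encode` in place of iconv.

```ruby
require "csv"

# Re-label the raw bytes with their assumed source encoding, then transcode.
# ISO-8859-1 is an assumption; the post only says a conversion was needed.
def to_utf8(raw, source_encoding: "ISO-8859-1")
  raw.dup.force_encoding(source_encoding).encode("UTF-8")
end

# Turn CSV text into an array of plain hashes, one per photograph.
def csv_to_documents(text)
  CSV.parse(text, headers: true).map(&:to_h)
end

# Made-up sample: "\xE9" is a Latin-1 e-acute, which is invalid as UTF-8.
raw = "id,title\n1,Sample photograph\n2,Caf\xE9 interior\n".b

docs = csv_to_documents(to_utf8(raw))
docs.each { |doc| puts doc.inspect }
```

Hashes of this shape are the sort of thing a MongoDB driver's insert call accepts, which is why so little glue code sits between the CSV and the database.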
Meanwhile Katy had found loads of great stuff in the archive and had been cross-referencing with Wikipedia to make sense of it. She was busy writing text for the home page and artist pages, and I set about making some kind of navigable way for viewing the archive.
I wanted a lovely experience for the app - it had to be a step up on usability and accessibility, but in an hour, what can you do? Turns out that PhotoSwipe is really easy to set up (I’ve used it on an Accenture project), that once you know Twitter Bootstrap you can get something that works pretty well on an iPad, and that using Compass I could style the layout pretty easily cross-browser.
So we iterated, and deployed as we went, essentially making a change and then ‘git push heroku’ once it was set up.
It came together very quickly, and what was really crucial was the combination of someone playing ‘curator’ (Katy) and someone making the code (me) - I think there’s something in there that could work well for other hack days.
Sadly, when it came to demo time, for some reason the internet failed, and while there were people in the audience browsing the app on their laptops, our connection froze so we couldn’t show it.
But that just means I got to polish a little bit on Sunday after the event!
So afterwards I added a bunch of nice features:
What I hoped to achieve with this isn’t “Oh that’s clever”, or “Flash is rubbish, use HTML5”, but to show that even with very limited time and resources you can experiment rapidly with what can be achieved just from the raw data, as long as it’s in a good-enough format.
But also, and mainly, to show that with an archive project, if you focus on the assets and units at hand (in this case, photographs) through a little story-telling and using a few best-practice ideas (permalinks, responsive design and so on) you can build something that can achieve the aims of your project in a simple, effective way - in this case, opening up the archive of one man’s unique experiences.
So, what’s next? I’m not sure - I doubt there’s any immediate budget anywhere to take this app forward, so I’ve open sourced it and it shall remain on Heroku under my account unless someone from one of the organisations involved wants to take it over. There’s tons that someone could do here - tracking artists through the archive, linking to other services, gathering photographs together into collections, mixing in DBpedia, adding Facebook graph and schema.org metadata, but for now that’s “Featureland”.
A fun hack, and I think we were all surprised that it’s actually possible to do that kind of thing so quickly with the free, open source tools available and a bit of experience using them.