an amateur digital archivist’s thoughts

(This article was originally published on longdata.org)

I’ve always been an amateur archivist.

Ever since I was a kid, I’ve been clipping articles from newspapers and magazines, pasting them on paper and organizing them in folders. For some reason, I always thought I should preserve the pieces of information that seemed important to me.

This is one such example: an article from a Greek newspaper about “super computers” in US universities, dated Aug 18, 1985. Back then, I was 12.

A few years later, I also started collecting computer magazines. I think I enjoyed reading BYTE magazine or WIRED as much as I enjoyed archiving them. I still do it: not only do I subscribe to print magazines, I also keep most of them.

Back then, I didn’t know how or why or even if this would be important. It turned out to be a valuable resource when I wanted to put things in perspective, and also great brainstorming material. It’s amazing how many good ideas and concepts are forgotten just because they failed at a certain point in time. It’s also interesting to notice patterns of concepts, ideas and trends that recur again and again over time.

Then, some publications (it must have been the mid- or late 90s) jumped on the CD revolution and created CDs that contained their full archive in digital format. The promise of having a huge archive, one that would normally take up a good part of your library, on a little disc was tempting. I soon realised that this was not the best way to maintain a long-term archive: as I moved to non-mainstream operating systems (I used SCO Unix, and then Linux), it became clear that most of those CDs (which used proprietary formats that required an MS Windows program to view) had little value as archives.

So I kept the print versions. Paper seemed a safer bet for long-term access, and I was right.

The Internet changed a lot. As the original format of the information I consumed was digital, archiving had to be digital too. Initially, I tried printing some of it, but printing didn’t fit a natively digital workflow.

I’ve used most of the available tools to do my clipping and archiving: from self-hosted blog engines, to del.icio.us (back when it was called that), to digg.com, google reader shared items, posterous, desktop apps... You name it, I’ve probably used it. Most of the archiving I did using these tools is lost or inaccessible, or will be lost in a couple of years.

So, I’ve come to realise that when it comes to personal digital archiving:

archiving should be done using the simplest possible technology.
archiving should not be dependent on your commitment to support it (financially by paying for a service every year, technically by maintaining a server, etc.)
archiving should not be dependent on third parties, especially if they don’t make any commitment that they will keep providing the services you need for a long time.

This is why I’ve ended up setting up a bucket3 link blog for my digital clipping needs at picks.vrypan.net. Bckt generates static HTML files that need nothing but a host to serve them. I’ve ended up hosting them on GitHub Pages, which comes with an extra bonus: anyone can check out the whole archive from github.com/vrypan/picks.vrypan.net [*].
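
To illustrate what "static HTML files that need nothing but a host" can look like in practice, here is a minimal sketch in Python. It is not how bucket3 works internally. The function names (`render_archive`, `write_archive`) and the `(date, url, text)` clip layout are made up for this example. The only point is that a list of clipped links can be turned into a plain HTML file that any web server, or even a local file browser, can display.

```python
from html import escape
from pathlib import Path


def render_archive(title, clips):
    """Return a complete HTML page listing the given clips.

    `clips` is an iterable of (date, url, text) string tuples.
    This layout is an assumption made for the example.
    """
    items = "\n".join(
        f'    <li><a href="{escape(url, quote=True)}">{escape(text)}</a> '
        f"<time>{escape(date)}</time></li>"
        for date, url, text in clips
    )
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        f'<head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        "<body>\n"
        f"  <h1>{escape(title)}</h1>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )


def write_archive(out_dir, title, clips):
    """Write the rendered page to <out_dir>/index.html and return its path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    target = out / "index.html"
    target.write_text(render_archive(title, clips), encoding="utf-8")
    return target


if __name__ == "__main__":
    # Hypothetical sample data, only to show the call.
    sample = [("2013-01-01", "https://example.org/article", "An example clip")]
    print(write_archive("site", "My picks", sample))
```

The resulting `index.html` has no database, no server-side code and no JavaScript behind it, so it stays readable for as long as browsers can render basic HTML.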

The thing is, if you value your personal archive, try to find an archiving solution that generates simple, future-proof formats and whose survival doesn't depend on your support or on third-party services.

—

[*] It’s the gh-pages branch. I’ll probably write a separate post describing the technical details of the workflow.