A right to archive?

In grave tones the British Library warns that Britain’s “online heritage” could be lost forever if the government does not grant it a “right to archive”, similar to that which governs print publications. Britain’s official über-library has since 2004 been archiving UK websites, but has so far succeeded in covering just 6,000 of an estimated eight million sites.

The British Library complains that it must obtain the permission of website owners before archiving them…

“We’ve got the know-how but we need the rules to say we don’t need to ask permission… We’re archiving for the nation rather than commercial gain.”

Set aside for one moment the emotiveness of the second statement, and focus instead on the factual inaccuracy of the first. Neither the British Library nor anyone else requires authorisation to archive historical snapshots of public websites. As is shown by the Wayback Machine, run by the private but non-profit Internet Archive, it is up to site owners to opt out if they do not want their content archived. They do this by placing a simple instruction in a file named robots.txt at the root of the site. This tells the Wayback Machine’s software robot not to trawl the site.
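To make the opt-out concrete, here is a minimal sketch of how a well-behaved crawler might read such an instruction, using Python's standard robots.txt parser. The user-agent name `ia_archiver` is an assumption (it is the name historically associated with the Internet Archive's crawler), and the `example.org` address and the `may_crawl` helper are purely illustrative.

```python
from urllib.robotparser import RobotFileParser

# A two-line robots.txt that asks one named robot to stay away from the
# whole site. "ia_archiver" is assumed here to be the archive crawler's name.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical page on a hypothetical site.
page = "https://example.org/page.html"
print(may_crawl(ROBOTS_TXT, "ia_archiver", page))   # the named robot is refused
print(may_crawl(ROBOTS_TXT, "SomeOtherBot", page))  # other robots are unaffected
```

Note that the file is a request rather than a lock: it works only because the robot chooses to honour it.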

One has to ask why the British Library wishes to archive websites, and how often it intends to take snapshots of their content. This is not evident from the press release that accompanies the launch today of the library’s “UK Web Archive”. If the British Library seeks to fossilise websites that record “major cultural and social issues”, then its operation should be restricted in scale, with no talk of archiving eight million sites. The Library should instead concentrate on collaborating closely with the owners of websites of significant UK interest, and for these online publications expand on the work of the Wayback Machine.

If, on the other hand, the intention is to archive everything, the question is: why bother? Surely there is no point in collecting daily snapshots of, say, every Facebook page and ranty blog, unless it is simply an exercise in creating work for work’s sake, and providing circular justification for the digital side of the British Library’s otherwise worthy endeavours.

But there are deeper, more philosophical issues raised by initiatives such as the Wayback Machine and UK Web Archive.

Digital information is not a physically conserved quantity. That is, there is no universally fixed amount of digital information. Bits and bytes are created as required, and can be destroyed without violating any fundamental law of physics. Digital information is growing at an exponential rate, and much of it is redundant, useless information.

We are rapidly evolving into a society that employs vast networks of sensors to record and monitor our environment. CCTV cameras watch our public comings and goings, stress sensors monitor the movements of buildings, bridges and vehicles, and gargantuan scientific projects such as the Large Hadron Collider collect billions of data bits every second.

Before long it will become impossible to archive for posterity all the world’s digital data. It may be less of a challenge to file away website content, but the principle remains the same. There is no point in collecting information unless it has demonstrated value.

The British Library’s mistake is to treat online digital data as being in essence the same as words on a printed page. They are not.