Archiving the web: a record of history

Anyone visiting the Amazon website last week would have been struck by a letter on the home page from Jeff Bezos informing UK shoppers that they could now buy Amazon’s e-reader, the Kindle, with delivery to the UK.

Cue a flurry of commentary about the death of books and the demise of culture.

We are inevitably moving towards a world of digital content. Blu-ray may well be the last ever ‘physical’ entertainment media format. But as information moves away from physical formats (CDs, DVDs, paper, books) to data stored on the internet, how will future generations look back and understand the state of knowledge at a particular point in history?

The written word, stored on parchment and paper and filling libraries and archives, has always provided historians with an evolving and largely permanent record of human history. Digital content, in contrast, is more fluid and more concerned with the state of information at the present moment. New content trumps old content. What older information survives on the web is largely confined to out-of-date blogs and websites rather than preserved through any systematic approach to archiving.

To put this into context, a study published in the journal Science found that 13% of internet references in scholarly articles were inactive after only 27 months.

However, since 1996 a non-profit organisation called the Internet Archive has been archiving digital content with the goal of building an ‘internet library’. It has archived over 150 billion web pages and hundreds of thousands of moving images, live music recordings, audio files and texts. Its Wayback Machine is a useful tool that provides snapshots of selected major sites over the years. Here is Apple.com back in 1997.

The Internet Archive has also partnered with eleven national libraries (those of Australia, Canada, Denmark, Finland, France, Iceland, Italy, Norway and Sweden, plus the British Library and the US Library of Congress) to create the International Internet Preservation Consortium (IIPC). The mission of the IIPC is to acquire, preserve and make accessible knowledge and information from the internet for future generations.

However, no single project can ever hope to archive the entire web. How we preserve digital content may not seem a major concern today, because the web is still relatively new. But as the internet becomes the de facto reference point for all human knowledge, it will become critically important.