02 Oct 2020

The future of our digital past: how do we stop our precious files from being lost forever?

With a wealth of precious data lying dormant in outdated formats, we explore the challenges faced by The National Archives

James O'Malley

If you owned a computer in the 1990s, the chances are that somewhere in your attic there's a dusty box of floppy disks that you haven't touched for years. But contained on the disks are files and documents that you might want to keep safe - maybe some early digital photos, or that novel you started working on about the Millennium bug.

Even if the files are safe, there is a problem: how can you get to them? Your computer probably no longer has a floppy disk drive, and even if you could insert the physical disk, are you still able to open a 1998 document created in IBM Lotus SmartSuite?

If you're not careful, you could find yourself in the strange situation where it's easier to view the photographs taken by your grandparents and stored in a physical photo album on the shelf than it is to view photos you took back in 2002.

To find out how to archive important files digitally, Which? Computing spoke to John Sheridan, digital director at The National Archives.

Dealing with data at The National Archives

At the moment, The National Archives uses two techniques for archiving files.

The first is straightforward 'bitwise' preservation. The file is kept exactly as it is, so that what sits in the archive is an exact copy of the 1s and 0s that make up the file. This is equivalent to an Egyptologist simply writing down the hieroglyphic symbols they can see on a stone tablet - 'bird, bowl, jagged lines' - without worrying about what they actually mean. The symbols can always be translated later - what's important is grabbing an exact copy for preservation.
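For the technically minded, the idea of bitwise preservation can be sketched in a few lines of Python. This is only an illustration and not a description of The National Archives' own systems: the function names and the example file name are made up for this sketch. It copies a file without interpreting it, then checks byte by byte that the copy matches the original.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def as_bits(data: bytes) -> str:
    """Show raw bytes as the 1s and 0s they are stored as."""
    return " ".join(format(byte, "08b") for byte in data)


def preserve_bitwise(source: Path, archive_dir: Path) -> Path:
    """Copy a file into archive_dir without interpreting its contents.

    Hypothetical helper: the copy is compared byte by byte with the
    original, and an error is raised if they differ.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    copy = archive_dir / source.name
    shutil.copyfile(source, copy)
    # shallow=False forces a comparison of the actual file contents.
    if not filecmp.cmp(source, copy, shallow=False):
        raise OSError(f"Copy of {source.name} does not match the original")
    return copy


if __name__ == "__main__":
    print(as_bits(b"Hi"))  # 01001000 01101001
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "novel.lwp"  # made-up file name
        original.write_bytes(b"Chapter one: the Millennium bug")
        stored = preserve_bitwise(original, Path(tmp) / "archive")
        print(stored.name)  # novel.lwp
```

Note that nothing in the sketch needs to know what kind of file it is handling, which is the point of the hieroglyphics comparison: copy first, translate later.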

The other method is to attempt to translate those hieroglyphics into something actually useful in the modern day, such as by converting them into English. This can be relatively straightforward for text files, but converting other formats can prove more challenging.

Sheridan says: 'A video may be fine at the moment and you can use common software to replay it. But we may also view video as being more risky than a common office document format in terms of the future.'

This is because video files are complicated. They are not just a series of still images stored in a row: they can play back at different rates, have different resolutions and aspect ratios, and use different algorithms to compress similar-looking frames to reduce the file size.

This is a challenge The National Archives is well aware of. When it began digital archiving 20 years ago, it set strict rules on which file formats could be archived, accepting only a handful. Today, The National Archives works with other 'memory' organisations around the world to maintain a registry of different file formats.

This means that in the distant future, it will be possible for the technically minded to look up the file format of a given video and use the registry to figure out how to decode it on whatever computers exist at the time, however obscure the file type.
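One common way of working out what a file is, regardless of its name, is to look at its first few bytes, which many formats use as a fixed 'signature'. The Python sketch below shows the principle. The small table in it is an illustrative stand-in written for this example, not data taken from the registry, and the function name is an assumption.

```python
from typing import Optional

# Illustrative stand-in for a format registry: leading bytes -> description.
# These are well-known signatures; a real registry holds far more detail.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP container (also used by .docx and .xlsx)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (older .doc and .xls)"),
]


def identify_format(data: bytes) -> Optional[str]:
    """Return a description of the format whose signature starts the data."""
    for magic, description in SIGNATURES:
        if data.startswith(magic):
            return description
    return None  # Unknown: the file would need a fuller registry lookup.


if __name__ == "__main__":
    print(identify_format(b"%PDF-1.4\n..."))  # PDF document
    print(identify_format(b"plain old text"))  # None
```

Knowing the format is only the first step. Decoding the file still depends on documentation of how that format works, which is what a registry is there to keep.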

Keeping up as technology evolves

But perhaps the most vexing problem archivists face is a much simpler one: passwords. When government documents are created, they are often, understandably, password protected. But this means The National Archives could find itself in the situation where it has the Excel .xls file and the software to read it, but then discover that no one thought to write down the password.

Some files might be 'digitally signed', which means they are protected and require an internet connection to check that the user has permission to open them. This means that even if you have the file and the software to open it, you still need the server at the other end to confirm it, a server that might not exist in 10, 20 or 100 years.

In terms of actual physical storage, The National Archives currently uses a tape-based library system, for reasons of cost. But this, in Sheridan's view, is almost incidental to the act of preservation.

'Nothing will persist. It's all temporary', he laughs. In other words, although the digital archives are stored on tape today, at some point in the future, they will inevitably be transferred to some other system.

'Whether you're storing things yourself or whether you're relying on someone else's storage, you're still left with [the question of] how do I get my records from here to the next thing? And how do I know what I have? How do I know what I have will be the same as what it was, and in the future what it is going to be?

'The most persistent thing for us is the institution. We're 180 years old. So the reason I believe the records that The National Archives hold will be available in 100 years is because the institution will look after them, not because of any technology that we might use.'
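Sheridan's questions about knowing what you have, and whether it is still the same after a move, are commonly tackled with a checksum 'manifest': a list of every file alongside a fingerprint of its contents, taken before and after the transfer. The Python sketch below is a minimal illustration of that general technique, not an account of how The National Archives does it. The function names and folder layout are assumptions.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file under root (by relative path) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def compare_manifests(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, List[str]]:
    """Report files that went missing, appeared, or changed between manifests."""
    shared = before.keys() & after.keys()
    return {
        "missing": sorted(before.keys() - after.keys()),
        "added": sorted(after.keys() - before.keys()),
        "changed": sorted(name for name in shared if before[name] != after[name]),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        old, new = Path(tmp) / "old", Path(tmp) / "new"
        old.mkdir()
        new.mkdir()
        (old / "memoir.txt").write_text("Chapter one")
        (new / "memoir.txt").write_text("Chapter one")
        (old / "photo.jpg").write_bytes(b"\xff\xd8\xff original")
        (new / "photo.jpg").write_bytes(b"\xff\xd8\xff damaged")
        report = compare_manifests(build_manifest(old), build_manifest(new))
        print(report["changed"])  # ['photo.jpg']
```

Reading whole files into memory keeps the sketch short. For large video files, the digest would be better built up in chunks.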

Archiving the web

In hundreds of years' time, there's a chance that historians will look back on the early days of the internet - and we're still in the early days in historical terms - as something like the dark ages. The problem isn't that we aren't producing great works or important culture, it's that recording some parts of digital life in a way that a digital archive can understand presents a challenge.

Individually, photos, videos and text documents are straightforward to archive. All of their digital information exists in one place, so as long as file formats can be read, we can be confident that in 100 years, someone will be able to download and read the memoir you wrote in Microsoft Word.

But what about dynamic content such as webpages? Take a Twitter page, for instance. Each tweet is an interactive object that is linked to other tweets, which might in turn disappear. Some tweets - a joke about the news, for instance - might make sense in context on the day they were posted, but what cues could historians use in the future to understand them?

Unfortunately, there isn't a fully satisfying way to capture this sort of interactivity yet. But archivists do have some tools at their disposal:

Web crawlers

This is essentially a program that will go through a website and save each page as it loads, and will then virtually click every link on that page and download those pages, and then click every link on those pages… and so on.

This creates a 'snapshot' of the page - so not the real thing, but it does mean anyone in the future can use the Internet Archive to read what the BBC website wrote in June 1997 about the early days of Tony Blair's tenure as Prime Minister.
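The 'save, then follow every link' loop can be sketched in Python as follows. It is a toy, not the crawler used by any real archive: the `crawl` function, the `fetch` parameter and the example addresses are all assumptions made for this illustration. The page-fetching step is passed in as a function, so the sketch runs against a pretend website held in memory rather than the live internet.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str,
          fetch: Callable[[str], Optional[str]],
          max_pages: int = 100) -> Dict[str, str]:
    """Save pages breadth-first, following links that stay on the same site.

    fetch(url) should return the page's HTML, or None if it cannot be loaded.
    """
    site = urlparse(start_url).netloc
    snapshot: Dict[str, str] = {}
    queue = deque([start_url])
    while queue and len(snapshot) < max_pages:
        url = queue.popleft()
        if url in snapshot:
            continue  # Already saved via another link.
        html = fetch(url)
        if html is None:
            continue  # Dead link: nothing to save.
        snapshot[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any '#section' fragment.
            target, _fragment = urldefrag(urljoin(url, href))
            if urlparse(target).netloc == site and target not in snapshot:
                queue.append(target)
    return snapshot


if __name__ == "__main__":
    # A pretend three-page website, so no network access is needed.
    fake_site = {
        "http://example.test/": '<a href="/news">News</a> <a href="/about#team">About</a>',
        "http://example.test/news": '<a href="/">Home</a> <a href="http://other.test/">Elsewhere</a>',
        "http://example.test/about": "<p>About us</p>",
    }
    pages = crawl("http://example.test/", fake_site.get)
    print(len(pages))  # 3
```

The sketch stores only each page's HTML. Images, scripts and anything that appears only after a visitor interacts with the page are not captured, which is part of why a snapshot is not the real thing.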

WebRecorder

This captures snapshots of websites in a similar way, but is a manually controlled process with a human archivist clicking around the web. This could help archive the 'experience' of, say, clicking through a series of steps on a form or searching through an interactive map.

Top tips on archiving your own files

So if The National Archives can do it, can the rest of us too? Here are John Sheridan's top tips for keeping your own memories safe.

Apps and hardware to protect treasured memories

If you only have one copy of your media on the hard disk of your laptop, you're just one hardware failure or burglary away from losing everything.

So what can you do? Here's a rundown of some of your best options.

Cloud storage services

If your files are safely stored on the servers of a big tech company, it means that even if the worst happens to your own hardware, you'll always be able to get them back.

All of the big tech firms operate their own cloud storage with different monthly data plans and features, but many do the same job. For example, for £7.99 a month you can get yourself 2TB of storage on Google Drive and Google Photos - enough space to store tens of thousands of photographs.

Google is particularly smart, too: you can upload your photos and Google's artificial intelligence will analyse each image and enable you to use keyword searches to find objects in images. Search 'dog', for example, and find every photo of your beloved pooch.

To find the best space-saving tool for you, check in with our expert guide on how to choose the best cloud storage service.

NAS (Network-attached storage)

Network-attached storage is essentially a hard disk that plugs into one of the wired ethernet ports on your router. A NAS device appears in File Explorer like another computer on your network, making accessing files easy, whatever device you're using.
