The perils of digital preservation

Recordable media, and the technologies that access them, become obsolete so quickly that companies must act now to keep vital data safe.

Piers Ford

Stories like the rescue of the BBC's Domesday project from the jaws of obsolete technology inspire headlines.

But digital preservation is no longer a matter just for heritage institutions, libraries and sectors with a culture of information audit trails and records management.


In the wake of the Enron scandal, the forces of litigation and regulation are gathering for a major assault on commercial information across the board.

Businesses, in turn, are taking their cue from the public sector, where records management is a matter of basic compliance. They are casting a fresh legal eye over their information preservation strategies, and are more aware that the information itself is a commercial asset ripe for exploitation.

Even digital can be destructible
Being 'born digital' does not make information indestructible. Obsolescence affects not only the originating technology but also the ability to access and release the data itself. And all recordable media have a finite life, whether it's 15 or 500 years.

'Born digital' information also has no physical original to refer back to. And even when digital copying is deemed the most effective way to preserve the original information - whether it's a priceless manuscript, a patient's X-ray images or a small business's tax return - the copy might be irretrievable or lost by the time someone requires access or an unhappy accident befalls the original.

Back in 1996, the US Taskforce on Digital Archiving alerted the world to the potential problem, suggesting that, even then, an inconceivable volume of the digital information generated by the internet boom was already evaporating.

We could call it 'Dad's Army Syndrome'. Back in the golden years of television, broadcasters routinely wiped thousands of hours of entertainment. Early videotape was an expensive commodity, storage costs were prohibitive and there simply wasn't a culture of preservation.

Today, the realisation of the treasure in the archives has long since dawned. There are regular appeals for viewers to ransack their attics for tapes of lost classics.

And the BBC is in the vanguard of digital preservation, not least with the rescue of its Domesday Project, a prime example of a 'born digital' resource threatened by obsolescence.

A recent report from BT Broadcast Services (BTBS) and Datamonitor, Digital Content Management and the True Cost of Staying Analogue, suggests that media companies alone have amassed almost 7,000 years' worth of archived content.

This could potentially cost them billions of pounds a year in unrealised revenues simply because of the material's unavailability.

BTBS's forthcoming mediaREEL digital content management system aims to help archive owners digitise and store their resources, making them available online and opening up their revenue potential.

According to David Jamieson, head of content services at BTBS, this is the business imperative which broadcasters and other information-rich sectors have lacked until now.

"Digital files take up storage, which costs money, so there is no commercial drive to do anything with it unless you can find a commercial return," he explained.

"The value in archives is like gold in seawater: there are millions of dollars in there but nobody knows how to get them out cost effectively.

"Digitisation will unlock much of this potential by enabling businesses to catalogue their archives and distribute content more easily. But, for now, it's an area few dare move into."

The high cost of preservation
And when they do, they might be opening a can of worms. Jamieson pointed out that people rarely appreciate the hidden costs of preservation, which include access, technology infrastructure and the ongoing evolution of that technology.

When they are revealed, the archivist has the unpopular task of presenting to the board an expensive problem which it never knew it had. Even the Digital Preservation Coalition (DPC) admits that costs are a problem.

But David Ryan, head of archive services at the Public Record Office and a board member of the DPC, insisted that there is a general improvement in commercial attitudes to preservation.

"Information is now so important to so many organisations that good curation is essential," he said.

"We are confident that companies are beginning to recognise this, and are doing a better job, and that there are now storage products which allow for a sophisticated assignment of data."

Curation is a never-ending process. Choosing a proven, standards-based storage system on which to preserve archives is an essential element of a digital preservation strategy.

But before that, the method of preservation has to be chosen; and unless this is done at the point of creation, and followed up with a consistent management policy, time and the relentless evolution of technology will soon conspire to escalate preservation costs.

The doomed Domesday project
"It's vital to conduct the preservation work before it's too late, but when is that?" asked Paul Wheatley, project manager of Camileon, which has the task of retrieving the BBC's Domesday Project of the mid-1980s.

Originally conceived to celebrate the 900th anniversary of the original Domesday book, the project was an information-rich snapshot of British life, combining input from schools, photographers, journalists, researchers and academics.

Compiled on two interactive video disks accessed by the BBC's microcomputer system, it seemed set for eternal survival.

But within two decades the hardware and operating system were obsolete, and the video disks were deteriorating.

"The Domesday Project was released in 1986 and it was really a multimedia project five years ahead of its time," said Wheatley.

"We're doing preservation work 15 years later, which is too late. Ideally, it should have been done within five years, before the documentation started to disappear and the creators moved on.

"There isn't exactly a drop-off point at which digital media become obsolete; it's a gradual slide during which preservation resources gradually become more expensive."

Those resources aren't solely linked to technology. The Domesday Project magnifies all the strategic issues faced by any organisation coming late to the digital preservation table: tracking down documentation, pinning down intellectual property rights, research and bug fixing.

Most of these could be avoided by a preservation management strategy initiated at the point of creation. But that strategy should also include the preservation of the operating system and associated software packages.

As Wheatley pointed out, our overwhelming dependence on Microsoft Word doesn't mean that it will exist in its current form forever. It would already be difficult to retrieve documents created in early versions using a modern PC.

Camileon, which is based at the Universities of Leeds and Michigan, and is funded by the Joint Information Systems Committee and, in the US, by the National Science Foundation, has rescued the Domesday Project using a combination of two methods.

The first is migration, which is widely favoured as the corporate strategy against obsolescence, although it is costly in terms of regular upgrades, and there is potential for errors to be generated across numerous migrations.

The second strategy is emulation, which involves the recreation of the original operating system and hardware on which the data was created so that it will work on modern machines.

This hybrid is called migration on request. It means that the data is kept in its original format, while a tool is maintained which will allow that data to be migrated to a new platform at any point in the future.

"This means there is only ever one migration step," explained Wheatley. "You are greatly reducing the chance of losing data or changing the data you want to preserve, and it's a lot cheaper."
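For readers who want to see the idea in concrete terms, here is a minimal Python sketch of migration on request. It is an illustration only, not Camileon's actual tooling: the class and function names, the "latin1-text" format label and the checksum step are all assumptions made for the example. The original bytes are stored untouched, and a single maintained tool converts them to a current form when someone asks for access.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ArchivedObject:
    """An item kept in its original format, plus a checksum taken at ingest."""
    name: str
    source_format: str
    original_bytes: bytes
    sha256: str


def ingest(name: str, source_format: str, data: bytes) -> ArchivedObject:
    # Record a checksum so later damage to the stored bytes can be detected.
    return ArchivedObject(name, source_format, data, hashlib.sha256(data).hexdigest())


class MigrationOnRequest:
    """Hypothetical archive that converts originals only when access is requested."""

    def __init__(self) -> None:
        # One maintained tool per original format (illustrative registry).
        self._tools: Dict[str, Callable[[bytes], str]] = {}

    def register_tool(self, source_format: str, tool: Callable[[bytes], str]) -> None:
        # Replacing a tool updates the route to today's platform
        # without touching any stored original.
        self._tools[source_format] = tool

    def migrate(self, obj: ArchivedObject) -> str:
        # Check the original is intact before converting it.
        if hashlib.sha256(obj.original_bytes).hexdigest() != obj.sha256:
            raise ValueError(f"{obj.name}: stored bytes no longer match the ingest checksum")
        try:
            tool = self._tools[obj.source_format]
        except KeyError:
            raise LookupError(f"no migration tool maintained for {obj.source_format!r}") from None
        # A single step from original to current form, never a chain of conversions.
        return tool(obj.original_bytes)


def latin1_to_text(data: bytes) -> str:
    """Example tool: decode legacy Latin-1 text into a modern Unicode string."""
    return data.decode("latin-1")


archive = MigrationOnRequest()
archive.register_tool("latin1-text", latin1_to_text)
record = ingest("survey-note", "latin1-text", b"caf\xe9 in Leeds")
print(archive.migrate(record))
```

When the target platform changes, only the registered tool is rewritten. The stored original is never converted in place, which is why errors cannot accumulate across successive migrations.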

Don't take their word for it
Wheatley warned against relying on media manufacturers' claims for longevity. "Yes, they come up with special long-life media, but that misses the point," he said. "In 50 years, those media won't be the media of the time.

"Even if the BBC had waited for compact disc for the Domesday Project, we would still have the same access problem."

Russell Stalters, president of records management system specialist TrueArc, which was recently acquired by Documentum, agreed.

"Digital preservation is a two-pronged problem," he explained. "Once you've started managing and preserving information, you need a process which covers regular updates and retention periods, according to legislative and industry specific compliance requirements.

"Some technologies, like microfilm, claim to have 500 years of useful life. So what do you do in 450 years' time? Someone will have to deal with it."

In other words, it's time for information-centric businesses to take the lead from digital preservation success stories like the Domesday Project and eradicate Dad's Army Syndrome. There could be money in it.

Equally, putting it off could result in the need for retrieval rather than just preservation. And that will almost certainly cost more.

ROAD TO ACCESS

Digital preservation requires the active management of the information's life cycle from the point of creation onwards, taking into account appropriate compliance issues (the Data Protection Act, for example, and copyright and intellectual property legislation).

Digital preservation is increasingly driven by the need to give wider access to information, and often by the commercial imperative to turn that access into a revenue stream.

Preservation strategies must include a comprehensive audit trail and will almost certainly bring the IT and archivist functions closer together.

Strategies should include the establishment of selection criteria, and investment in standards-based technologies that comply, for example, with the Public Record Office standards for records management.

Responsibility for refreshment cycles and information handling policies should be established. Without them, the risk of amassing an archive of badly maintained material, or material based on inaccessible media, increases, as does the risk of high retrieval costs.
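As a rough illustration of what an assigned refreshment cycle might look like in practice, the Python sketch below flags holdings whose media are due to be copied to fresh stock. Everything in it is hypothetical: the field names, the media types and the refresh intervals are assumptions chosen for the example, not recommended lifetimes.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class Holding:
    title: str
    medium: str
    last_refreshed: date
    owner: str  # the person or team responsible for the next refresh


# Hypothetical refresh intervals in years; real figures depend on the
# media, the storage conditions and the organisation's own policy.
REFRESH_YEARS: Dict[str, int] = {"magnetic tape": 5, "optical disc": 10}


def add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # 29 February in a non-leap target year: fall back to 28 February.
        return d.replace(year=d.year + years, day=28)


def due_for_refresh(holdings: List[Holding], today: date) -> List[Holding]:
    due = []
    for h in holdings:
        interval = REFRESH_YEARS.get(h.medium)
        if interval is None:
            # No policy exists for this medium, so flag it for review.
            due.append(h)
        elif add_years(h.last_refreshed, interval) <= today:
            due.append(h)
    return due


holdings = [
    Holding("Board minutes", "magnetic tape", date(2000, 1, 1), "IT"),
    Holding("Product photos", "optical disc", date(2000, 1, 1), "Archive"),
    Holding("Training video", "laserdisc", date(2000, 1, 1), "Archive"),
]
for h in due_for_refresh(holdings, date(2006, 1, 1)):
    print(f"{h.title} ({h.medium}): refresh owed by {h.owner}")
```

Holdings on a medium with no agreed interval are flagged rather than ignored, since material nobody has a policy for is the most likely to end up inaccessible.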
