Danny Bradbury explains why the cross-your-fingers approach to contingency planning could be your downfall.

When one unfortunate company, which shall remain nameless, invoked its disaster recovery policy with a well-known service provider, it thought that it would be protected. The IT manager happily bundled his backup tapes - which were held in a safe for added security - into a car and drove them down to a backup site.
Unfortunately, it was only when he arrived that he realised he had forgotten the key to the safe, which was now happily melting in the fire that had engulfed his computer room. Luckily, this was only a test run, and the fire at the operational site was only an imaginary one. Key, tapes, and the IT manager's sanity were recovered safely. Nevertheless, the anecdote underlines the importance of constructing a comprehensive disaster recovery strategy.
According to Paul Barry-Walsh, chairman of business continuity consultancy NetStor, there is a big difference between disaster recovery and business continuity. The former involves the recovery of the IT function and data alone, whereas business continuity ensures that the rest of the business can keep running as well.
When creating a business continuity or disaster recovery plan, organisations must take into account the nature of the recovery facility, explains Jim Taylor, operations manager at business continuity consultancy Adam Associates.
He explains that there are three main types of recovery method.
A "hot" site is often thought of as any backup centre, but in fact it is an alternative facility with a duplicate database and equipment. This enables a company invoking the service to reposition its staff quickly and get them up and running as soon as possible.
A "warm" site is a backup site containing hardware equipment but which requires companies to restore their applications and data before they can resume their business. While this may be acceptable for some smaller firms, large companies may find it unworkable.
A "cold" site is an alternative facility that doesn't have the equipment in it at all. A good example of a cold site would be a simple Portakabin.
How much should a company spend on the IT part of its business continuity solution? According to Barry-Walsh, organisations should spend 5% of their IT budgets, although some spend as little as 1% or 2%.
Many companies restructure their IT operations without giving enough thought to the recoverability of the new infrastructure, says Brian Fowler, European business recovery services manager at Hewlett-Packard. You will still often see companies consolidating their data centres from four or five down to one, he explains, which can make disaster recovery harder because it concentrates the risk. If your data centres are distributed, it is easier to back up your data remotely without increasing costs too much. Centralising your servers in one place means that backing up data between them is no longer enough: if the site is damaged or destroyed, you need another remote site to continue processing your data.
Such is the danger of putting all your eggs into one digital basket.
According to Barry-Walsh, there are a number of critical steps that must be taken to ensure that your business continuity strategy is robust.
Firstly, you must assess the critical elements that are needed to keep core business processes running. The key components here are facilities, personnel and technology, or most likely a combination of all three. The assessment process is typically broken down to the departmental level.
At this point, the person in charge of the business continuity project must examine the impact to the business if any of these critical elements were to become unavailable. The cost to the company in financial terms should be ascertained to provide a foundation on which to build a contingency plan.
Finally, the company will be able to use this contingency plan to decide whether to buy a selection of backup systems, or simply redesign its existing IT infrastructure. In some extreme cases it may be impossible to create a backup system. A particular machine may be too expensive or unique, for example. In this case, it may be worth taking out an insurance policy.
Hewlett-Packard's Fowler agrees with this model but isolates risk analysis as a separate step, in which a company will assess the likelihood of a particular element failing. He also highlights the importance of a testing plan, enabling a company to run a recovery drill on an annual basis. He estimates that only 50% or 60% of companies test comprehensively.
Worth the risk?
The risks associated with not testing are huge. Companies that have backed up their data but never restored it can never be sure that it can really be recovered.
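The testing discipline described above can be automated at its simplest level: restore the backup somewhere harmless and prove, byte for byte, that what comes out matches what went in. The sketch below is illustrative only - the file names and the tar-based backup are stand-ins for whatever backup mechanism a company actually uses.

```python
# A minimal sketch of the "back up, then prove you can restore" drill.
# The tar archive and file names here are illustrative assumptions.
import hashlib
import tarfile
import tempfile
from pathlib import Path

def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def verify_backup(source: Path, archive: Path) -> bool:
    """Restore the archive into a scratch directory and compare checksums."""
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive) as tar:
            tar.extractall(scratch)
        for original in source.rglob("*"):
            if original.is_file():
                restored = Path(scratch) / original.relative_to(source.parent)
                if not restored.is_file() or sha256(restored) != sha256(original):
                    return False  # a file is missing or corrupt: the backup is no good
    return True

# Demo: build a tiny "operational site", back it up, then verify the restore.
with tempfile.TemporaryDirectory() as site:
    data = Path(site) / "data"
    data.mkdir()
    (data / "ledger.txt").write_text("accounts payable\n")
    backup = Path(site) / "backup.tar.gz"
    with tarfile.open(backup, "w:gz") as tar:
        tar.add(data, arcname="data")
    print(verify_backup(data, backup))  # True only if the restore matches
```

The point of the drill is the restore step, not the backup step: a company that only ever runs the first half of this script is in exactly the position the article warns against.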
Fowler adds that it is vital to protect your business continuity plan from changing business operations and organisational structures. Regular testing is necessary to ensure that alterations to working practices don't adversely affect the success of your contingency plan.
Written on paper, the critical steps in a business continuity project look easy enough. Nevertheless, they have hidden difficulties. In particular, finding the cost of downtime is not as easy as it sounds. The implications of a particular element failing may not be immediately apparent, and indirect consequences may be hard to pin down. You may have an idea of the number of calls that the customer services department takes every hour, but it may be difficult to estimate the percentage of calls that represent new business.
One of the biggest problems for disaster recovery and business continuity project managers in the past few years has been the proliferation of PCs across complex client/server networks. It is difficult to manage those PCs, and getting end users to save their files to a centrally managed site can be like nailing jelly to the ceiling.
It is vital that an organisation conducts a health check on its PC data, Barry-Walsh points out, adding that it would be better to employ someone to conduct that audit on your behalf if you are a large company. He argues that within six months of undertaking an in-house project, many customers still haven't managed to audit all their PCs, whereas the audit process can be conducted in as little as three weeks by an expert.
According to Nigel Ghent, marketing manager at disaster recovery and business continuity company Comdisco, assessing distributed data involves lots of footwork. "We go out there with a clipboard and a questionnaire and we get into the business. Go and talk to the accounts department, go and talk to the sales department, go and talk to the chief executive," he says. "Get underneath the skin of the organisation, and understand what it is that they do as a business."
In this way, an auditor is able to understand which elements of distributed data will be most important. It is significant that such auditing is largely paper-based, using face-to-face interviews. This emphasises the importance of marrying a business understanding with a quantitative evaluation of the data on the network. According to Adam Associates' Taylor, companies can make the recovery of critical PC data and applications much easier if they standardise on their PC infrastructure at an early stage. If every PC holds the same system image, or it is at least standardised at a departmental level, then recovery will be much simpler.
These days, it is impossible to think in terms of computing disasters without paying some attention to the millennium bug. This time next year, some IT directors will be trying to come to terms with the catastrophic failure of their systems as a result of the date changeover, and kicking themselves for not maintaining their systems thoroughly enough. What are disaster recovery and business continuity companies doing to prepare for invocations caused by the problem?
Comdisco's Ghent explains that the company has installed dedicated servers for millennium testing, and adds that the company is dramatically increasing its head count. The extra employees represent a nod in the direction of the year 2000 problem, but Ghent is at pains to point out that it is largely a simple corporate expansion that would have happened anyway.
Generally, however, if you are an organisation relying on disaster recovery services to bail you out of a software fault in your system, the news is not good. Ghent explains that most contracts in the business continuity world define disasters as unplanned events. Business continuity companies' most likely approach to the problem will be to refuse an invocation from a company that suffers downtime due to the inadequate preparation of software.
It is also worth pointing out that, as a software issue, the year 2000 bug will represent as big a problem on a backup site as it would do on an operational site.
Nevertheless, Ghent admits that some companies may have a valid reason for invoking disaster recovery services during the critical changeover period.
"If they were to suffer a power outage then it could be that the electricity supplier had not done its own remediation work, and had suffered a problem due to the millennium bug," he says. "In that case, as far as our client is concerned, they would look on that as an unplanned event and we would accept that as a valid declaration of their contract."
Adam Associates' Taylor agrees that invoking due to an inherent software fault won't do a customer much good, but adds that it may be possible for companies to use extra computing resources to catch up on operational work once they have solved the problem themselves.
Even if companies solve their millennium bug issues, they will still be faced with other problems. Press reports indicate that other bugs could create difficulties over the next 18 months. An anomaly in the leap year rules means that many applications don't think there is a 29 February next year: 2000 is divisible by 400 and so is a leap year, but programs that apply only the "no leap year in a century year" exception will skip it. This year, 9 September could cause problems in programs that use "9999" as an end-of-file marker, because of its similarity to the date (9/9/99).
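The leap year anomaly comes down to one line of code. The full Gregorian rule has two exceptions, and programs that implement only the first one get the year 2000 wrong:

```python
def is_leap(year: int) -> bool:
    # Full Gregorian rule: every fourth year, except century years,
    # except century years divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def naive_is_leap(year: int) -> bool:
    # The truncated rule behind the bug the article describes:
    # it applies the century exception but forgets the 400-year exception.
    return year % 4 == 0 and year % 100 != 0

print(is_leap(2000))        # True  - 29 February 2000 exists
print(naive_is_leap(2000))  # False - the buggy program skips the day
```

The two functions agree on almost every year (1996, 1900, 1999), which is why the defect can sit undetected in a codebase until a date like 2000 exercises the missing branch.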
The euro will also present a problem for many companies, which will use the transition period for the single European currency to update their systems and reprogram exchange rates.
Perhaps the most significant group of companies to be affected by the millennium bug and other software glitches will be the Small to Medium Enterprise (SME) sector. To make matters worse, these companies have traditionally been the most reluctant to pay for comprehensive business continuity services.
Peter Barnes, general manager for independent disaster recovery user group Survive International, explains that there is heightened awareness of disaster recovery and business continuity services among the SME community, but it is more of a trickle than a groundswell. He contends that it is being driven mainly by the larger trading partners, who are themselves realising that the main dangers to business continuity lie in the supply chain. Year 2000 contingency planning is a particularly significant driving force among larger companies, who are encouraging smaller trading partners to get their houses in order.
Adapting to change
As methods of doing business change, IT will become increasingly central to many companies' operations. The gradual growth of e-commerce and Internet-based customer services, for example, places a greater emphasis on the need for constant uptime. Many end users are increasingly viewing IT as a utility, meaning that downtime is becoming less tolerable. According to Hewlett-Packard's Fowler, the pressure on companies to provide proper contingency planning will increase as more firms begin to use integrated packages that govern more parts of the business. Ideal examples of this, he says, are enterprise resource planning systems such as SAP, which can be used to automate large sections of the business.
This in turn will drive the vendors of disaster recovery and business continuity services to reduce response times and offer companies an increased level of service. As competition increases, especially in areas such as financial services, the cost of downtime will also rise. This makes it imperative that companies get back online as soon as possible in the event of a system failure.
Against this backdrop, IT managers are coming under greater pressure to keep things running smoothly. It is important that they introduce a business continuity strategy that does more than pay mere lip service to contingency planning. If this is to be done properly, a deep understanding of the business is required.
Storage

Prevention is better than cure, and finding technologies that help to avoid a disaster recovery invocation can make a contingency plan more efficient and less costly. Efficient backup storage is one way in which IT departments can soften the impact of a server failure.
According to Simon Roe, product marketing manager at EMC, mirroring storage can minimise the problems associated with getting applications up and running again. There is typically a long restoration process when recovering from tape, but if another server is running data that is duplicated in real-time, things will be much easier.
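The advantage Roe describes can be shown with a toy model. This is an illustrative sketch, not EMC's actual technology: every write goes synchronously to both a primary and a mirror copy, so when the primary is lost the mirror can serve reads immediately, with no lengthy tape restore.

```python
# Illustrative sketch of synchronous ("write-through") mirroring.
# The in-memory dictionaries stand in for real storage arrays.
class MirroredStore:
    def __init__(self):
        self.primary = {}
        self.mirror = {}
        self.primary_up = True

    def write(self, key, value):
        # Write-through: both copies are updated as part of the same operation,
        # so the mirror is never behind the primary.
        self.primary[key] = value
        self.mirror[key] = value

    def read(self, key):
        if self.primary_up:
            return self.primary[key]
        return self.mirror[key]  # fail over to the real-time duplicate

store = MirroredStore()
store.write("orders", ["#1001"])
store.primary_up = False          # simulate losing the primary site
print(store.read("orders"))       # ['#1001'] - served from the mirror
```

The contrast with tape is the absence of a restore phase: the mirror is already in the state the application needs, which is exactly why remote mirroring shortens recovery times.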
Clustering is an ideal method for mirroring servers. Roe explains that EMC works closely with HP to support the latter's MC Serviceguard clustering software, and also collaborates with Microsoft on its Wolfpack failover clustering technology. EMC comes into its own in remote clustering scenarios, when servers are unlikely to share the same storage devices.
Another preventative measure which could save you from invoking a disaster recovery contract is the humble uninterruptible power supply (UPS). Paul Tyrer, UK sales and operations manager for UPS vendor American Power Conversion (APC), explains that far from being simply a battery in a box, most UPS products these days include intelligence that enables them to shut the system down gracefully before the battery's interim power runs out.
A UPS can be used to shut down the server and page the administrator, while automatically rebooting a server when the power comes back on.
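The event sequence described above can be sketched as a simple monitor. Everything here is a stand-in: real UPS products signal the host through vendor software over a serial or USB link, and the 15-minute runtime threshold is an assumption chosen for illustration.

```python
# Hedged sketch of UPS-driven shutdown logic; the interface is hypothetical,
# not any vendor's actual API.
from dataclasses import dataclass, field

@dataclass
class UpsMonitor:
    battery_minutes: int = 10          # estimated runtime left on battery
    log: list = field(default_factory=list)

    def page_admin(self, message: str) -> None:
        # Stand-in for a real pager or SMS gateway.
        self.log.append(f"PAGE: {message}")

    def on_mains_lost(self) -> None:
        self.page_admin("Mains power lost; running on battery")
        if self.battery_minutes < 15:  # assumed threshold: too little runtime to ride it out
            self.log.append("SHUTDOWN: closing applications, halting server")

    def on_mains_restored(self) -> None:
        self.page_admin("Mains power restored")
        self.log.append("REBOOT: restarting server")

ups = UpsMonitor(battery_minutes=10)
ups.on_mains_lost()        # pages the admin, then shuts down cleanly
ups.on_mains_restored()    # pages the admin, then reboots the server
print(ups.log)
```

The value of the intelligence Tyrer describes lies in the ordering: the server is halted cleanly while battery power remains, rather than crashing when it runs out.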