Until a few years ago, clustering technology was only available in data centres on mainframe-class machines and high-end Unix servers.
But in recent times, the cost of clustering has come down to such an extent that it is now available on commodity PC server hardware. To reduce costs even further, Microsoft is working on technology - codenamed Wolfpack - in conjunction with several PC server manufacturers in order to build clustering technology into the Windows NT Server operating system.
But why have clustering at all? There are two main benefits. First, it offers a level of resilience against system failure. With two or more servers, if one goes down through a software or hardware failure, users can continue working on the second server. Ideally, users are not disrupted by the server crash and remain unaware that they have been connected to a different server.
The second benefit is that clustering provides a mechanism by which two or more servers can be grouped together in order to harness their collective computing power when running extremely demanding applications. This is parallel processing where each server can work simultaneously on a complex computational problem, thus reducing the time to obtain a result.
With Wolfpack, Microsoft plans to deliver both of these benefits to the world of enterprise computing. However, unlike mainframe-class and Unix systems, these benefits will be available on commodity PC server technology, rather than proprietary hardware.
At its conception the PC was never designed to run enterprise systems.
It was simply a standalone computer for personal productivity. When Novell's NetWare began shipping in 1986, the PC moved away from these humble beginnings, with the emergence of a new class of PC, namely the server, and took its first tentative steps towards enterprise computing.
But with the widespread success of PC servers came a whole host of new problems, particularly those that arose when the server crashed. When a standalone PC crashed, it affected only one user. But when a server crashes, everyone using the machine is affected. The more users connected to the server, the more disruptive a failure is.
To cut down on this disruption, PC manufacturers have built their server machines to be more resilient to failure (see box opposite page). Among the cutting-edge technology available today are: dual power supplies, so that if one fails the other switches in; hot-pluggable devices, so that faulty devices can be replaced without having to power down the server; and disk arrays, which can maintain the integrity of stored data even when one or more disks crash.
While this technology will help keep a single server running, the best way to guarantee the least disruption to users in the event of a catastrophe is to have a second server, mirroring the activities of the first. That way, if the first server falls over, the second one can kick in automatically.
Before Microsoft began pushing the Wolfpack initiative, some PC manufacturers offered proprietary system software and hardware which gave their server machines a level of clustering and server mirroring. However each manufacturer had its own clustering system.
Wolfpack aims to provide a single, industry standard way to cluster PC servers. The initiative was first announced in October 1995. Among the companies which have signed up as early adopters of the new technology are Compaq, Digital, Hewlett-Packard, IBM, Intel, NCR and Tandem.
Digital is a founding member of the initiative and currently offers its own clustering technology, Digital Clusters for Windows NT Server. Rhys Austin, Intel server product manager at Digital, says Microsoft has licensed the Digital clustering technology for use in Wolfpack. "There are many concepts which are similar between the two," Austin says. Nevertheless, Digital will offer Wolfpack when it becomes available. Speaking of what clustering adds to Digital's Prioris PC server family, Austin says: "Clustering technology adds high-level availability."
Austin believes that along with Intel-based hardware, Wolfpack will be available on the Digital Alpha architecture. However, he feels that it will not be possible to have a mixed cluster containing both Alpha and Intel-based servers. Instead, Austin sees servers being deployed in an asymmetric environment where the hardware specification of each server differs, although all will be based on the same microprocessor architecture.
Microsoft has said that Wolfpack will be released in two phases. The first addresses the issue of system resilience on PC servers; the second tackles performance. Phase 1 is due out this summer and is designed to improve availability and manageability by providing support for two-node clusters. Phase 2, available later in 1998, is expected to extend cluster support so that more than two servers can be connected together.
Along with the early adopters for server hardware, a number of companies including Computer Associates (CA), Oracle and SAP will be adapting their products to support the Wolfpack initiative.
Following a deal with Tandem, CA is currently developing clustering on Unix. However, for Windows NT, Wolfpack will be used. CA's marketing director Jay Huff says: "We are clustering on NT through Wolfpack." In June CA will introduce OpenIngres 2.0, its relational database engine, which will provide clustering on both Unix and NT. On the NT side, Huff says clustering will be available through Wolfpack.
In November 1996 Microsoft published the Wolfpack application programming interface at its Server Professional Developers Conference. With the new API, Microsoft says developers will be able to build cluster-aware applications that support high availability, easier manageability and greater scalability.
In theory, developers should not need to worry about the Wolfpack API, or indeed, develop their client server applications with cluster support explicitly. This is because these applications would be built on top of commercial software from the likes of Microsoft and Oracle, which would offer Wolfpack's clustering facilities as standard. For instance, client-server applications built using Microsoft SQL Server 7.0 will provide support for clustering without the developer having to do any extra work.
In a clustering question and answers page on Microsoft's web site, the company says that not all applications need to be enabled for clustering in order to benefit. According to Microsoft there are two requirements which an application must satisfy in order to work correctly in a clustering environment. The first is that the server application should be "well behaved" in that it maintains everything it needs in order to restart cleanly on a disk drive which can be accessed by another machine in the cluster. The second concerns the client-side application which should be able to handle pauses in service of up to a minute while the application switches over to an alternative server.
Apart from that, the client software should not need altering. In fact Microsoft states that Wolfpack does not require any special software on the client for transparent recovery of services via standard SMB and IP network protocols. So remote PCs using web browsers to connect to a clustered server, or PCs connecting to a shared network drive on Windows, will not be affected.
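As a rough illustration of the client-side requirement described above, the sketch below retries a request for up to the one-minute failover window before giving up. It is a minimal sketch, not part of any Wolfpack API; the function names, intervals and the use of `ConnectionError` are illustrative assumptions:

```python
import time

FAILOVER_TIMEOUT = 60  # Wolfpack failover can pause service for up to a minute

def call_with_retry(request, timeout=FAILOVER_TIMEOUT, interval=5,
                    clock=time.monotonic, sleep=time.sleep):
    """Retry a server request until it succeeds or the failover window expires.

    `request` is any callable that raises ConnectionError while the
    cluster is switching work over to the surviving node.
    """
    deadline = clock() + timeout
    while True:
        try:
            return request()
        except ConnectionError:
            if clock() >= deadline:
                raise  # the cluster never came back; surface the error
            sleep(interval)  # wait, then retry against the same virtual server name
```

A client built this way sees a failover as nothing worse than a slow response, which is exactly the "handle pauses of up to a minute" behaviour Microsoft describes.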
However, Microsoft says that there are certain situations when client software will need changing in order to work correctly in a Wolfpack clustered server environment. For instance, software that connects databases to a named server using Microsoft's widely adopted ODBC (Open Database Connectivity) middleware API will need updating. The ODBC API does, in fact, provide a way for client applications to query which server they are connected to, but developers would need to alter their client application code in order to use it.
Windows NT is not the only operating system to build in clustering technology aimed at commodity PC hardware. Unix seems to be going down that route too. Traditionally, Unix has offered clustering for many years but the cost has tended to be prohibitive. SCO recently introduced Unix clustering for commodity PC server hardware called ReliantHA which covers the same ground as Wolfpack, but runs on Unix rather than NT.
Comparing his company's technology to Wolfpack, David Gurr, product manager for Intel platforms at SCO, says: "ReliantHA is already available. The Phase 1 implementation of Wolfpack is due out later this summer." He continues: "Microsoft is following, not leading. It is providing failover for the volume server market. But it is too late. Enterprise customers want more than this."
SCO's long-term goal, in terms of clustering, is technology called ServerNet which works in conjunction with a suite of server software from Tandem, called Eclipse, to provide server clusters of up to 256 nodes.
According to Gurr, the Eclipse technology makes all nodes appear as one.
It handles load balancing and process migration should any node fail.
"We are very excited by this technology," says Gurr.
Microsoft has not yet said how Wolfpack technology will be packaged.
Clearly, at some point in the future it will become an integral part of the operating system. But there is also scope for hardware companies to offer Wolfpack clustering off-the-shelf. Data General, which focuses on the high-end NT market, last month introduced its Cluster-in-a-Box product family based on Veritas clustering technology for Windows NT. Philippe La Fornara, NT business unit director at Data General, says: "We will offer our Cluster-in-a-Box for Wolfpack when Wolfpack is available."
Cluster-in-a-Box currently comprises two rack-mounted quad or six-way PentiumPro servers, preconfigured with Veritas Firstwatch for NT. La Fornara says that Data General will offer a Wolfpack version of this. "Our Cluster-in-a-Box cuts out several days of configuration work."
SQL Server: what to look for
Wolfpack in Microsoft SQL Server
Phase 1: Hot Standby Solution
A failover solution improves data availability by enabling two servers to share disks and other peripherals within a cluster. When a system in the cluster fails, the cluster software will migrate the data and workload from the failed system to another server within the cluster. As a result, the failure of a system in the cluster will not affect the other systems, and, in most cases, the client applications will be completely unaware of the failure. Phase 1 will be available with the first release of Wolfpack scheduled for the summer of 1997 and, according to Microsoft, will work with SQL Server 6.5.
Phase 2: Symmetric Virtual Server Solution
Microsoft says that SQL Server 7.0 will have the capability to run several SQL Server services on a Wolfpack cluster. In a two-node cluster, each node will be able to support half the database and half the load. On failure, the surviving node will host both servers. During normal operation, each node will serve half the clients and will manage the database on half the disks. SQL Server 7.0 will also include wizards and graphical tools to automate cluster setup and management, according to Microsoft.
This phase will be supported with the Phase 1 release of Wolfpack.
Phase 3: Massive Parallelism
Phase 3 will enable SQL Server 8.0 to use massive parallelism on large clusters. When the overall load exceeds the capabilities of a cluster, additional systems may be added to scale up or speed up the system. According to Microsoft, this parallelism will be almost automatic for client-server applications such as on-line transaction processing, file services, mail services and Internet services. In these applications the data can be spread among many nodes of the cluster, and the workload consists of many independent small jobs that can be executed in parallel. By adding more servers and disks, the storage and workload can be distributed among more servers. Similarly, for batch workloads such as data mining and decision support queries, parallel database technology can break a single huge query into many small independent queries that can be executed in parallel.
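The partition-parallelism idea described above can be sketched in a few lines: spread the data across independent shards, run the same sub-query against each shard in parallel, and combine the partial results. Everything here (the shard placement, the node count, the use of threads to stand in for cluster nodes) is an illustrative assumption, not how SQL Server implements it:

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, nodes):
    """Spread rows across cluster nodes round-robin (hypothetical placement)."""
    shards = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):
        shards[i % nodes].append(row)
    return shards

def parallel_sum(rows, nodes=4):
    """Break one big SUM query into per-node sub-queries run in parallel,
    then combine the partial results -- the partition-parallelism idea."""
    shards = partition(rows, nodes)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = pool.map(sum, shards)  # each 'node' scans only its own shard
    return sum(partials)
```

Adding a node means adding a shard, which is why this style of workload scales by simply plugging more servers into the cluster.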
According to Microsoft, SQL Server 7.0 will support pipeline parallelism, whereas SQL Server 8.0 will support partition parallelism.
Clustering: approaches to clustering
In an active/passive clustering solution, a backup server monitors the active server, which runs the operating system and applications. The backup server remains in a passive, stand-by mode until it determines that the active server has failed. Then it comes on-line and takes control of the system.
When the primary server is repaired and comes back on-line, automatic failback is not possible. The operator must manually switch the systems back to their original state. Depending on the system, this may require downtime.
In an active/active cluster, all the servers in the cluster can run applications and act as recovery servers for other servers without having to terminate their own applications first.
They can also support failback, so when the failed server is repaired and brought back on-line, the clustering software automatically redistributes the applications that failed over. Wolfpack clusters have been implemented as an active/active cluster.
High availability: fault tolerance systems
UPS and RAID
An uninterruptible power supply (UPS) provides basic protection against system downtime. If the power fails, the UPS provides a few minutes of emergency power until the power is restored. Windows NT has a graphical user interface to configure and manage a UPS. In addition, Windows NT has built-in support for RAID levels 0, 1 and 5. The higher RAID levels protect the system against data unavailability or loss due to disk failure.
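The single-disk-failure protection of the higher RAID levels rests on a simple property: the parity block is the byte-wise XOR of the data blocks, so any one lost block can be reconstructed from the survivors plus the parity. A minimal sketch of the arithmetic (not an implementation of NT's RAID driver):

```python
def parity(blocks):
    """RAID-style parity: byte-wise XOR across equal-sized data blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

def rebuild(surviving_blocks, parity_block):
    """Reconstruct one lost block from the survivors plus the parity block
    -- the property that lets RAID 5 ride out a single disk failure."""
    return parity(surviving_blocks + [parity_block])
```

Because XOR is its own inverse, XOR-ing the surviving blocks with the parity cancels them out and leaves exactly the missing block.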
Transactions and On-Line Recovery
If the system or the data storage fails for some reason, database transactions and a database that supports on-line recovery provide protection against lost data.
Backup and Recovery
When properly performed, backups minimise, if not eliminate, the possibility of lost data. Backups may assist in reducing downtime since having an up-to-date backup allows quick recovery in the event of a catastrophic hardware failure. Nevertheless, users see the recovery period as downtime.
Replicating data to a backup server is a viable alternative if 100% availability is required. Replicating data to the backup server implies additional overhead on the on-line system and is not foolproof. Depending on how this is implemented, in most cases there will be a latency window in which a transaction has been committed on the primary server but has not yet been replicated to the backup server. This may result in a corrupted database that will be difficult, if not impossible, to reconcile.
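The latency problem can be seen in a toy model of asynchronous replication (all names are illustrative): transactions commit on the primary first and are shipped to the backup afterwards, so anything committed in the window before a crash never reaches the backup:

```python
class ReplicatedPair:
    """Sketch of asynchronous replication to a backup server."""

    def __init__(self):
        self.primary = []  # committed transaction log on the primary
        self.backup = []   # what has been shipped to the backup so far

    def commit(self, txn):
        self.primary.append(txn)          # committed: the client sees success

    def replicate(self):
        self.backup = list(self.primary)  # ship the log to the backup

def lost_on_failover(pair):
    """Transactions the backup never received before the primary died."""
    return pair.primary[len(pair.backup):]
```

Any transaction committed after the last `replicate()` call is exactly the work that vanishes on failover, which is why replication alone cannot guarantee zero data loss.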