... will herald a new era of computing. Cliff Saran looks at three key areas of the new architecture.
Two weeks ago at the Microprocessor Forum in San Jose, Intel and Hewlett-Packard jointly presented their vision of 64-bit computing with a seminar discussing the architecture of the IA-64 microprocessor.
IA-64 represents a radical departure from today's Intel-compatible microprocessors and is destined to power high-performance PCs and servers for the next millennium. The first of this new 64-bit processor family will be the Merced chip, which is expected in 1999.
Speaking at the Microprocessor Forum, John Crawford, director of microprocessor architecture at Intel, said that the whole focus of the new architecture was to provide a scaleable foundation for parallel computing.
To achieve higher performance in previous generations of the all-conquering Intel x86 architecture, processors had to execute more machine code instructions per cycle. This was achieved through parallel execution, where the processor was able to perform several actions simultaneously. Processors used pipelining to keep several instructions in flight inside the microprocessor at once. Once there, instructions that could run independently were routed to specific functional units within the microprocessor so they could be executed concurrently. For instance, floating-point and integer arithmetic instructions could normally run in parallel.
This type of parallel processing is called "implicit" because the processor decides whether groups of instructions can be run in parallel. However, today's processors are hampered in their ability to run instructions in parallel by two main factors. The first is memory latency, which limits the speed at which the processor can be fed with data from memory. The second is branch instructions, which alter the flow of instruction execution in a running program. At a conditional branch of the form "if something occurs, do some action, otherwise do something else", the processor cannot know which instructions to run next until the outcome of the condition is known, so parallel execution stalls.
To overcome these limitations, the IA-64 uses a completely new architecture called EPIC (Explicitly Parallel Instruction Computing) based on "explicit" parallelism. Here, the program running on the microprocessor explicitly defines which of its instructions can and cannot be run in parallel. This definition is made long before the program is run: it happens during software development, when the program is compiled to machine code instructions. The compiler effectively takes control of making a program run in parallel by ordering machine code instructions efficiently.
The IA-64 will introduce a new instruction set in order to support explicit parallelism. Instructions are combined into 3-instruction bundles. Each instruction in the bundle holds information on whether it is independent of its two neighbours, or whether either or both must be executed first.
This information extends outside the bundles to whole groups of instructions in a program.
During program development, it is the job of the compiler to juggle machine code instructions in order to reduce the amount of dependency between them. As part of the process of generating machine code instructions, the compiler stamps what Intel describes as template bits on to the instructions, which describe the dependency of each instruction on its neighbouring ones. Now, when the microprocessor reads these encoded instructions, it can determine which ones can be run in parallel.
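The compiler's job described above can be illustrated with a minimal sketch. It is not Intel's algorithm or the IA-64 encoding: the `Instr` record, the register names and the functions `split_into_groups` and `pack_bundles` are hypothetical, and the dependency test is deliberately conservative. The sketch walks a sequential instruction list, starts a new group whenever an instruction depends on one already in the current group, and then cuts each group into bundles of up to three instructions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str     # human-readable form, used only for display
    dest: str     # register written by the instruction
    srcs: tuple   # registers read by the instruction


def split_into_groups(instrs):
    """Split a sequential list into groups of mutually independent instructions."""
    groups, current = [], []
    written, read = set(), set()   # registers touched by the current group
    for ins in instrs:
        depends = (
            any(s in written for s in ins.srcs)   # reads a value produced in this group
            or ins.dest in written                # overwrites a result of this group
            or ins.dest in read                   # overwrites an input of this group
        )
        if depends and current:
            # Dependency found: close the group (the equivalent of a "stop").
            groups.append(current)
            current, written, read = [], set(), set()
        current.append(ins)
        written.add(ins.dest)
        read.update(ins.srcs)
    if current:
        groups.append(current)
    return groups


def pack_bundles(group, width=3):
    """Cut one independent group into bundles of at most `width` instructions."""
    return [group[i:i + width] for i in range(0, len(group), width)]


program = [
    Instr("r1 = r10 + r11", "r1", ("r10", "r11")),
    Instr("r2 = r12 + r13", "r2", ("r12", "r13")),
    Instr("r3 = r14 * r15", "r3", ("r14", "r15")),
    Instr("r4 = r1 + r2", "r4", ("r1", "r2")),   # needs r1 and r2
    Instr("r5 = r4 * r3", "r5", ("r4", "r3")),   # needs r4
]

groups = split_into_groups(program)
for n, group in enumerate(groups):
    print(f"group {n}: " + "; ".join(i.text for i in group))
```

In this toy program the first three instructions are independent and fill one bundle, while the last two each depend on an earlier result and so fall into groups of their own. A real compiler would also reorder instructions to lengthen the independent groups, which this sketch does not attempt.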
The IA-64 architecture also attempts to overcome the bottleneck of branch instructions which hinder the processor's ability to run code in parallel.
In current processors, if the outcome of a condition is true, then one "branch" of code is executed. If the outcome is false, an alternative branch is executed. Since current processors cannot know in advance whether the condition will be true or false, they must either wait for the outcome before executing one or other of the two branches, or guess and discard the work if the guess turns out to be wrong (a mispredict).
EPIC offers a partial remedy. Along with dependency information, most instructions on the IA-64 include what Intel describes as a "predicate" flag. This flag tells the processor whether instructions belong to the "true" or "false" branch in a conditional statement. Using this information, the IA-64 is able to load instructions from both the "true" and "false" parts of a conditional statement.
The processor can start executing the correct branch of instructions as soon as the result of the condition is known. Again, the compiler has a critical role to play in making such predication possible, by arranging the conditional branch code in an efficient way.
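Predication as described above can be modelled in a few lines. This is a simplified simulation, not IA-64 machine code: the function `run_predicated`, the predicate names `p1` and `p2` and the register dictionary are all hypothetical. Both arms of an "if a > b" statement are placed in one straight-line instruction stream, each instruction is tagged with the predicate that guards it, and an instruction whose predicate is false is simply discarded.

```python
def run_predicated(a, b):
    """Model of:  if a > b: r = a - b   else: r = b - a   without a branch."""
    regs = {"a": a, "b": b, "r": None}

    # Both arms are laid out up front, each tagged with its guarding predicate.
    stream = [
        ("p1", lambda r: r["a"] - r["b"]),   # "true" arm
        ("p2", lambda r: r["b"] - r["a"]),   # "false" arm
    ]

    # A single compare sets two complementary predicates.
    preds = {"p1": regs["a"] > regs["b"]}
    preds["p2"] = not preds["p1"]

    executed = 0
    for pred_name, op in stream:
        # This test models the hardware dropping a nullified instruction;
        # it is not a branch in the program being simulated.
        if preds[pred_name]:
            regs["r"] = op(regs)
            executed += 1
    return regs["r"], executed


print(run_predicated(7, 3))   # (4, 1)
print(run_predicated(3, 7))   # (4, 1)
```

In each call both instructions are fetched but only one takes effect, so there is no branch for the processor to wait on or mispredict. The cost is that the nullified instruction still occupies a slot, which is why the compiler has to decide where predication pays off.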
Since the hard work of figuring out which individual instructions in a program can be run in parallel is performed when the code is compiled, the design of the IA-64 can be simpler than its predecessors, according to Intel.
The final piece in the EPIC jigsaw is speculation, which is effectively a workaround for the problem of memory latency. Memory latency occurs when the processor needs to wait in order to load data from memory, due to the relatively slow speeds of memory compared to processor speed. Often, this data is essential in order to complete a conditional branch, such as: "if the value of the data is X then do some action".
Rather than wait for data to load from memory into the processor before it can be used, on the IA-64 the load is initiated ahead of time. This gives the data more time to travel from memory into the processor. While the data is being loaded, the IA-64 can use the predicate field in the conditional branch instructions to load both the "true" and the "false" branches of code execution.
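The benefit of starting a load early can be shown with a toy timing model. The assumptions are loud ones: one instruction issues per cycle, a load takes a fixed number of cycles, and the function `stall_cycles` and the two schedules are hypothetical illustrations rather than anything Intel has published.

```python
def stall_cycles(schedule, load_latency):
    """Count cycles lost waiting for loads, issuing one instruction per cycle.

    `schedule` is an in-order list of ("load", name), ("use", name)
    or ("work", None) entries.
    """
    ready_at = {}   # name -> first cycle at which the loaded value is usable
    cycle = 0
    stalls = 0
    for kind, name in schedule:
        if kind == "load":
            ready_at[name] = cycle + load_latency
        elif kind == "use" and ready_at[name] > cycle:
            wait = ready_at[name] - cycle   # data has not arrived yet
            stalls += wait
            cycle += wait
        cycle += 1
    return stalls


# Load issued immediately before the value is needed.
in_place = [("work", None), ("work", None), ("work", None),
            ("load", "x"), ("use", "x")]

# Same work, but the load is hoisted to the start of the sequence.
hoisted = [("load", "x"), ("work", None), ("work", None),
           ("work", None), ("use", "x")]

print(stall_cycles(in_place, load_latency=4))   # 3
print(stall_cycles(hoisted, load_latency=4))    # 0
```

With a four-cycle load, the hoisted schedule hides the whole delay behind the three independent instructions. If memory is slower than the amount of independent work available, hoisting only shortens the wait rather than removing it.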
Programming the IA-64
Intel says that the IA-64 will provide new levels of parallelism. It claims that the new architecture breaks the sequential execution paradigm of previous generations of processors.
Clearly, to take advantage of this new architecture, there will need to be a whole new generation of 64-bit software, starting with an operating system. Windows NT is the logical choice, but it has a long way to go before it becomes fully 64-bit. Even the so-called 64-bit Alpha processor version due in Windows NT 5.0 is fundamentally a 32-bit operating system under the covers, with special hooks enabling it to access a 64-bit memory address space.
The arrival of NT 6 is perhaps the earliest opportunity users will get to see a 64-bit version of NT, but there is no release date planned for this as yet. However, if Intel expects Merced, the first IA-64 processor, to ship in 1999, then this must be the earliest a 64-bit NT will be available.
But delivering a version of NT - or any software for that matter - which is optimised for the IA-64 could be hard work. "On the (IA-64) a lot more onus is put on compilers rather than parallel technology in the processor," said Joe D'elia, senior analyst at DataQuest. What this means is that software development for the IA-64 will have to be a whole lot smarter than it is today, in order to extract the most performance from the processor.
And while software developers today do rely to some extent on clever compilers to generate optimised code, most agree that compilers do not go all the way. Often, the best programs, in terms of performance, are the ones most elegantly hand-crafted. Neil Ward-Dutton, senior consultant at Ovum, believes that the IA-64 could put pressure on software developers to deliver faster applications. He said that the situation could be similar to the way in which database developers optimise queries by hand today: it serves to suggest that, whilst tools for optimisation are available, the best performance still comes from manual optimisation.
64-bit Programming: glossary of terms
Branches - The "forks in the road" of a program, at which a decision is made regarding the correct path in order to continue.
Compiler - A tool that translates a programmer's high-level instructions into the language of the microprocessor.
EPIC (Explicitly Parallel Instruction Computing) - The new "architecture technology" that was jointly defined by Intel and HP (analogous to RISC and CISC). It will be the foundation for the new 64-bit Instruction Set Architecture.
Explicit parallelism - The ability of the compiler to directly inform the processor of the independent nature of operations.
IA-32 (Intel 32-bit Architecture) - Intel's volume processor product family, addressing computing requirements for desktop, mobile, servers and workstations.
IA-64 (Intel 64-bit Architecture) - The Intel 64-bit Architecture implements EPIC concepts, using the jointly developed 64-bit Instruction Set Architecture in addition to full IA-32 compatibility.
Implicit parallelism - Found in conventional microprocessor architectures, this requires the compiler to create sequential machine code, leaving the processor itself to decide which instructions can be run in parallel.
Memory latency - The time it takes for the data to arrive from the memory to the processor, once the processor has requested it.
Merced - The first processor from Intel in the IA-64 family.
Mispredicts - A wrong decision regarding which path to take.
Parallelism - The ability to execute multiple instructions at the same time (in contrast to sequentially, where one instruction is executed after the other).
Predication - A technical concept that contributes to increasing overall performance by the removal of branches and associated mispredicts.
Speculation - A method of initiating a request for information, even before it is definitely known that the information will be needed.