Although the Pentium 4 (P4) currently ships at high clock speeds of 1.4GHz and 1.5GHz, it is the chip's new micro-architecture, rather than its brute megahertz rating, that will ultimately give the processor its performance boost.
The principal technological innovation underpinning the P4 is a micro-architecture that Intel calls NetBurst. A micro-architecture is the implementation of the chip architecture in silicon, and tends to change from one processor generation to the next.
The processor architecture, by comparison, refers to the instruction set, registers, and memory-resident data structures that are public to programmers. While these are maintained and improved from one processor to another, a family of processors will share the same basic architecture.
For the Pentium 4 this common architecture is IA-32, the 32-bit form of the x86 instruction set, which arrived with the 80386 and traces its lineage back to the 8086 processor. The common architecture gives a family of processors backwards compatibility, so that any software written for the IA-32 architecture will run on any compliant CPU.
Making a processor faster - the technical bit
Before explaining how the NetBurst micro-architecture works, it's important to understand what makes a processor faster. While the clock speed (MHz or GHz) is often quoted as the be-all and end-all of processor speed, on its own it does not give an accurate indication of a CPU's performance.
Similarly, the number of instructions that can be executed per clock cycle (IPC) also fails, on its own, to indicate a processor's actual speed. The real measure of a processor is a combination of the two: performance = MHz x IPC.
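As a rough illustration of the formula, the short Python sketch below compares two made-up chips. The IPC values are hypothetical and chosen only to show that a slower-clocked processor can match a faster-clocked one; they are not measurements of any real CPU.

```python
def relative_performance(clock_mhz, ipc):
    """Millions of instructions completed per second: clock rate (MHz) x IPC."""
    return clock_mhz * ipc


# Hypothetical figures, for illustration only.
chip_a = relative_performance(clock_mhz=1000, ipc=1.5)  # lower clock, higher IPC
chip_b = relative_performance(clock_mhz=1500, ipc=1.0)  # higher clock, lower IPC

print(chip_a, chip_b)  # both come to 1500.0
```

On these assumed numbers the two chips deliver the same throughput, despite a 500MHz gap in clock speed.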
Improving either the clock frequency or the IPC - or ideally both - will therefore get better performance out of a processor. On top of this, other improvements can be made so that fewer instructions are needed to perform the same operation. For example, in 1996 the MMX versions of the Pentium processors implemented 64-bit integer single instruction multiple data (SIMD) instructions. After this, 128-bit SIMD single precision floating point (SSE) instructions were introduced on the Pentium III processor.
These extensions improve performance because a single instruction can operate on several data items at once.
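The idea of packing several values into one wide register can be sketched in software. The Python below is only an emulation, assuming four 16-bit integers packed into a single 64-bit word; the helper names (pack, unpack, packed_add) are made up for this example, and real hardware performs the whole addition in one instruction rather than in a loop.

```python
LANES = 4                       # four values per 64-bit word
LANE_BITS = 16                  # each value is 16 bits wide
LANE_MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Pack four 16-bit integers into one 64-bit word, lane 0 lowest."""
    word = 0
    for i, value in enumerate(values):
        word |= (value & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit word back into its four 16-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def packed_add(a, b):
    """Add lane by lane; each lane wraps around and carries never cross lanes."""
    result = 0
    for i in range(LANES):
        shift = i * LANE_BITS
        lane_sum = ((a >> shift) + (b >> shift)) & LANE_MASK
        result |= lane_sum << shift
    return result


total = packed_add(pack([1, 2, 3, 65535]), pack([10, 20, 30, 1]))
print(unpack(total))  # [11, 22, 33, 0]
```

The last lane overflows and wraps to zero without disturbing its neighbours, which is what distinguishes a packed add from an ordinary 64-bit add.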
The improvements to the Pentium 4 make more sensible use of processor time, and not just ramp up the CPU's speed. However, clock speed is still important, so the P4 micro-architecture is designed to allow processor frequencies to be raised more than 40 per cent compared with existing P3 architectures.
To reach these frequencies, Intel has implemented what it calls hyper-pipelined technology. A pipeline breaks the execution of each instruction into a series of stages, so that the CPU can start work on new instructions before earlier ones have finished. At any moment the pipeline holds several instructions, each at a different stage of execution. With the Pentium 4, this pipeline has been doubled in depth, to 20 stages.
However, just making the pipelines longer does not provide a quick fix as there are several overheads associated with extended pipelines. Of these, the most significant is that of predicting branches. Inside an application, points are often reached where a choice can be made to execute different bits of code. Such points are known as branches.
Unfortunately, a pipeline only works well if the processor knows which instructions are coming next, so that work on them can start early. The result of a branch is not known until after a specific instruction has been executed. To get around this, processors try to predict the outcome of a branch and start work on instructions based on the predicted outcome.
If the wrong prediction is made, the pipeline has to be cleared and work started afresh on the correct branch of the code. This recovery time can be very wasteful of processor resources. Intel has attempted to get around this through the use of an Advanced Dynamic Execution (ADE) engine, and an Execution Trace Cache (ETC).
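The cost of a wrong guess can be put into rough numbers. The Python sketch below is a deliberately simple model, not a description of how the Pentium 4 behaves: it assumes one cycle per instruction when nothing goes wrong, and a flush penalty in cycles that grows with pipeline depth. The function name and every figure in it are hypothetical.

```python
def total_cycles(instructions, mispredicted_branches, flush_penalty_cycles):
    """Cycles to run a program: one per instruction plus a flush per wrong guess."""
    return instructions + mispredicted_branches * flush_penalty_cycles


# Hypothetical workload: one million instructions, 20,000 mispredicted branches.
shallow = total_cycles(1_000_000, 20_000, flush_penalty_cycles=10)
deep = total_cycles(1_000_000, 20_000, flush_penalty_cycles=20)

# Halving the mispredictions brings the deeper pipeline back level.
deep_better_predictor = total_cycles(1_000_000, 10_000, flush_penalty_cycles=20)

print(shallow, deep, deep_better_predictor)  # 1200000 1400000 1200000
```

Under these assumptions, doubling the flush penalty adds 200,000 cycles to the same workload, which is why a deeper pipeline needs a more accurate branch predictor just to stand still.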
The ADE engine creates a large out-of-order instruction window that allows the processor to avoid stalls that can occur while instructions are waiting for dependencies to resolve. The most common cause of such stalls is waiting for data to be loaded from memory after a cache miss. This occurs more often in high clock frequency designs, as the latency to main memory increases relative to the core frequency of the processor. The NetBurst micro-architecture can hold up to 126 instructions in this window, a large improvement over Intel's previous P6 micro-architecture, which held only 42.
Working with this is an enhanced branch prediction capability that allows the Pentium 4 processor to be more accurate in predicting program branches. Intel claims that this reduces the number of branch mispredictions by 33 per cent compared with its predecessors. It does this through a 4KB branch target buffer that stores more detail on the history of past branches, together with a more advanced prediction engine.
The ETC is a Level 1 cache that caches decoded x86 instructions (micro-ops). This has the advantage that the latency associated with the instruction decoder is removed from the main execution loops.
The cache stores these micro-ops in the order in which the program actually executes them, so that the code on either side of a taken branch sits in the same cache line. The overall effect is that the ETC doesn't store instructions that are skipped over due to a branch.
The system bus stops here
The final important part of the processor is a 400MHz system bus, up from the 133MHz bus used by the Pentium III processor. As processor speeds increase, it's often interaction with other system components that slows the computer down. The 400MHz bus gives the Pentium 4 processor 3.2GB of data per second in and out of the processor. With the 133MHz bus, there is only 1.06GB per second available.
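The bandwidth figures come from multiplying the bus transfer rate by the bus width, which works out at roughly 3.2 and 1.06 gigabytes per second. The Python sketch below reproduces the arithmetic, assuming a 64-bit (8-byte) data bus in both cases; the bus width is not stated above and is an assumption of this example, as is the function name.

```python
BUS_WIDTH_BYTES = 8  # assumed 64-bit data bus


def peak_bandwidth_gb_per_s(transfer_rate_mhz, width_bytes=BUS_WIDTH_BYTES):
    """Peak bandwidth in GB/s (1GB = 10**9 bytes) for a given transfer rate."""
    bytes_per_second = transfer_rate_mhz * 1_000_000 * width_bytes
    return bytes_per_second / 1_000_000_000


pentium_4_bus = peak_bandwidth_gb_per_s(400)    # about 3.2 GB/s
pentium_iii_bus = peak_bandwidth_gb_per_s(133)  # about 1.06 GB/s

print(f"{pentium_4_bus:.2f} GB/s vs {pentium_iii_bus:.2f} GB/s")
```

These are theoretical peaks; sustained throughput in a real system will be lower.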
To get the most out of the Pentium 4 a new motherboard chipset is needed. Fulfilling the job is the Intel 850. Included in the spec is a new Memory Controller Hub (MCH) that supports dual RDRAM (Rambus) memory channels and the 400MHz system bus that the Pentium 4 requires.
Unfortunately, Rambus is currently very expensive, and this chipset locks corporations into buying it. A better solution seems to be opting for other technology such as double data rate (DDR) memory.
This kind of memory is cheaper to manufacture and offers better performance than conventional SDRAM, as data is transferred on both the rising and falling edges of each clock cycle. It will be some way into next year before Intel will be able to support other memory technologies, while AMD already supports DDR.
Other than this, the only major change comes from the I/O controller hub, which again makes use of the 400MHz bus. This hub makes a direct connection to the graphics and memory, which offers a substantial improvement in I/O access times.
Overall, the Pentium 4 shows some significant technical improvements over its predecessors, but currently at a hefty cost premium.