The Pentium 4-based Prescott processor, due in the second half of 2003, will be manufactured using 90-nanometre production technology and carry 13 new instructions aimed at specific applications including media and games. Debuting at 3.4GHz, Prescott will also bring improvements in hyperthreading, an 800MHz bus and larger caches, and is expected to scale to 4 to 5GHz. Intel vice president for logic technology development, Joe Schutz, said the company plans to be hitting the 15 to 20GHz mark by 2010.
I found this surprising:
However, Mr Burns (Intel vice president for desktop products) said he could see no requirement for 64-bit applications on the desktop and therefore no need for 64-bit desktop processors.
Apparently it's Intel's intention to stick with IA32 on the desktop for now. This goes against all the speculation surrounding Yamhill (which Intel still denies exists) that Prescott would contain support for x86-64 as well. From this article, Prescott sounds like nothing more than a souped-up Pentium 4.
It will be interesting to see how AMD’s success or failure with x86-64 may sway this decision. I think it’s quite likely that we won’t be seeing many x86-64 applications for Windows for quite some time.
On the server side of things, Opteron will most likely be successful due to the pervasiveness of Linux and other open source software which can take immediate advantage of the new ISA. Whether these successes will help bolster support for the desktop side of things remains to be seen.
/me chugs one for the cpu conspiracy theorists.
However, Mr Burns (Intel vice president for desktop products) said he could see no requirement for 64-bit applications on the desktop and therefore no need for 64-bit desktop processors.
At that point, Mr. Burns then put his fingertips together and said “ehhh-xellent”.
Most 32 bit desktops don’t come with the ability to support 4GB of RAM.
Apple PowerMac — 32 bits for how long? Apple's only recently gotten around to offering hardware support for 2GB of RAM, 50% of a 32 bit processor's direct physical address space.
PC — 32 bits for … wow … a long time. And on most machines today, there is still a 2GB RAM limitation. On many mass market machines, you can’t go over 1GB without risking strange hardware problems. At least Xeon workstations and servers have not had a 2GB hardware limit for a while.
Linux didn’t support >2GB physical RAM until relatively recently and then the first few versions of large memory support were slow.
Windows has had >2GB physical RAM support for a long time. It’s not speedy, but it’s been debugged over time.
32 bits could run for a couple more years, no problem.
A 64 bit address space is not going to be of much use in desktops for a while. The exception will be virtual memory management for large data files (HD video for example).
Based on Intel’s ability to scale up clock frequency, we could see 32 bit performance continue to make big jumps for several years.
Personally, I’m much more interested in dual processors than a 64 bit processor.
–ms
A 64 bit address space is not going to be of much use in desktops for a while. The exception will be virtual memory management for large data files (HD video for example).
Based on Intel’s ability to scale up clock frequency, we could see 32 bit performance continue to make big jumps for several years.
The real advantages of x86-64 for desktops come in the form of changes to the ISA, specifically the addition of extra registers, which addresses one of the largest bottlenecks plaguing IA32. Even with the register renaming that superscalar x86 implementations (at least Pentium Pro/Athlon) do internally, all IA32 code is still limited to an extremely small architectural register set (only about 4 of which can be used effectively), which places great restraints on how well IA32 compilers can optimize code and leads to inefficient memory usage in IA32 code.
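To make the register-pressure point concrete, here is a tiny hypothetical C example of my own (the function and numbers are made up, not from the article). The loop keeps half a dozen values live at once, which is enough that a 32-bit x86 compiler usually has to spill some of them to the stack, while an x86-64 compiler with 16 general registers can normally keep them all in registers. You can see the difference yourself with something like "gcc -m32 -O2 -S mix.c" versus "gcc -m64 -O2 -S mix.c".

#include <stddef.h>

/* Several accumulators plus x and y are live on every iteration. */
long mix(const long *a, const long *b, size_t n)
{
    long dot = 0, sa = 0, sb = 0, sq = 0, xr = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        long x = a[i], y = b[i];
        dot += x * y;          /* dot product    */
        sa  += x;              /* sum of a[]     */
        sb  += y;              /* sum of b[]     */
        sq  += x * x + y * y;  /* sum of squares */
        xr  ^= x - y;          /* running xor    */
    }
    return dot ^ sa ^ sb ^ sq ^ xr;
}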
Since this is OSAlert, this is what I’d like to see in a processor, from the OS’s point of view.
1) The end to registers. Treat the register set as just an L0 cache of physical memory. With the Itanium's 256 registers, this translates to a 2KB-3KB L0 cache. This would take quite a few transistors, because the register set would have to be fully associative memory, but transistors are one thing we have no lack of these days. This would have the advantage of speeding up context switching a whole lot because no register saves would be required.
2) The end to pages. Pages are such an inelegant and coarse sharing mechanism. I'd like to see an object-level protection mechanism. In this scheme, all processes would get a specific piece of the 64-bit address space. To access memory in another address space, they would have to use special pointers that had protection information encoded in them. These pointers would be protected by a special tag bit in each memory word, identifying the word as a pointer, which could only be written by the OS. The IBM AS/400 has exactly this type of setup (a rough sketch of what such a pointer might look like follows this list). Other CPUs, like the IA-64, also have advances to the traditional paging mechanism, but the Athlon-64 has none beyond the extremely simplistic x86 mechanism.
3) Something like PALcode. PALcode in the Alpha provides high level operations stored in microcode. It makes processor management easier.
4) ACPI or OpenFirmware. In x86 machines, there are a zillion different ways to detect and initialize hardware. I’d like to see machines that configure everything via the firmware, so the OS only has to look in one place. Nobody uses ISA devices anymore, so ditching them shouldn’t make a difference.
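Regarding point 2, here is a loose, purely illustrative C sketch of what a tagged cross-space pointer might look like. This is my own invention for illustration, not how the AS/400 or any real CPU actually lays it out; in real hardware the tag would live in a hidden bit of the memory word and only the OS could set it, whereas here it's just an ordinary struct field plus the check a load would have to pass.

#include <stdint.h>
#include <stdio.h>

#define XPTR_READ  0x4
#define XPTR_WRITE 0x2

typedef struct {
    uint64_t target;    /* address inside some other process's slice   */
    uint16_t space_id;  /* which slice of the 64-bit space it names    */
    uint8_t  rights;    /* access rights granted by the OS             */
    uint8_t  os_tag;    /* "this word really is a pointer"; OS-only    */
} xptr;

/* The check a load through a cross-space pointer would have to pass. */
static int xptr_can_read(const xptr *p)
{
    return p->os_tag == 1 && (p->rights & XPTR_READ) != 0;
}

int main(void)
{
    xptr p = { 0x1000, 7, XPTR_READ, 1 };
    printf("readable: %d\n", xptr_can_read(&p));
    return 0;
}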
You raise an important point about a weakness of the IA32 instruction set architecture. Register space certainly is quite limited and this cuts back performance.
However, there is nothing that says Intel is not going to fix this problem post-Prescott. Intel put a cornucopia of registers in Itanium, so it is something they know how to do.
And in the big picture of everything that goes into processor performance, Intel has many of the other parts of the system working well — double pumped ALU, massive bandwidth FSB, hyperthreading, SIMD, large on-chip caches, etc.
I believe Opteron is going to have excellent system performance, mostly on account of each processor having its own dedicated dual-channel DDR path to memory. Opteron is showing good SPEC2000 numbers, but nothing that is going to make Intel quake in their boots. It is on system performance that AMD will take the low- to mid-range performance crown from Intel's high-volume, high-margin Xeon. There is a rumor that in 64-bit mode the Opteron can pipeline SSE2 with amazing speed, but I have yet to see that in reality.
Unfortunately for AMD, it takes a good quality compiler to take advantage of having extra registers. So far there has not been any AMD equivalent of Intel’s C++ and Fortran compilers or Intel’s VTune utility.
I can’t wait to see some real Opteron benchmarks. I hope AMD does show some heretofore unobtainable performance levels. Opteron is going to make a killer Linux chip provided AMD has solid support chips (another traditional AMD weakness).
–ms
I understand that we need to innovate (hence the 800MHz FSB and the 13 new multimedia instructions), but one thing that gets on my nerves with Intel is that they do not provide a decent upgrade path for their existing customers. What I mean is that you cannot use the newer chips on your existing motherboard. I can accept a new motherboard every 2 years, but every 6 months is ridiculous. With AMD, on the other hand, Socket A has been around for quite some time, and now also supports Bartons.
Another thing which annoys me is the frequency with which new instructions appear. I understand the need for these instructions, and support their introduction every 4 years or so. But every new Intel core has a set of new instructions – it seems that Intel is doing this to allow new multimedia-based benchmarks to beat AMD (i.e. like all the SSE2 benchmarks). It takes AMD a year or so to introduce the same instructions into their core and beat Intel again, but then Intel pulls the same dirty trick and introduces 13 new instructions, again giving it the upper hand on a limited set of benchmarks for 6 months or so. AMD, on the other hand, can introduce all the new instructions it wants (e.g. 3DNow2), but no one uses them.
Nevertheless, I’m all for progress, I just like it better when all major changes are done at once instead of incrementally. This new Intel lineup should have been called Pentium 5, but we all know how ridiculous that sounds.
At 20 GHz, light could only travel about .01499 meters per clock cycle. And with an 800MHz bus… about .3747 meters per clock cycle… and that's if there were no resistance. Looks like we won't be having any more large motherboards in the next few years…
Actually, the 800 MHz bus is 200 MHz QDR. This means the bus runs at 200 MHz, but sends data 4 times per cycle. So we can have motherboards at least 1.4 meters in length (Yeah right. It’d turn into a giant antenna!)
As for Intel's upgrade path, the fact that you need a new motherboard every six months is really a factor of the P4's design. The Athlon, at this point, can't really take advantage of a faster bus, so chipset upgrades aren't really necessary for newer Athlon processors. The P4, however, can use all the bandwidth you can throw at it, so it's worth it (performance wise) to upgrade the bus in sync with the processor. Interestingly, it's the P4 that ushered workstation-class memory bandwidth into the PC. An 800MHz 64-bit bus has 6.4GB/sec of usable bandwidth. That's absolutely phenomenal, and places the P4 in Power4 territory (which has 12.8GB/sec per dual-CPU module).
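For anyone who wants to check the numbers being thrown around in this thread, here is a throwaway C program covering the arithmetic (nothing more authoritative than that):

#include <stdio.h>

int main(void)
{
    const double c = 299792458.0;  /* speed of light in a vacuum, m/s */

    /* Best-case distance a signal could travel per clock cycle. */
    printf("per cycle at 20 GHz:  %.4f m\n", c / 20e9);   /* ~0.0150 m */
    printf("per cycle at 200 MHz: %.4f m\n", c / 200e6);  /* ~1.50 m   */

    /* "800MHz" FSB = 200MHz clock x 4 transfers/cycle x 8 bytes wide. */
    printf("FSB bandwidth: %.1f GB/s\n", 200e6 * 4 * 8 / 1e9);  /* 6.4 */
    return 0;
}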
“At 20 GHz, light could only travel about .01499 meters per clock cycle. And with an 800MHz bus… about .3747 meters per clock cycle… and that's if there were no resistance. Looks like we won't be having any more large motherboards in the next few years…”
The 20GHz figure only constrains the die size, which is continually shrinking. I would think asynchronous processors would begin to take over before Intel hits 20GHz, however.
The 800MHz FSB really isn’t 800MHz, it’s 200MHz and sends 4 bits per clock cycle.
The 800MHz FSB really isn’t 800MHz, it’s 200MHz and sends 4 bits per clock cycle.
Well, Rayiner probably explained it better, and 13 seconds before I posted…
Mine should’ve read 4 bits per bus line per clock cycle. Obviously the FSB is moving more than 4 bits per cycle.
He has absolutely no technology background. Kind of reminds me of Ballmer. I swear, he would be the type of person who would be in tears at the sight of a command line interface.
Please, could someone actually promote a manager in an IT company who has a clue about the products they're selling? I mean, sure, you don't have to be a guru, but at least know the basic fundamentals of what you're selling.
That has been the one thing I have been pleading for. Get rid of the BIOS and implement the OpenBoot firmware that all the UNIX vendors use, which is also an open, documented standard. The sooner that happens, the better.
I was actually hoping that maybe AMD did that with their x86-64; however, they seem to have stuck with the BIOS like flies to excrement.
See http://news.zdnet.co.uk/story/0,,t269-s2130826,00.html
Well that is good to see. Now hopefully it won’t be like the win16 –> win32 transition that seemed to be a never ending process with companies stuffing around and customers whinging and whining.
Hopefully Intel will start cracking the whip and the big OEM producers like Dell, HP/Compaq and IBM all start using it once it has been made available.
All the clock & bus speed improvements in the world won’t do much good if I/O speeds don’t improve also.
For operating systems that do a lot of disk swapping (you know who they are!), the speed bottleneck is the ATA, EIDE or SCSI bus. Either get rid of disk swapping, or speed up hard disk buses.
My Amiga 4000T is only clocked at 50MHz with a 68060 CPU and appears to be just as fast as (or faster than) our Compaq Presario with its 600MHz Celeron CPU (because of all the disk swapping that Windows does).
“Please, could someone actually promote a manager in an IT company who has a clue about the products they're selling? I mean, sure, you don't have to be a guru, but at least know the basic fundamentals of what you're selling.”
Bill Gates should be good enough for you.
But I agree it is a widespread problem. It is what killed Commodore, for example – the top management were finance men who knew absolutely nothing about computers.
Unfortunately, people with technical knowledge are often honest, which is a crippling drawback in corporate politics.
Bill Gates is alright; however, I still find him too much "bells and whistles" over content.
I mean, does one really need to have loud music and a light show when launching a product or announcing a new strategy? I may be the most boring man on the planet; however, I do know a lot of people who can't stand the over-the-top presentations that certain CEOs do. McNealy and Larry Ellison are probably just as bad as Bill Gates.
Scott enters the "stage" to some badly strung-together guitar riff. Larry walks onto the stage as if he were Clint Eastwood and Bill Gates some low-down challenger.
Give me the facts in plain, straight-speaking English. I don't want it sugar-coated, or else I either lose interest or spend the next 4 hours de-bullshyterising the speech into something coherent and logical.
For example, look at the launch of SUN ONE: Scott couldn't go for one minute without bashing .NET. And when .NET was launched, Bill Gates couldn't do it in an adult, mature fashion, namely giving an informative presentation on the technology, NOT the hype and NOT the so-called "promises" that will never happen.
I find Intel's experience with 64-bit computing to be evidence of just how strong the 32-bit x86/Windows hold on the market really is.
Even Intel and MS are having difficulty getting off 32-bit. When the monopolists can't stop their own train from running, well, then you know the status quo is strong.
Apple might actually be in a better position to introduce 64-bit computing to the public at large because of their tight control over everything, though I can't see that happening any time soon either.
If you did away with registers in a 64-bit computer, then your instructions would have just grown in size: you'd need 64 bits for each operand. That would be an absolutely horrible decision to make. I would rather have the register space of all the execution units be unified, and then specify offsets into it. That way, you can allocate more space for integers if you don't feel the need to use SSE operations as much. Passing data between execution units would then also not be a problem. That's my two cents.
It's not as bad as you'd think. On modern 64-bit RISC CPUs, all instructions are 32 bits. These CPUs depend on creative PC (program counter) relative addressing modes that minimize the number of bits needed in the instruction. Instructions would have to grow slightly, so the idea might be more suitable for a VLIW-type architecture that already has huge instructions. Or, just use variable-length instructions. Even the IBM 970 is doing an extra decode stage, and stuff like Intel's trace cache is eliminating instruction decode as a bottleneck, so it wouldn't be that big of a deal to use an x86-like variable-length instruction set.
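A quick bit-budget comparison behind that exchange, purely illustrative arithmetic of my own: three operands encoded as indices into a 32-entry register file versus three operands carried as full 64-bit addresses, which is the worst case the previous poster was worried about.

#include <stdio.h>

int main(void)
{
    int reg_bits  = 3 * 5;   /* 3 operands x 5 bits for 32 registers */
    int addr_bits = 3 * 64;  /* 3 operands x full 64-bit addresses   */

    printf("register-index operands: %d bits\n", reg_bits);   /* 15  */
    printf("full-address operands:   %d bits\n", addr_bits);  /* 192 */
    return 0;
}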
As for 64 bit desktop, I could use one as soon as it comes out, both for address space and reg space & widths. This would be used for running chip design SW under Linux (Windows not much used anywhere).
Note that Intel's most expensive P4s cost more or less the same as the maximum amount of RAM they support, i.e. a $400 CPU vs 4GB of RAM. That's weird by traditional computer standards. I would like to be able to build a fast DRAM hard-drive replacement/cache, but the 31/32-bit address space is a severe limit.
I also agree about removing register limits, but registers are just a form of crippled cache covering a tiny fixed window that isn't even in memory space. The opcode ISA cripples the register set size, & the rest is …
If you are familiar with the TMS9900 and the later Transputers, you will be well familiar with regs being in memory, but in those cases they were still limited to only 16 regs off a wp base pointer. This gave fast context switches but lousy register performance, since every register access was really a memory reference. Actually this was OK >20 years ago, since even DRAM was faster than CPUs back then (4MHz or so), but that's not so today, when DRAM is hundreds of times slower.
The T9000, though, solved the register speed problem by keeping several wp contexts' worth of register sets in a fully associative, n-way ported cache.
Quite a few of those Transputer people ended up on the Alpha team & consequently at Intel, & others are at AMD, so I expect the unexpected sooner or later.
As a few other OSAlert readers are working on OSes: from time to time I work on a modern Transputer CPU that can fit in an FPGA, so it's possible to do things quite differently from x86. I would also use preops to continuously grow the register-size fields almost indefinitely, a few bits at a time, ultimately into full memory addresses if they fall outside the register cache. The variable-length 16-bit opcodes are grouped far ahead, so one can still be issued to an exec unit every cycle. The bigger the register cache, the bigger the register names the compiler can target.
If anybody made it down here: IBM is now talking up their 5.6ns-cycle DRAM technology (EETimes), primarily for embedded use in ASICs but also mighty good for L3 cache. Just imagine the improvements if L3 could go to 1GB on an x86 mobo.
back to work…..