Two Billion-Transistor Beasts: POWER7 and Niagara 3

Submitted by SReilly 2010-02-10 Hardware 18 Comments

“In years past, an ISSCC presentation on a new processor would consist of detailed discussion of the chip’s microarchitecture (pipeline, instruction fetch and decode, execution units, etc.), along with at least one shot of a floorplan that marked out the location of major functional blocks (the decoder, the floating-point unit, the load-store unit, etc.). This year’s ISSCC is well into the many-core era, though, and with single-chip core counts ranging from six to 16, the only elements you’re likely to see in a floorplan like the two below are cores, interfaces, and switches. Most of the discussion focuses on power-related arcana, but most folks are interested in the chips themselves. In this short article, I’ll walk you through the floorplan of two chips with similar transistor counts – the Sun’s Niagara 3 and IBM’s POWER7.”

About The Author

Thom Holwerda


18 Comments

2010-02-10 4:26 pm

SReilly
I know I’m going to be messing with POWER7 systems in the near future but I’d love to get my hands on a Niagara3 system and give it a spin. Unfortunately, I can’t find a release date for the chip and as the Sun acquisition has thrown so much to the wind at the moment, I can see it taking a while.

It would be lovely to test the two side by side. Who knows, sometimes geek dreams do come true! ;-D
2010-02-10 5:28 pm

tylerdurden
Interesting. The Niagara3 has a ton of threads, so I assume they can hide the latency pretty well. But 6MB of L2 for so many threads sounds awfully small. Although I am sure this chip has database appliances written all over it, where latency is the main limiter anyway, so maybe it does not make much of a difference.

The Power7 seems to be aimed more towards the number crunching end of the spectrum.

I’d be interested in knowing if the Rock project was really put out to pasture, or if they may give it another go. I thought the implementation of transactional memory in HW would be of interest to Oracle. I wonder if maybe Oracle will position SUN to develop throughput-oriented SPARC processors, and they will simply rely on Fujitsu for the performance-oriented SPARC parts.

2010-02-10 8:19 pm

Dubhthach

The Power7 seems to be aimed more towards the number crunching end of the spectrum.

The Power7 was developed as part of DARPA’s “High Productivity Computing Systems” program. IBM got the contract, which was worth US$244 million. No doubt there will be lots of orders for Power7 supercomputers from the likes of the DoE etc.
2010-02-11 12:02 pm

MrVain
Niagara T3 has a small cache because it is a server CPU. You can never fit a server’s workload into a CPU cache. A server serves thousands of clients. All that client data cannot fit into a cache, and the POWER7 cache would not hold it either. And kernel data, OS data, other system data (database, etc.): all that can never be cached.

As a server, you have to go out to RAM all the time. A big cache is of no help to servers. For desktops, a big cache is good. You run single programs, and they fit into a cache. POWER7 is more similar to a desktop CPU that runs a few programs.

2010-02-11 5:04 pm

rydan
Nobody’s trying to fit an entire workload into cache; that is simply not the purpose of the cache. Yes, a server can serve thousands of clients, but a server can also act as a database for a limited number of application servers serving a limited set of users. Correlating a small cache with a server CPU and a large cache with a desktop CPU is just plain ignorant.

Kebabbert, I know you are a big fan of Sun but please stay objective and help fertilize an interesting discussion of different CPU architectures. The T3 will excel in some areas, the Power7 in others.

2010-02-12 1:10 pm

MrVain
Regarding cache, I should have explained more. Here goes:

According to Intel studies, a 2.4GHz x86 server waits for data 50% of the time – under FULL load. This means it idles 50% of the time under max load. This is because of cache misses: the CPU needs to fetch data from RAM, and that halves the performance – under full load.

Thus, it is extremely important to fit the data into the cache. That is why POWER, x86, Itanium, etc. all have big caches, complex prefetch logic, etc. – to keep all active data in the cache. But you will always get cache misses; that is not avoidable (unless the CPU has ESP).

The CPU is very fast and RAM is slow, so the CPU must wait a long time for slow RAM. If the CPU is even faster, say 4GHz, it must wait even more cycles. High-clocked CPUs are heavily punished by cache misses; maybe they idle 60% under full load?
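The clock-speed argument above can be sketched as a toy model. This is only an illustration: the 100 ns memory latency and the 240 cycles of useful work between misses are assumed numbers, chosen so that the 2.4GHz case lands on the quoted 50%, and the function name is hypothetical.

```python
def stall_fraction(clock_ghz, work_cycles_per_miss, mem_latency_ns):
    """Fraction of time a single-threaded core spends waiting on RAM.

    All three inputs are illustrative assumptions, not measured values.
    """
    # A fixed latency in nanoseconds costs more cycles at a higher clock.
    stall_cycles = mem_latency_ns * clock_ghz
    return stall_cycles / (work_cycles_per_miss + stall_cycles)


# Assumed: 100 ns to RAM, 240 cycles of useful work between misses.
for ghz in (2.4, 4.0):
    print(f"{ghz} GHz: {stall_fraction(ghz, 240, 100):.1%} stalled")
```

Under these assumptions the stalled share rises from roughly half at 2.4GHz to somewhat over 60% at 4GHz, which is the shape of the claim, not a measurement of any real chip.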

Desktop workloads are perfect for these cpus with a large cache: desktop workloads are small, few threads, fit into a cache.

Server workloads are the worst. There are many thousands of clients, and the CPU tries to cache all that data. And it tries to cache the kernel, the OS, and other software running on the system (database, etc). All this data will never fit into a cache. The cache will start to thrash, because it swaps new data in and out all the time. There will always be cache misses, and performance will be severely punished.

Therefore, you MUST try to fit all data into the CPU cache to not lose performance. That is impossible. These CPUs are more suited for the desktop, because they get severely punished by cache misses.

The Sun Niagara CPU has a radical construction. It does not suffer from cache misses and almost never idles, because it just switches threads and starts to work on something else when there is a cache miss. (To switch threads on x86 takes 100s of cycles.) This is the first CPU that shows this behaviour. This is the reason that at 1.4GHz it can be faster than 5GHz CPUs on multi-threaded workloads.

Do you understand now? A CPU which depends on a big cache and high clock speed is a desktop CPU. A server CPU needs to be immune to cache misses, and have many threads – this is Niagara.

But I agree that Niagara T3 will be many times faster than POWER7 on server multi-threaded workloads. POWER7 will be much faster on number crunching stuff, where you can fit all data into its cache. But then, you can run Solaris on cheap Nehalem-EX if you need number crunching. Solaris will win every workload – I believe.
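The switch-on-miss idea can be put into a simplified utilization model. It is a sketch under stated assumptions: thread switches cost nothing, every thread behaves identically, and the 50 and 150 cycle figures are made up for illustration rather than taken from any Niagara documentation.

```python
import math


def core_utilization(threads, work_cycles, miss_cycles):
    """Share of cycles a core stays busy when it switches threads on a miss.

    Simplified model: each thread works for work_cycles, then waits
    miss_cycles for memory; thread switches are assumed to be free.
    """
    return min(1.0, threads * work_cycles / (work_cycles + miss_cycles))


def threads_to_hide_latency(work_cycles, miss_cycles):
    """Smallest thread count that keeps the core fully busy in this model."""
    return math.ceil((work_cycles + miss_cycles) / work_cycles)


# Assumed: 50 cycles of work per miss, 150 cycles of memory latency.
for n in (1, 2, 4, 8):
    print(n, "threads:", core_utilization(n, work_cycles=50, miss_cycles=150))
print("threads needed:", threads_to_hide_latency(50, 150))
```

In this model a single thread keeps the core busy a quarter of the time, and extra threads only help until the latency is covered; past that point the core is saturated and more threads add nothing.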

2010-02-11 5:46 pm

tylerdurden
… I don’t think you understand what a cache is for. It is basically a latency reduction mechanism.

However, having so many threads alive in silicon at the same time must put a very high degree of pressure on such a small cache. I understand that having so many threads does help in hiding the latency somewhat, but the number of cycles that a cache miss costs is far larger than the number of threads you can use in the core at any given time.

I assume that SUN have done their homework, and they can get their systems to provide a high degree of throughput for the workloads they are targeting (I assume mostly highly threaded webservers and databases). Still, the cache and the memory channels seem to be rather limited for a system bound to have some incredible memory bandwidth requirements to keep all those threads fed.
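The cache-pressure worry can be given a rough number. The 6MB L2 figure comes from this thread and the 16-core count from the top of the article's quoted range; the 8 threads per core is an assumption added here for illustration, as is the helper name.

```python
def cache_per_thread_kb(l2_mb, cores, threads_per_core):
    """Even split of a shared L2 across all hardware threads, in KB."""
    total_threads = cores * threads_per_core
    return l2_mb * 1024 / total_threads


# 6 MB L2 (from the thread), 16 cores (article's upper bound),
# 8 threads per core (assumed, not stated in the thread).
share = cache_per_thread_kb(l2_mb=6, cores=16, threads_per_core=8)
print(f"{share:.0f} KB of L2 per hardware thread")
```

An even split is a crude simplification, since threads share code and hot data, but it shows why the cache looks small next to the thread count.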

2010-02-12 3:39 am

cb88
My take on it is that cache is hot. Literally: it is expensive from the thermal profile perspective.

It is also the reason the CELL undershoots its predicted performance by about 25%, iirc. My computer organization professor made a point of making sure we understood the importance of thermal properties.

Basically, the cache on the CELL is too close to one or more of the SPUs and overheated it, forcing it to downclock to sub-1 GHz.

Sun could be trying to avoid similar problems and at the same time use the gained space for something else productive.

Personally I would *REALLY* like to see a SPARC implementation along the lines of AMD Fusion… perhaps something GlobalFoundries could produce: a high-performance SPARC with an integrated GPU, possibly with a shared cache? In fact I think that is exactly what SPARC needs if it has any hope of competing in the supercomputer market.

Edited 2010-02-12 03:43 UTC

2010-02-11 11:43 am

MrVain
It is rumoured that Intel Nehalem-EX will be as fast as or faster than POWER7. Just google “nehalem-ex vs power7” and read the articles.

You could probably buy three or four 8-socket Nehalem-EX servers for the price of a small POWER7 server. Then you get much more performance and redundancy.

2010-02-11 4:35 pm

rydan
Please stick to the topic. The article is a comparison between Power7 and Niagara T3. It has nothing to do with Nehalem.

2010-02-11 7:44 pm

Bill Shooter of Bul Platinum Prime
No, I’d say his comment was on topic. I’d prefer these comments to be open to discussing not just the article but other relevant info about the topics of the article. It’s more intellectually stimulating that way.

Now as to the matter of his topic, I don’t think it was fud at all. The matter of big iron vs commodity is one of serious debate that needs to be revisited as new products are released. That said, I don’t think the commenter did a very thorough job in exploring the new topic he introduced, but that’s okay in my book. Now someone with more experience can chime in.

I don’t think I can stress enough, how much I don’t want comments restricted to only the products mentioned in the article and linked stories. That’s too constricting and not very interesting to read.

2010-02-12 12:28 am

tylerdurden
Red herrings are intellectually dishonest, not stimulating.

If the previous poster had described the exact characteristics that make the new Nehalem much better than (or competitive with) these processors, then it would have been an excellent (and welcome) source of information and/or debate. However, this was a thread with just a few posts on topic, so an argument based on “I heard that….” is indeed nothing more than FUD, and silly FUD at that. Furthermore, trying to spin that as “intellectually stimulating” is a bit of a stretch at best.

Edited 2010-02-12 00:29 UTC

2010-02-12 12:50 pm

MrVain
Jesus, I am not FUDing, making things up, or lying. I wrote:

“Just google “nehalem-ex vs power7″ and read articles.”

didn’t I? The point was to show that there exist articles about Nehalem-EX vs POWER7 performance. I post some of the articles here, to prove that I am not FUDing about the performance of both CPUs.

http://news.cnet.com/8301-13512_3-10321740-23.html

“I expect the raw number-crunching performance of the Nehalem-EX cores to be roughly on the same level as Power7’s cores.”

http://www.eetimes.com/news/semi/showArticle.jhtml?articleID=219400…

“I am sure Power7 will be the fastest processor around, probably faster than Intel’s Nehalem in some benchmarks,” said Nathan Brookwood, principal of market watcher Insight64 (Saratoga, Calif.).

The last one is funny, POWER7 will be faster than Nehalem-EX in SOME benches, but the Nehalem-EX will be faster in all other benches?

What I am trying to say is that yes, the POWER7 is fastest – AS OF NOW. But wait for benches of the next-gen CPUs: Nehalem-EX, AMD 12-core bulldozer, SPARC64 Venus, Sun Niagara T3, etc. Then we will see if POWER7 is still the fastest in its generation.

And I am also trying to say that I believe a Nehalem-EX server will be very much cheaper than a POWER7 server. I believe I can get three or four x86 servers for the price of one POWER7. If the Nehalem-EX reaches 90% of the POWER7 performance, then the choice is obvious to me.

You need six IBM p570 servers to match one Sun T5440 on Siebel v8 benches. One p570 is $413,000. One T5440 is $76,000. IBM servers are not cheap. I DO believe you get several x86 servers for the same price.

Buy two Nehalem-EX CPUs and one x86 server mobo, and you probably have something comparable to the cheapest IBM POWER7 server, in terms of performance.
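The price arithmetic in this comment can be worked through as follows. The prices and server counts are the commenter's own figures, not verified here, and the "one third of the price at 90% of the performance" case is one reading of the "three or four servers" claim; the helper names are hypothetical.

```python
def fleet_cost(unit_price_usd, units):
    """Total price of buying `units` identical servers."""
    return unit_price_usd * units


def perf_per_dollar_gain(relative_perf, relative_price):
    """Performance per dollar relative to a baseline of 1.0 / 1.0."""
    return relative_perf / relative_price


# Figures as claimed in the comment: six p570s to match one T5440.
p570_fleet = fleet_cost(413_000, 6)
t5440 = fleet_cost(76_000, 1)
cost_ratio = p570_fleet / t5440
print(f"p570 fleet: ${p570_fleet:,} vs T5440: ${t5440:,} ({cost_ratio:.1f}x)")

# Assumed reading: an x86 server at 1/3 the price and 90% of the speed.
x86_gain = perf_per_dollar_gain(relative_perf=0.9, relative_price=1 / 3)
print(f"x86 performance per dollar vs baseline: {x86_gain:.1f}x")
```

The result depends entirely on the inputs, so different list prices or configurations would change the ratio.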
2010-02-13 8:43 pm

foobar
Jesus, I am not FUDing, making things up, or lying. I wrote:

“Just google “nehalem-ex vs power7″ and read articles.”

didn’t I? The point was to show that there exist articles about Nehalem-EX vs POWER7 performance. I post some of the articles here, to prove that I am not FUDing about the performance of both CPUs.

http://news.cnet.com/8301-13512_3-10321740-23.html

“I expect the raw number-crunching performance of the Nehalem-EX cores to be roughly on the same level as Power7’s cores.”

http://www.eetimes.com/news/semi/showArticle.jhtml?articleID=219400…

“I am sure Power7 will be the fastest processor around, probably faster than Intel’s Nehalem in some benchmarks,” said Nathan Brookwood, principal of market watcher Insight64 (Saratoga, Calif.).

The last one is funny, POWER7 will be faster than Nehalem-EX in SOME benches, but the Nehalem-EX will be faster in all other benches?

Until “all” is defined, you are just handwaving.

What I am trying to say is that yes, the POWER7 is fastest – AS OF NOW. But wait for benches of the next-gen CPUs: Nehalem-EX, AMD 12-core bulldozer, SPARC64 Venus, Sun Niagara T3, etc. Then we will see if POWER7 is still the fastest in its generation.

Only Nehalem-EX, and AMD 12-core bulldozer have any merit in your list.

Neither Fujitsu nor Oracle have Venus on their road maps for commercial use. It’s only on Fujitsu’s road maps for a single HPC project.

Niagara T3 is a year away.

By your definition, we can only compare POWER7 to Tukwila because they were announced on the same day. Get over it, schedules never line up.

And I am also trying to say that I believe a Nehalem-EX server will be very much cheaper than a POWER7 server. I believe I can get three or four x86 servers for the price of one POWER7. If the Nehalem-EX reaches 90% of the POWER7 performance, then the choice is obvious to me.

You need six IBM p570 servers to match one Sun T5440 on Siebel v8 benches. One p570 is $413,000. One T5440 is $76,000. IBM servers are not cheap. I DO believe you get several x86 servers for the same price.

You need to update your numbers. Jesper did this for you over at The Register. How quickly you forget:

And price.. why don’t you mention that a T5440 costs 116KUSD 128GB RAM 4 [email protected], and a POWER 750 costs 102KUSD with 128 GB RAM 4 [email protected].

He did a good job with the rest of your arguments too:

http://forums.theregister.co.uk/forum/1/2010/02/08/ibm_power7_chip_…

No one is claiming that IBM is the best, but some of your arguments are just too exaggerated.
2010-02-15 11:26 am

MrVain
Until “all” is defined, you are just handwaving.

I got accused of FUDing when I wrote that Nehalem-EX will be as fast as POWER7.

Am I FUDing, or did I post links to articles where experts clearly state that Nehalem-EX will be as fast as, or faster than, POWER7? Did I lie? No. So I don’t see how your post about “handwaving” was relevant to my claim that I am not FUDing about Nehalem-EX vs POWER7.

Ergo, I do not FUD. There are experts in the industry believing what I wrote. My claim has credibility.

Only Nehalem-EX, and AMD 12-core bulldozer have any merit in your list.

Neither Fujitsu nor Oracle have Venus on their road maps for commercial use. It’s only on Fujitsu’s road maps for a single HPC project.

Niagara T3 is a year away.

By your definition, we can only compare POWER7 to Tukwila because they were announced on the same day. Get over it, schedules never line up.

No, we do not compare POWER7 to Tukwila just because they were announced the same day. I wrote that we must compare POWER7 to the other chips “in its generation”.

Everyone agrees that Playstation3 and Xbox360 are in the same generation and should be compared, even though the PS3 arrived one year later. You think that PS3 and Xbox360 cannot be compared because they were not announced the same day? They are not the same generation? Must products be announced the same day, and the same second? I doubt Tukwila and POWER7 were announced the same second, so why are you willing to compare them, then?

You need to update your numbers. Jesper did this for you over at The Register. How quickly you forget:

And price.. why don’t you mention that a T5440 costs 116KUSD 128GB RAM 4 [email protected], and a POWER 750 costs 102KUSD with 128 GB RAM 4 [email protected].

He did a good job with the rest of your arguments too:

http://forums.theregister.co.uk/forum/1/2010/02/08/ibm_power7_chip_…

I tried to show by example that IBM has very high prices to make my claim probable: that POWER7 will be much more expensive than Intel Nehalem-EX gear.

I don’t really understand what Sun vs IBM pricing has to do with Nehalem-EX most probably being cheaper than POWER7 gear. Have you now disproved my claim that Nehalem-EX will be much cheaper by linking to Sun gear?

Man, you confuse me. Your claims are a bit weird. I suggest you reread my post again, but slowly. First you talk about “handwaving” when I prove that I do not FUD – “handwaving” did not disprove my point, it was not relevant. Then you talk about “same day” when I clearly talk about next-gen CPUs. Now you talk about Sun and IBM pricing when I try to argue that IBM will be much more expensive than Nehalem-EX.

Regarding that Jesper, he is funny. I don’t know how many times I tried to explain to him, but no, he just refuses to comprehend. For instance, I wrote “you need four 5GHz POWER6 to match two Nehalem 2.93GHz in official TPC-C benchmarks” – and he STILL insists that POWER6 is the faster CPU (he talks about pricing on single cores, etc). I never got his weird IBM marketing talk. If you need four POWER6 to match two Nehalem, how can POWER6 be the faster CPU? He explained and explained, but I never understood his weird explanations. From a logical viewpoint they were wrong.

It is just like when IBM claims that IBM STILL has the TPC-C world record right now, because IBM used fewer cores and Sun used more cores. But if you look at the TPC-C list, who is at the top? Who has the record? It is Sun. Not IBM. So I don’t get it when IBM explains that they still have the world record (because they used fewer cores).

Jesper shows the same weird reasoning as IBM does. I don’t understand anything of what he says. Neither do you understand what I write. Your posts have nothing to do with what I write.

No one is claiming that IBM is the best, but some of your arguments are just too exaggerated.

So which of my arguments are just too exaggerated? I agree that POWER7 is fastest right now. I have no problem with admitting that. That Jesper guy has problems admitting that Nehalem is faster than POWER6, even though you need four POWER6. He showed other problems too, when we discussed other things.
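The "four POWER6 match two Nehalem" disagreement comes down to which unit the score is divided by. The sketch below normalizes an equal total score per chip and per core; the core counts (two per POWER6, four per 2.93GHz Nehalem) are assumptions added here and are not stated anywhere in the thread, and the function name is hypothetical.

```python
def per_unit_scores(total_score, chips, cores_per_chip):
    """Return (score per chip, score per core) for one system."""
    per_chip = total_score / chips
    per_core = total_score / (chips * cores_per_chip)
    return per_chip, per_core


# "Match" is taken to mean an equal total score, normalized to 1.0.
# Core counts are assumptions for illustration, not from the thread.
power6 = per_unit_scores(1.0, chips=4, cores_per_chip=2)
nehalem = per_unit_scores(1.0, chips=2, cores_per_chip=4)

print("POWER6  per chip, per core:", power6)
print("Nehalem per chip, per core:", nehalem)
```

Under those assumed core counts the per-chip figure favours Nehalem by a factor of two while the per-core figure is a tie, which is one way both sides of such an argument can cite the same benchmark result.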
2010-02-12 3:58 pm

Bill Shooter of Bul Platinum Prime
You can hardly expect any poster to either be an expert on the subject they are commenting on, or comprehensive in their comments. It did serve the purpose of introducing a new but related topic that is a very hot issue right now. You may not agree with him, but it’s an interesting topic related to the current one. It’s not off topic, and I’d rather not have a draconian policy that declares everything not strictly mentioned in the article off topic and taboo in comments.

Commodity vs big iron is a big, interesting subject, worthy of discussion in these august pages.

2010-02-11 5:48 pm

tylerdurden
… so that is how FUD gets started. LOL

2010-02-15 9:02 pm

spanglywires
Please quit spamming *every* site I read

Yes, Niagara is pretty fantastic, and yes, IBM Power is not all it’s exaggerated to be, but I am so thoroughly bored of your constant provocation of everyone everywhere about everything!

We *know* Siebel and RAC can run faster on Niagara, but it’s still not quite the all-round performer it needs to be. So long as Oracle|Sun hopefully have the balance of cost/quality/performance about right we’ll have these lovely Niagara toys around for years to come, and that speaks far clearer than any ramblings about $’s vs benchmarks.

Now, as a Sun-shiner as some like to call us, I’ve got to say Power7 is quite impressive. It looks to correct the majority of the flaws in Power6 which certainly seems a dud to me. It (Power6) is a number cruncher, but when you carve it up into lpars you pay a huge price for its in-order execution, especially when you max the RAM which means the memory clock drops.

The design of Power7, where it can either steal the turbo core feature from Nehalem or branch out into a small pool of threads, is a good, flexible trade-off that should suit the workloads of the next few years.

Sun|Oracle needs a big box like this. Whether it will be the Venus APL2 stuff, or whether it’s a monster Niagara system built up into a huge rack cluster like the E15k/E25k’s were, remains to be seen, but suffice to say this monster will be top of the list in Larry’s lab now!

Take a look at it all objectively. For IBM it’s the first chip since Power3 in the SP’s and RS64’s that’s got some true merit and, dare I say it, thoughtful design behind it. For Sun, there’s a lot of damage to undo, and hopefully there’s enough of the smart people left to apply that innovation Sun is famous for.