But as you might expect, nobody at AMD envisioned it that way in the planning or design stages. No engineer ever sets out to “build a shit product”; a recent chat with an engineer who was at AMD during Bulldozer’s development gave us additional insight into what the original goals for the architecture were. AMD originally wanted Bulldozer to be like K10, but with a shared frontend and FPU. In one architecture, AMD would improve single-threaded performance while massively increasing multithreaded performance, and move to a new 32 nm node at the same time. But those goals were too ambitious, and AMD struggled to keep clock frequency up on the 32 nm process. This resulted in cuts to the architecture, which started to stack up.
The last AMD processor I used pre-Zen was a Phenom II, which was a fine processor for the price. However, after that, it quickly became clear that Intel had taken the lead. As such, I never experienced this era of AMD, and I think many of you will have had the same experience. This makes articles like these incredibly interesting.
Actually, Bulldozer was a decent product; just look at the Phoronix results from the time. You have to remember that Intel had no problem paying benchmarking companies to use its cripple compiler, as well as throwing security in the dumpster for speed with its branch prediction. Try using a fully patched Core i5 from the same era next to a Bulldozer and you’ll find that adding in the security fixes really puts the brakes on the Intel. Results from Phoronix are below, and you will see that by the second generation AMD had it trading blows with the i7-3770K, which, considering it was priced cheaper than an i5-3570K, is really not a bad result.
https://www.phoronix.com/review/amd_fx8350_visherabdver2/4
That’s Piledriver you linked to, the follow-up. Piledriver was OK. Bulldozer was much worse when compared to Intel. See https://www.phoronix.com/review/amd_fx4100_bulldozer/5
Uhhh, that is the 4-core part, and frankly I never saw one of those in the wild; in fact, the only ones I ever heard of selling those were ultra-low-end prebuilts.
The FX-8150 (as well as a few builds that went with the FX-6100) was what everyone went for, as they were cheaper than Intel and better for streaming, and as you can see it was trading blows with the i7 920 while again being priced closer to the i5, so not a bad deal.
https://www.phoronix.com/review/amd_fx8150_bulldozer/9
There is a reason most Twitch streamers ran AMD FX until Ryzen: it was the best for heavily multithreaded workloads without having to spend workstation levels of cash. My grandkids are using my old Twitch boxes, and paired with an RX 580 and 16GB of RAM? Still great for 1080p gaming and light streaming.
Fair, I did see them, but I’d agree with the link you provided: there were some benchmarks they won, but overall it was a disappointment and not worth the cheaper price for many. I’d agree with the conclusion from that link:
“However, as these results show, under Linux the AMD FX-8150 is a competitive product to the Intel Core i5 2500K when dealing with multi-threaded workloads. For single-threaded work and other select tasks, the Bulldozer performance is disappointing.”
Well, that is why I said all of us content creators and Twitch streamers went AMD FX: to get the same multithreaded performance as a $120 AMD FX-8100, you were looking at nearly a grand for a Xeon rig.
Whether you liked or hated AMD FX, I think we all owe a big “THANK YOU!!!” to AMD for going beyond 4c/4t, because if Intel had its way you still wouldn’t get anything more than 4 cores for less than four figures. Try doing anything on an i5 from that period and man, it hurts; between the lack of threads and the security-patch slowdowns? Ouch man, very ouch.
bill,
That sums up the Bulldozer experience. It was a good chip for Linux and server-type workloads, but it was hot and low-clocked.
Running an FX-8350 in a Linux workstation was really pretty nice.
bassbeast,
Indeed. The family 15h cores were pretty flawed, but were great for the money with the right workload!
At home I have a mighty, beloved FX-8350, conservatively overclocked @ 4300 MHz, in an RGB gaming case coupled with an RX 570, running a super-debloated Windows 22H2. Moved to my main TV set, it’s my “PlayStation”, and it’s still a more than decent gaming system.
Sure, in watts per unit of IPC it’s not a bargain, but I only use it for casual gaming; with today’s energy prices, only a fool would think of running such a machine 24/7 for work.
As bassbeast pointed out, my impression is also that Intel chips from the same era have paid a much bigger “security patch tax”, and recent Windows versions have also optimized their use of the Bulldozer architecture, which was not well supported in the early years…
I too have an FX-8350 in my main PC,
but I have upgraded it from an HD 7850 to an RX 5700 XT GPU.
With respect to how much mileage I got out of this “good-for-nothing” CPU, it was a much better deal than anything Intel had to offer in that price range back then.
I gave my FX-6300 and FX-8320 to the grandkids when I upgraded to Ryzen, and with an RX 580 and RX 6600 respectively? They still play 1080p games just fine, and IIRC I paid like $79 USD for the 6300 and a whopping $99 for the 8320 at a time when an i3 was nearly $120 USD, so I’d say the bang for the buck was firmly in AMD’s corner.
I mean, can you imagine trying to game on an i3-3220 in 2023? Yet the old AMD FX chips are happily playing ARMA and World of Warships even as I type this, with both kids getting over 60 FPS at 1080p. For the money I paid for those chips? I really cannot complain.
I don’t get the Bulldozer hate. The FX-8150 was the chip that replaced my Phenom X4 and it was an absolute beast. I could throw anything at it and nothing made it seize up: transcoding video while copying files and keeping multiple browser windows active with downloads running, and the desktop was always responsive. I loved that thing. I don’t know how it performed under Windows though, as I am a Linux user.
Honestly, I think they were fine. But in a lot of cases cheaper Intel parts were killing the AMD equivalent on the desktop, and AMD simply wasn’t competitive on laptops. I had many friends who bought those and regretted it instantly. They were buying new, near-budget-level laptops that were terribly slow from day 1. Windows was part of the problem, along with the junkware on there by default, but the comparable Intel machines could manage just fine. If someone brought me a “slow computer” they wanted sped up, it was never an Intel.
Yeah, I also built a few dirt-cheap boxes with low-end Bulldozer derivatives because of budget constraints. Making them tolerable required really optimizing Windows, getting rid of all the trash such as the antivirus, and using nothing newer than XP + Office 2003. There was an update that let Office 2003 open .docx files. That was a life-saver.
Yeah, those low-end Bulldozers were ‘better’ than Celerons, but not by a huge margin. And I never ventured into the 220+ watt species. Nope. That won’t work.
IDK, I had a better experience with Celerons.
People were right. FX chips may not have been competitive then, but they have aged better than you would have expected them to, now that more (most?) programs use more cores, especially games. I saw someone demonstrate an overclocked FX-8350 running lots of modern games perfectly fine.
However, for purely single-threaded programs they always were and always will be terrible. Thank goodness Ryzen fixed that and gave Intel a reason to start moving again.
Ryzen is still relatively poor at single-threaded applications. The gap is smaller, but AFAIK Intel still has a lead.
However, a LOT of tasks nowadays aren’t very single-thread heavy. And the ones that are tend to be legacy apps that ran like dog crap on hardware contemporary to their release anyway. Ryzen chips are killing it now because of the multithreaded nature of most modern tasks.
The123king,
Yep. It takes lots of transistors and long speculative superscalar pipelines, but Intel has done well at maximizing the performance of a single thread. Consumers appreciate performance, but it comes with engineering tradeoffs like high power draw and an increased risk of data leakage (i.e. Spectre/Meltdown). There are always tradeoffs.
On the one hand parallelism is increasing in consumer software, but IMHO uptake has been very slow and parallelism is still woefully underused. I find many applications are still limited by single-threaded bottlenecks. For example, when I use graphics applications including Inkscape and GIMP, I’ll ask myself why something is so slow… oh yeah, one core is pegged at 100% while the other cores sit idle. These are such obvious candidates for SMP that I’m guessing the dev teams might not have the resources to make it happen.
This is a problem with many game loops too. Sometimes multiple threads are assigned to different subsystems like input/audio and whatnot. It’s easy to see why: these subsystems are loosely coupled and it’s trivial to spawn new threads for them. That’s fine, but these days that kind of trivial parallelism represents almost none of the load, so it doesn’t really help. To take advantage of high core counts one really has to break the heavy subsystems themselves down across more threads, which is harder to do and which many games still are not doing.
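To make the distinction concrete, here is a minimal sketch in Rust (the “particle update” and all the numbers are invented for illustration): instead of pinning a whole subsystem to one dedicated thread, the heavy loop itself is split across a few workers.

```rust
// Minimal sketch: split one heavy subsystem (an imagined particle update)
// across worker threads instead of giving it a single dedicated thread.
fn main() {
    let mut positions = vec![0.0f32; 100_000];
    let workers = 4;
    let chunk = positions.len() / workers;

    // Scoped threads let each worker borrow a disjoint slice of `positions`
    // without any shared-state bookkeeping.
    std::thread::scope(|s| {
        for slice in positions.chunks_mut(chunk) {
            s.spawn(move || {
                for p in slice {
                    *p += 0.016; // stand-in for the real per-particle work
                }
            });
        }
    });

    println!("first particle: {}", positions[0]);
}
```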
So obviously it depends on the exact applications one is running, but I think typical consumers will see very marginal benefit beyond 4 cores. I concede my experience may not be representative, but unless I’m running a benchmark, Blender, or a parallel compile job, it’s rare to see applications that run my 8C/16T at full load.
The C programming language wasn’t engineered to be multithreaded. Almost all languages either get it wrong or make it a pain to implement correctly. Even async/await isn’t a silver bullet.
Stop Writing Dead Programs: https://www.youtube.com/watch?v=8Ab3ArE8W3s
The Mess We’re In: https://www.youtube.com/watch?v=lKXe3HUG2l4
Kochise,
I’ve seen the first video before. I need to find time to watch the second one, but I do agree that legacy programming languages are holding us back. Actually, even older languages like Lisp have benefits over C, although IMHO Lisp’s benefits can be overshadowed by bad syntax (for humans). I haven’t used them much outside of language research, but I find Prolog and Haskell to have interesting features that developers should consider embracing more of.
Multithreading is difficult because humans are not good at consistently applying synchronization rules as complexity increases. Computers, on the other hand, are excellent at verifying rules reliably, given programming languages that support this, like Rust; but it’s really tough to make a dent in the widespread deployment of legacy languages. I have a feeling most old-school developers are simply never going to embrace new languages; rather, they’re going to have to be phased in with new generations of developers. And even then there’s pressure to use the legacy languages the industry is deeply invested in.
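As a small sketch of what “the computer verifying the rules” looks like in practice (the counter and thread counts here are arbitrary): Rust won’t let threads touch shared data unless the sharing is made explicit, e.g. via Arc<Mutex<…>>. Drop the Mutex and hand out a plain mutable reference instead and the program simply fails to compile.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Shared counter: the Arc shares ownership, the Mutex enforces that
    // every access goes through a lock.
    let counter = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1_000 {
                    *counter.lock().unwrap() += 1; // the lock is mandatory here
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }

    println!("total = {}", *counter.lock().unwrap());
}
```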
Alfman,
I think Fortran has some things to say about making parallelism a first-class part of the language.
More cores will help background processes get scheduled. More high-performance cores beyond 4 probably don’t make sense for most people, but efficiency cores probably help.
I’m interested in seeing how big.LITTLE chips can be used once OS schedulers are adjusted. Scheduler enhancements were one of the things that helped Bulldozer. The devs figured out that loading up one core from each module before going back to the second core gave them a performance bump, and the strategy worked for Intel as well because an SMT sibling isn’t 100% of a real core.
Eventually the OS will probably have processor tiers, and the program can provide hints about which tier the process should get.
Then there is being able to partition hardware with lots of cores. It would be interesting to see if a many-core machine could be cut up into security zones: insecure code runs in zone 1, zone 0 is the OS, and zone 2 is trusted applications. This would have more applications in a server context than on a laptop or desktop.
Flatland_Spider,
IMHO the best use case for E cores is low load & background/idle processes to keep the P cores from having to wake up.
Obviously optimizing the scheduler is important, but the question is: optimizing it for what? On any given hardware the scheduler has to contend with an implicit tradeoff. For performance, you want as much as possible running on the P cores, whereas for efficiency you want as much as possible running on E cores. Then there’s the added caveat that if NUMA is involved, adding more cores can actually be very unproductive for applications with a heavy reliance on memory sharing & synchronization (so much so that AMD provided a “game mode” that improves performance by reducing cores). While I believe scheduling has since improved to detect such problems on the fly, we still face diminishing returns with high core counts.
All the while I was reading this I was thinking, “this is exactly what a computer cluster already is.” I think scaling will ultimately have to take the form of computer clusters in one form or another because of shared memory’s inherent bottlenecks.
Operating systems need to evolve past a single kernel managing many cores; clusters need to become first-class primitives. One OS could automatically distribute and schedule processes across hundreds of computers. Adding and migrating processes across nodes would be completely transparent.
The main risk for things like Spectre has always been time-sharing units that hold lots of state, which can reflect privileged data. This is fairly easy to defend against by simply clearing that state and randomizing task-switch intervals between privilege domains. The real challenge is staying secure while NOT clearing the state (for performance reasons), and unfortunately this is where superscalar architectures keep getting into trouble.
I don’t see a need for CPUs to physically isolate processes into zones as long as we are in fact clearing the state and randomizing task-switch intervals between security domains. On the other hand, this may be easier said than done. Even if the processes themselves are truly fully isolated, the kernel structures used to service syscalls generally are not, which can leak information even across privilege domains & CPU cores. IMHO this would be very tough to rectify in today’s kernels. It could be another benefit of microkernels over monolithic kernels.
Power/heat, because x86 is super power-hungry and AMD/Intel can’t seem to do anything to bring it down (or don’t want to). Laptops are the dominant form factor, and they are thermally constrained. Desktops are getting thermally constrained now too.
Alternatively, latency. Humans are latency-sensitive.
True. If there are enough cores, they can be kept warm for whatever application is involved.
I thought “Game Mode” was to get the proc into turbo clocks and keep the scheduler from firing up the other cores? Those max frequencies on Intel and AMD chips are only for a single core. The entire chip can’t run at turbo frequencies; they don’t have the thermals for it.
I don’t think we have enough high-core-count CPUs to really determine that on the consumer side.
There is also the fact that consumer workloads are bottlenecked by the human sitting in the chair. People can’t multitask enough to keep the cores busy. That cannot be fixed, but I am interested in seeing what happens once everyone has a 128-core machine on their lap.
Yeah, exactly. Miniature clusters everywhere!
They do, and a lot of what I’m envisioning only makes sense with something like a microkernel. There’s probably some asymmetric multiprocessing in there as well to make it happen.
That would be really cool to see.
This is what I was thinking about. What if only a few insecure processes have to take the performance penalty?
It’s not really feasible with 4-8 cores, but above that, isolating processes might work.
I have no idea how feasible any of this is in silicon, so I’m just making up ideas.
Wait, I’ve seen something like this before. The Intel Xeon Phi inside a server. Linux could run on the Phi cores, since they were small, independent Atom procs. That’s one source of the ideas.
The other is a really weird NT server where the disk was shared between multiple motherboards. The independent OSes would work off of the single shared drive. It was the oddest thing.
Flatland_Spider,
For better or worse, lots of users still value performance over efficiency. The shrinking node sizes have helped, but instead of reducing power they keep adding more cores and gigahertz, haha. I agree with you about the poor thermal headroom of laptops in particular, but then laptops probably don’t need such a high-end CPU. It’s not all bad: most x86 CPUs are adding efficiency cores. Also, a mid-range 13th-gen CPU compares favorably to a top-of-the-line 10th-gen CPU…
cpubenchmark.net/cpu.php?cpu=Intel+Core+i5-13600K&id=5008
cpubenchmark.net/cpu.php?cpu=Intel+Core+i9-10900K+%40+3.70GHz&id=3730
It’s when you opt for high-end CPUs that power consumption gets crazy high, but some users need (or think they need) that many cores, and this inevitably takes more power. :-/
As I understand it, the turbo performance is completely dynamic and there’s no need to disable cores to boost, because idle cores are enough. Hypothetically, if the motive for disabling cores were to maximize turbo performance, then it would have made more sense to disable neighboring cores while keeping the active cores on different dies for the best heat distribution. However, that isn’t what AMD’s game mode did; it actually disabled one die completely (probably at the expense of sub-optimal heat transfer from the remaining die).
https://www.anandtech.com/show/11726/retesting-amd-ryzen-threadrippers-game-mode-halving-cores-for-more-performance
I believe the main reason this helps games is that it eliminates the NUMA overhead between dies.
I tested this some years ago and I’d say it’s already been a bottleneck for some time, because a few cores can saturate memory bandwidth quite easily. Ideally, critical code loops would all reside in cache and algorithms would not be overly dependent on I/O & shared memory structures. But getting rid of bottlenecks, particularly around NUMA, creates complexity that most developers just don’t want to deal with.
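Something along these lines is easy to reproduce; here is a rough sketch (the buffer size and thread counts are arbitrary) that sums a large buffer with 1, 2, 4 and 8 threads. The aggregate GB/s tends to flatten out well before the core count runs out, which is the bandwidth wall in action.

```rust
use std::time::Instant;

fn main() {
    // ~512 MB of u64s; size and thread counts picked arbitrarily.
    let buf = vec![1u64; 64 * 1024 * 1024];
    for threads in [1, 2, 4, 8] {
        let chunk = buf.len() / threads;
        let start = Instant::now();
        std::thread::scope(|s| {
            for slice in buf.chunks(chunk) {
                // black_box keeps the optimizer from skipping the reads.
                s.spawn(move || std::hint::black_box(slice.iter().sum::<u64>()));
            }
        });
        let secs = start.elapsed().as_secs_f64();
        let gb = (buf.len() * 8) as f64 / 1e9;
        println!("{threads} threads: {:.1} GB/s", gb / secs);
    }
}
```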
I think that could work, maybe even with no silicon changes…? An OS could put security constraints around CPU affinity to isolate certain cores into security zones and then limit which processes could request them.
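The affinity mechanism itself already exists today; here is a rough sketch of confining a process to a subset of cores on Linux. This assumes the libc crate, the core numbers 4-7 are made up, and a real scheme would of course have the OS enforce the policy rather than the process opting in.

```rust
use std::io;

// Pin the calling process to the given cores (Linux only, via the libc crate).
fn pin_to_cores(cores: &[usize]) -> io::Result<()> {
    unsafe {
        let mut set: libc::cpu_set_t = std::mem::zeroed();
        libc::CPU_ZERO(&mut set);
        for &core in cores {
            libc::CPU_SET(core, &mut set);
        }
        // pid 0 means "the calling process".
        if libc::sched_setaffinity(0, std::mem::size_of::<libc::cpu_set_t>(), &set) != 0 {
            return Err(io::Error::last_os_error());
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Confine this process to an imagined "untrusted" zone of cores 4-7.
    pin_to_cores(&[4, 5, 6, 7])?;
    // ...run the untrusted workload here...
    Ok(())
}
```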
Sounds interesting; let me know if you find a link.
I guess in principle, as long as each kernel has the other kernel’s memory marked as reserved, they could operate independently of each other with very few changes. Shared peripherals would be where it breaks down, though. It’s an odd idea indeed!