Metal. If the name sounds hardcore, it’s because it’s a hardcore improvement to the way games will be able to perform on iOS 8. Metal represents a much more no-nonsense approach to getting the most out of the Apple A7’s gaming performance, assuring users of the iPhone 5S, iPad Air and iPad mini with Retina display that their devices will continue to be top-notch game systems come this fall.
Right now, in iOS 7, software called OpenGL ES sits between the game and the core hardware that runs it, translating function calls into graphics commands that are sent to the hardware. That’s a lot of overhead. And iOS 8 is getting rid of a lot of it.
A nice overview of Apple’s Metal.
This is certainly interesting, but it confuses me. Is the suggestion that Apple is intending to create a new API like DirectX? This would be a shame, since it fragments the 3D graphics space even further.
GPU compute is already covered by OpenCL, and OpenGL supports precompiled shaders with glShaderBinary (admittedly not on iOS, but that’s an argument for implementing it, rather than creating a separate API).
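For reference, the binary-shader path I’m referring to looks roughly like this on platforms that expose a binary format (a minimal sketch assuming an ES 2.0 context; the blob, its length and the format enum are placeholders that would come from a vendor’s offline compiler):

```c
/* Minimal sketch of the glShaderBinary path (OpenGL ES 2.0+).
 * Assumes a current GLES context. The binary blob, its length and
 * binaryFormat are hypothetical and would come from a vendor's
 * offline shader compiler. */
#include <GLES2/gl2.h>

GLuint load_precompiled_shader(GLenum stage, GLenum binaryFormat,
                               const void *blob, GLsizei blobLen)
{
    GLint numFormats = 0;
    glGetIntegerv(GL_NUM_SHADER_BINARY_FORMATS, &numFormats);
    if (numFormats == 0)
        return 0;           /* no binary formats exposed (e.g. iOS) */

    GLuint shader = glCreateShader(stage);
    glShaderBinary(1, &shader, binaryFormat, blob, blobLen);
    return shader;          /* attach to a program and glLinkProgram() as usual */
}
```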
After reading the article, I’m still confused about what Metal really is. What’s the difference between an OpenGL shader and a Metal shader?
It’s not ideal. But it’s necessary. When standards don’t move fast enough, organizations ditch the standard and do it themselves. Eventually, once the innovation is done and consensus is reached, standardization happens again.
It’s not a good thing that OpenGL was not good enough for Apple, AMD, and Google. It IS good that they’re doing something about it. The leap-frog game continues.
Thanks for the reply. I’m trying to track down the Google API you refer to, but can’t seem to find it (I did try – honestly! – but all I could come up with was ANGLE, which looks to be something different). Do you have any links to point me in the right direction?
http://android-developers.blogspot.ca/2014/06/developer-preview-and…
http://android-developers.blogspot.com/2014/06/google-io-and-games….
Thanks for the links; that’s really helpful. I guess I shouldn’t really pass judgement without having used Metal, but it’s good to see Google are invested in the OpenGL extension mechanism, rather than developing an entirely separate API.
It is not necessary. Look at all of the work Steam has done to improve the performance of the graphics stack on Linux using OpenGL. And Apple’s gains are likely temporary anyway: many of the current performance gains are probably there because Metal doesn’t have as many features as OpenGL. Once Metal starts gaining features, the performance will likely degrade and then they’ll be in a similar boat to OpenGL. At that point, all Apple will have done is create a new platform that’s incompatible with the others and locks its developers in further.
Unless, of course, the company decides to keep the platform closed to prevent others from using it. Which scenario sounds more like Apple as of late?
The summary is quite clear: its goal is to have less overhead than OpenGL-ES.
I agree that part was clear, but it wasn’t at all clear to me from the article – which is rather high level – how it intends to achieve that. Given that existing shader languages are compiled down and executed directly on the GPU, after which OpenGL needn’t have much involvement (depending on the task), I was curious what sort of overhead they were talking about.
After reading burnttoys’s really helpful comment below [1] I’ve subsequently had a look at Apple’s developer library [2,3], which to my mind gives a much clearer picture of what Apple are doing with this.
[1] http://www.osnews.com/permalink?591493
[2] https://developer.apple.com/library/prerelease/ios/documentation/Mis…
[3] https://developer.apple.com/library/prerelease/ios/documentation/Met…
I’m completely illiterate when it comes to graphics and games, so when I hear all this buzz about new graphics APIs like Metal and Mantle, I can’t help but ask myself how this works.
I believed (probably wrongly, it seems) that for a graphics API you first and foremost needed the GPU (the hardware) to support certain features, and then the drivers to enable the use of those features. For example, when Nvidia or AMD releases a new GPU and says it supports DirectX 11 and OpenGL 4.2, does that just mean the drivers for that GPU support those APIs? Is it unrelated to the hardware itself?
If so, could my old Gen 4 Intel IGP, which supports OpenGL 2.1 (I use Linux), potentially support OpenGL 4.2 if someone actually cared to implement it in the drivers?
And if not, if hardware support for those features is needed (like a CPU that supports SSE4, etc…), how can Apple announce a new graphics API without partnering with GPU manufacturers to provide those features that Metal should offer?
Sorry if it’s a completely silly question, but it really makes me wonder and I couldn’t find any good answer to it by just searching.
Apple *is* the GPU manufacturer for iOS devices.
Hmmm, did you check that? I read everywhere (including the article above) that they use a PowerVR G6430 by Imagination Technologies. That’s “off the shelf” hardware, even if Apple might outsource the manufacturing to Samsung or whoever.
Actually the new Intel Moorfield platform uses that same GPU (though manufactured by Intel itself, I believe).
In any case, does your answer mean that yes, the graphics API needs hardware support in the first place? And that this GPU announced in June 2012 already supported the Metal API?
Hi, I’ll try and chip in a bit. I have some understanding in this area (I’ve worked for 3DLabs, ZiiLabs, Imagination Technologies, Samsung and worked with Qualcomm and ARM all on GPU and GPU driver technology as well as some combined HW/SW systems from “back-in-the-day” based on 3Dfx VSA100/200 and Ti 32xxx DSPs).
Firstly, you’re spot on with the SSE4 analogy – if the GPU actually lacks a certain feature then it would be VERY hard to support that concept efficiently (to the point where it might not be worth it). For example, OpenGL ES 3.0 fully supports floating point textures. Much older hardware simply can’t do this (there may be work-arounds).
However, over the last few years – and especially now (Qualcomm 3xx, Imgtec 5 and 6 series, ARM Mali 6xx) – the GPU isn’t so much a GPU as a massively multi-threaded compute engine that also has some nice graphics features such as Z-buffering, alpha blending and a really nice texture look-up unit. Having scanned the Metal spec, it seems that all of these have been kept as “first-class concepts” in Metal – probably because they are fantastic optimization targets.
The REALLY big win from this seems to be the OpenCLish command buffer concept. OpenGL ES has a command buffer but you have no real control. You just use programs, set parameters, load buffers and that’s it. I _think_ what Apple are offering up here is quite useful especially from a multi-core perspective.
Firstly, it seems that damn near everything is deferred; that is, work goes into command buffers and gets done when you “swap/flush/execute” (different APIs, different parlance).
Secondly, you can have multiple command buffers issuing work to the GPU – this REALLY helps multi-threading and multi-core programming, which is EXTREMELY awkward using GLES. It should help GUI programming nicely too – that is, you can parallelise the drawing of “widgets” and the like.
Remember – Apple aren’t offering more fill-rate, texture throughput or vertex shading ability (although in modern designs fragment and vertex programs tend to use the same ALUs but will have different output formats and optimization strategies).
Apple are offering better control. No longer will “useProgram” take an eternity, because that command is added to a queue and processed along with many others. The state changes the app requires can be minimized in two ways. Firstly, there is much more fine-grained control over the commands the GPU will execute. Secondly, you can have multiple command buffers, either for threading or so that you can set up several drawing commands without the restriction of only a single “program” being in use within the GL driver at any given time. GLES is highly serial and often synchronous; indeed, it is very hard to make GLES purely async.
Other optimisations come from shared buffer accesses instead of continual uploads – something barely standard in GLES (although present – esp for vertex arrays).
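To illustrate that last point, here’s a rough GLES 3.0 sketch of re-uploading vertex data every frame versus writing into a shared buffer object (illustrative names, assumes a current context):

```c
/* Rough illustration only: per-frame re-upload vs. update-in-place
 * in OpenGL ES 3.0. Assumes a current context; 'verts' and 'nbytes'
 * are placeholders for the application's vertex data. */
#include <GLES3/gl3.h>
#include <string.h>

/* Naive path: a fresh upload every frame. */
void upload_every_frame(GLuint vbo, const void *verts, GLsizeiptr nbytes)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, nbytes, verts, GL_STREAM_DRAW);
}

/* Shared-buffer path: map the existing store and write into it. */
void update_in_place(GLuint vbo, const void *verts, GLsizeiptr nbytes)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    void *dst = glMapBufferRange(GL_ARRAY_BUFFER, 0, nbytes,
                                 GL_MAP_WRITE_BIT |
                                 GL_MAP_INVALIDATE_RANGE_BIT);
    if (dst) {
        memcpy(dst, verts, (size_t)nbytes);
        glUnmapBuffer(GL_ARRAY_BUFFER);
    }
}
```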
Hope this makes some sense – written in a hurry – I should be doing my accounts!
Thanks for that very detailed answer!
So it seems that modern GPUs are much more “generic” than they used to be, which gives software more flexibility in how it can use them. That’s why Metal can exist without hardware designed specifically for it.
It sounded ironic to me when you explained this, because I remember that somewhere around 2008 (?) Intel had its “project Larrabee” (which never materialised), built on this very same principle, and Nvidia laughed at those efforts, saying it looked like a GPU designed by a CPU team (or something similar). And in the end the GPU market has gone exactly that way, to allow GPGPU and things like Mantle and Metal.
Technically Apple does have a hardware design under it. Low level APIs are tethered to the underlying hardware to a greater degree than generic OGL and D3D.
Apple has a stake in ImgTec, so it seems likely that for the foreseeable future they only have to tailor Metal to that particular architecture.
Larrabee? It kinda turned into Xeon Phi via Intel’s Terascale plan.
There were some interesting papers about using _very_ wide SIMD engines for graphics.
OpenGL and D3D have a lot of overhead in their own abstractions.
Basically, you spend your time in those APIs putting data into formats that those APIs require, then the platform drivers convert that into something the GPU can use.
Then shaders (which also have to be compiled and messed with by drivers) operate on that data in batches. Because GPUs are massively parallel, they operate on a bunch of data at once, but you can’t do 3 things to half the data and another three things to another set of data, etc. You have to do one kind of operation for all the data in the current buffer, then switch to a new buffer and do another type of operation. 3D engines will often do all the flat surfaces in one pass, switch the mode (shaders, etc.) then render all the shiny surfaces, etc.
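As a toy illustration of that sort-by-state idea (purely hypothetical, not any particular engine’s code), an engine might keep a draw list and sort it so everything sharing a shader and texture is submitted together:

```c
/* Hypothetical draw-list sort: group draws by shader program and
 * texture so expensive state switches happen once per group instead
 * of once per object. */
#include <stdlib.h>

typedef struct {
    unsigned program;   /* shader program id */
    unsigned texture;   /* main texture id   */
    unsigned mesh;      /* mesh/VBO id       */
} DrawCall;

static int by_state(const void *a, const void *b)
{
    const DrawCall *x = a, *y = b;
    if (x->program != y->program)
        return x->program < y->program ? -1 : 1;
    if (x->texture != y->texture)
        return x->texture < y->texture ? -1 : 1;
    return 0;
}

void sort_draw_list(DrawCall *calls, size_t count)
{
    /* After sorting, the renderer walks the list and only changes
     * program/texture state when the key actually changes. */
    qsort(calls, count, sizeof *calls, by_state);
}
```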
The drivers do their best to optimize, but OpenGL and D3D are platform-agnostic, not hardware-specific, APIs. This means the abstractions they deal with don’t match the hardware directly, which causes overhead. Those drivers are also filled with a lot of ancient cruft that’s hard to change without breaking things (this is more true for AMD). This means that developers have to jump through hoops to optimize (or fix bugs) on specific hardware – although in practice it mostly means they won’t bother.
AMD’s Mantle and Apple’s Metal are ways to open the lower level of the actual GPU hardware to developers so they can better optimize their own work and avoid the overhead of a generic API, freeing the CPU, which would normally have to do a bunch of extra translation work, to do other things. It should also open the door to some new kinds of operations that aren’t feasible now – which is far more exciting than just the performance boost.
nVidia on the other hand often will optimize developers’ games themselves and ship those optimizations in their own drivers to override whatever shipped in a particular game. They are so secretive I wonder if they’ll ever release a similar low level API.
I’d guess the low level API coming for D3D has a very very long lead time because of how difficult it will be to create a low level API that works on many different GPU platforms (not just nVidia and AMD, but all the mobile GPUs). I wouldn’t hold my breath for an OpenGL equivalent any time soon.
BTW, it’s fun to watch the cycles here – we had early proprietary 3D hardware “accelerators” like 3DFX, with their own custom APIs like Glide, and then everyone clamored for a standard (OpenGL and D3D). And now here we are, 20 years later, cheering the return of proprietary APIs.
Proprietary APIs never went away. That consolidation only happened on desktop systems.
Not speaking for this GPU specifically (though I’m going to assume it applies here), but API changes in OpenGL and DirectX frequently require hardware changes to support things like new texture compression formats, new texture formats, or floating-point numbers.
Sometimes, features can be implemented in the driver – tessellation can be implemented in software and cover some use cases without a huge performance hit, but this isn’t always the case.
I see. So a graphics API does require hardware support, but the hardware support for these features is rather generic and not something specifically tied to that API. And Metal makes use of the hardware features already available in current hardware, just in a different way. That makes sense, thanks.
Disclaimer: I have worked directly with PowerVR and other vendors’ hardware and understand well how it works.
The idea that OpenGL as an abstraction layer gets in the way and degrades performance is for the most part bullshit.
The rationale is that the CPU sends too many commands and becomes a bottleneck, because there is too much API code between the game and the hardware.
There is a serious problem with this reasoning, but only people with a lot of experience in this area can really understand it. Even most render engineers and many hardware designers don’t get it and it’s really worrying.
The bottom line is, these APIs (Mantle, DirectX 12, Metal, etc.) will fail miserably because the cost/benefit balance of using them is very negative. They are non-portable, require more knowledge to learn and more time to write for, and, in the end, will simply not benefit 99.9999999% of the games out there. They are only really useful for large AAA games, and even there the performance increase is marginal at best.
The reasoning is simple. It is true that in APIs such as OpenGL the CPU is the bottleneck, but this is a hardware limitation more than an API limitation: no matter what you do and how low-level the access is, the performance gains are always marginal because, physically, the GPU is thousands of times faster by design and the CPU will be the bottleneck no matter what. There is no way around this unless the hardware design of GPUs changes and more coarse logic can run on the chip.
To make it short, all this is bullshit.
Here’s some more technical debunking. The API claims it can do draw commands 10x faster.
But draw commands are very low on the list of bottlenecks. Bottlenecks are usually in this order:
1) fragment program performance
2) cache misses due to state changes
3) post-processing stuff (antialiasing, full-screen effects, etc.)
4) vertex program processing (much lower)
5) draw commands (much much lower).
The last one is also the easiest to optimize. In fact, no matter how fast the GPU is, you don’t want your game issuing hundreds of thousands of draw commands, so you will use either instancing or batching.
The only situations where you can’t do instancing or batching is if you have hundreds of thousands of dynamic objects (that move around and change) and each looks different to the other. Can you think of any game where this is the case? I can’t either.
From what I’ve been reading about Mantle and the various benchmarks I’ve seen, it doesn’t really improve the best-case scenario by much more than 1% or so, but it does improve the worst-case scenario by quite a large margin. All you commenters here seem to be quite focused on the best case, but IMHO addressing the worst case should come first. I have no idea if Metal does that, but at least AMD’s Mantle offers some actual benefit over current APIs.
Sadly one of those worst case scenarios is the hierarchical GUI. It involves huge numbers of program changes (often several times per widget).
Unlike the systems described by reduz above, fragment shader performance is rarely any kind of bottleneck there. At most there would be 10-20 cycles of processing, often WAY less, as the GPU’s elemental features (blending, texture filtering) are “free” (in the sense of being parallel).
The usual bottleneck for mobile GUIs is bandwidth, bandwidth, bandwidth followed by draw call time.
I think the multiple command queues can help out with draw call time. I’m sure that Apple’s GPU/memory subsystem is more than adequate.
Yes but that depends on your definition of what is a worst case scenario. As I mentioned before, in practice, a worst case scenario is a game that does hundreds of thousands of draw calls and state changes per frame.
Only something like that can become CPU bound. OpenGL ES 3.0+ and DirectX already have APIs to deal with this (a quick sketch follows below):
-Instancing (one draw call instead of thousands)
-Uniform Buffer Blocks (one call to setup a material instead of dozens)
-Vertex Array Objects (one call to setup buffers instead of another few dozens)
Plus you can also do offline batching (merging many objects into one).
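A compressed sketch of how those three fit together in GLES 3.0 (illustrative names; assumes a current context, an already linked program with an instanced vertex shader, and buffers that have been filled elsewhere):

```c
/* Illustrative GLES 3.0 sketch: one VAO, one uniform buffer for the
 * material, and a single instanced draw instead of thousands of
 * draw calls. 'program', the "Material" block name and the buffer
 * contents are hypothetical. */
#include <GLES3/gl3.h>

void draw_forest(GLuint program, GLuint vao, GLuint materialUbo,
                 GLsizei indexCount, GLsizei treeCount)
{
    glUseProgram(program);

    /* Uniform buffer block: one bind sets up the whole material. */
    GLuint blockIndex = glGetUniformBlockIndex(program, "Material");
    glUniformBlockBinding(program, blockIndex, 0);
    glBindBufferBase(GL_UNIFORM_BUFFER, 0, materialUbo);

    /* Vertex array object: one bind restores all attribute state. */
    glBindVertexArray(vao);

    /* Instancing: one call draws every instance. */
    glDrawElementsInstanced(GL_TRIANGLES, indexCount,
                            GL_UNSIGNED_SHORT, 0, treeCount);
}
```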
The history of OpenGL and DirectX was pretty much moving stuff to the hardware side year after year to avoid being CPU bound.
At this point there is nothing else to move except the coarser rendering logic (which would avoid round-trip latency issues and being CPU bound), but that would require a completely new hardware design and APIs that would make OpenGL or DirectX obsolete.
That’s too big a risk, so naturally it’s cheaper to advertise more “to the metal” solutions in the meantime.
Looking at the Mantle benchmarks suggests the benefits come when games are CPU limited. With weaker CPUs the performance gains *are* fairly significant.
The best performance gains are achieved when the cpu is unable to submit enough draw calls. None of these apis are there to target gpu bound applications.
These low-level APIs matter for consoles, which use multiple weak netbook-class cores, but they can also benefit PCs or mobile devices with weaker CPUs. With regard to the A8, it all depends on how much the CPU is bottlenecking the GPU.
It does however seem a bit silly, unlocking all that performance out of a mobile gpu is a recipe for a really short battery life.
I don’t understand your reasoning. If the API consumes less resources than another API then it leads to longer battery-life, not shorter.
It sort of depends. For example, an unbottlenecked GPU will scale its power consumption to a far greater degree; it will eat whatever power is thrown at it. By allowing more draw calls you’re basically giving the GPU (more) free rein.
The question here is whether power used by the gpu is offset by the power saved by the cpu (I suspect it isn’t). Given that they’re all advertising more draw calls, it stands to reason developers will use up any spare computational power of the gpu.
This is all a bit wishy washy and we’re not even considering thermal throttling (and power gating) which occurs in modern mobile devices. Heat produced by these SOCs probably bounds performance to temperature rather than the full capabilities of the gpu.
Metal could be “Apple’s DirectX”. Great move.
That is not particularly a good move. DirectX usage is dropping dramatically due to lack of portability.
This API is much harder to use than OpenGL, only works on Apple platforms, and only 0.000001% of developers will see any benefit from using it.
I guess all those XBox and Windows developers might be a mirage then.
Maybe they are migrating to OpenGL, with its source-incompatible versions across desktop, mobile and web.
Or maybe they are using libGCM, GNM, GNMX, GX, GX2 on the other game consoles.
I think the previous poster was referring to the fact that both console and PC game growth is either flat (consoles: -1% YoY growth*) or declining (PC: -8% YoY growth*). Whereas the mobile segment is the one growing explosively (+38% YoY growth*).
In the mobile segment, DirectX is pretty much irrelevant. So in a sense the previous commenter is correct, DirectX is either stagnant or dropping.
* 2013 figures.
Exactly how is this a great move for anyone other than Apple?
Game developers wanting to maximize performance.
No sane studio writes straight to 3D APIs without a game engine abstraction.
Vendor lock-in isn’t good for anyone but the vendor.
This argument cuts both ways. If you are using a game engine, you will not be able to optimize your specific game for the low-level API, which makes the API even less relevant.
Though to a certain degree all GPUs are merging into fairly similar SIMD architectures, meaning a certain level of commonality between these APIs. A game engine today already has to deal with several APIs; more add to the burden, but maintaining different code paths isn’t new.
E.g. Frostbite, which deals with GCN, GNM, D3D (variants) and Mantle.
Game engines use higher level abstractions.
You don’t think in points, triangles, texture coordinates and matrix operations.
Rather in entities, materials, render passes, geometry and so on.
So the engines have optimized code paths translating those higher level descriptions into the optimal way to use each API.
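As a purely illustrative sketch of that split (not any real engine’s API), the engine-facing side might look like a high-level draw request handed to a per-API backend:

```c
/* Purely illustrative: a high-level draw request the engine works
 * with, plus a backend interface so the same request can be
 * translated into GL, D3D, Metal, Mantle, ... by optimized
 * per-API code paths. */
typedef struct {
    const char *materialName;   /* e.g. "shiny_metal" */
    unsigned    meshId;
    float       transform[16];
} DrawRequest;

typedef struct {
    void (*begin_pass)(const char *passName);
    void (*draw)(const DrawRequest *req);
    void (*end_pass)(void);
} RenderBackend;

/* Game code never touches triangles or raw API state directly. */
void render_scene(const RenderBackend *backend,
                  const DrawRequest *requests, int count)
{
    backend->begin_pass("opaque");
    for (int i = 0; i < count; ++i)
        backend->draw(&requests[i]);
    backend->end_pass();
}
```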
Yeah, that’s an interesting question.
I think maybe this is “software evolution” (although, unlike natural evolution, there’s a lot more horizontal exchange of genes).
There are several attempts occurring to redefine GPU interfaces to be more efficient and allow clearer access to parallel compute.
Maybe, just like before, these interfaces will eventually become similar enough, by absorbing the best characteristics of their rivals, that they will unify again – probably under the OpenGL banner.
Or, if they are operating in and targeting different environments, they will really diverge into two non-competing infrastructures.
You are right, great move for Apple, not for us (at least not in a direct way).
BTW, I think this could start a new trend in mobile game development. Taking full advantage of hardware resources is a good thing, and maybe other companies will begin to do it too.
The cost of abstraction? If you want an API that works on a variety of GPUs, each one using its own low-level data formats, shader language or whatever, you’ve got to go high-level and take the performance penalty.
All those vendor-specific formats are a disaster waiting to happen. Each GPU vendor will publish its own API and try to convince developers to use it exclusively, or exclusively for the high-quality version of the game.
Also, it might hurt Android, if game devs keep having iOS as a first priority.
PS: Can you pre-compile shaders in OpenGL for popular GPU brands? That would remove some part of the overhead.
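For what it’s worth, OpenGL ES 3.0 (and desktop GL 4.1+) does let you cache the driver-compiled program via glGetProgramBinary/glProgramBinary, though the blob is tied to one specific GPU and driver version, so you can’t ship one binary per “brand” and you still need the GLSL source as a fallback. A rough sketch (assumes an ES 3.0 context; disk I/O omitted):

```c
/* Rough ES 3.0 sketch of program-binary caching. The blob is only
 * valid for the exact GPU/driver that produced it, so a real app
 * keeps the GLSL source as a fallback. Assumes a current context. */
#include <GLES3/gl3.h>
#include <stdlib.h>

void *save_program_binary(GLuint program, GLenum *format, GLint *size)
{
    /* Before linking, request a retrievable binary with:
     * glProgramParameteri(program, GL_PROGRAM_BINARY_RETRIEVABLE_HINT, GL_TRUE); */
    glGetProgramiv(program, GL_PROGRAM_BINARY_LENGTH, size);
    void *blob = malloc((size_t)*size);
    glGetProgramBinary(program, *size, NULL, format, blob);
    return blob;            /* caller writes this to disk */
}

GLuint load_program_binary(GLenum format, const void *blob, GLint size)
{
    GLuint program = glCreateProgram();
    glProgramBinary(program, format, blob, size);

    GLint ok = GL_FALSE;
    glGetProgramiv(program, GL_LINK_STATUS, &ok);
    if (!ok) {              /* GPU/driver changed: rebuild from source */
        glDeleteProgram(program);
        return 0;
    }
    return program;
}
```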