The End of Proprietary Big Iron ISA?

OSAlert recently linked to several articles on the two remaining big-iron RISC-based platforms still alive and kicking, something of great interest to me both in a professional capacity and for personal reasons (I wouldn't be an avid OSAlert reader and poster if I weren't into non-mainstream architectures).

Introduction

During a thread in which several of us were discussing the new POWER7 architecture, user Tyler Durden (I like the name, by the way), somebody with vastly more knowledge of RISC processors than I have, pointed out the following:

The development of a high end processor and its associated fab technology will soon reach the billion dollar mark.

$1 billion! Ouch!

What this means is that, because these are niche-market systems, high-end RISC-based platforms are reaching the point where it's no longer possible to sell enough of them to recoup the cost of developing them. For me, this is going to cause countless issues on a professional level, and if you've ever worked with one of these systems in a mission-critical environment, you'll know exactly what I'm talking about.

Why care?

You would most certainly be forgiven for asking what's so special about these systems anyway. Well, for me, it comes down to several issues. Because both the hardware and (usually) the OS of these systems, including device drivers, are developed and/or certified by the same company, there is a tight integration that only the Mac can match in the consumer space, and although this in itself is nothing special, it does allow for enterprise-level features you can't get anywhere else. One such feature is the ability to identify a faulty device, be that an expansion card, a DIMM or even a CPU, power off that specific device or the port into which it's plugged, then open the system and replace the faulty component without powering down, rebooting or even entering single-user mode. Obviously these features are exceptionally handy when dealing with high-availability applications and services, but to me they pale in comparison to these systems' virtualization capabilities.

Current Intel and AMD processors support virtualization features, through VT-x and AMD-V respectively, but these deal only with CPU virtualization on x86, an ISA on which full virtualization has been notoriously difficult to implement. What systems like POWER and SPARC have had for quite some time is hardware I/O virtualization, provided by a unit known as an IOMMU.

IOMMU: A brief introduction.

IOMMUs basically perform the same function as a CPU's MMU, in that they map virtual memory address ranges to physical address ranges. Although they can be used for several other purposes, such as providing memory protection against faulty devices, their main use is to provide hardware I/O virtualization capabilities. Devices on the PCIe bus do almost all of their communication with other devices via main memory, a process known as DMA. If a hypervisor were to allow a guest OS direct access to a PCIe device's memory space, this could overwrite a) the data of any other guest OS that has access to the device, causing corruption, and, more importantly, b) data belonging to the host system, almost certainly taking the system down. Currently, on x86 platforms, this is avoided by using either dedicated devices or paravirtualization, a technique that uses software to emulate a device which is then presented to the guest OS. If you've ever created a virtual hard drive file for use with VirtualBox, QEMU or VMware, you have basically used paravirtualization to present the semblance of a hard drive to the guest OS.
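
To make that extra software layer concrete, here is a minimal, purely illustrative Python sketch (not taken from any real hypervisor; all names are made up) of a file-backed "virtual disk": every guest read or write passes through emulation code before it ever touches the backing file on the host, which is exactly the layer paravirtualization adds.

    # Toy model of a paravirtualized block device: the guest never touches
    # real hardware; its sector reads and writes go through an emulation
    # layer backed by an ordinary file on the host.

    SECTOR_SIZE = 512

    class FileBackedDisk:
        """Host-side emulation of a disk, backed by a plain file."""

        def __init__(self, path, sectors):
            self.path = path
            self.sectors = sectors
            # Create a sparse backing file of the requested size.
            with open(path, "wb") as f:
                f.truncate(sectors * SECTOR_SIZE)

        def read_sector(self, lba):
            # The "device" translates a guest LBA into a file offset.
            with open(self.path, "rb") as f:
                f.seek(lba * SECTOR_SIZE)
                return f.read(SECTOR_SIZE)

        def write_sector(self, lba, data):
            assert len(data) == SECTOR_SIZE
            with open(self.path, "r+b") as f:
                f.seek(lba * SECTOR_SIZE)
                f.write(data)

    # "Guest" I/O: each call crosses the emulation layer above.
    disk = FileBackedDisk("guest-disk.img", sectors=2048)  # ~1 MiB toy disk
    disk.write_sector(10, b"hello from the guest".ljust(SECTOR_SIZE, b"\x00"))
    print(disk.read_sector(10)[:20])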

The problem with paravirtualization is that, because it adds a translation layer between the hardware and the OS, it incurs a performance penalty. This is why, until recently, 3D acceleration for guest operating systems was infeasible. With hardware I/O virtualization, this penalty is drastically reduced, as the only added step is the translation from virtual to physical memory address ranges that the IOMMU performs, allowing you to allocate the same device to multiple guest operating systems as though they were running directly on the hardware.
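
As a rough mental model of what the IOMMU does (again just an illustrative Python sketch, not how any particular IOMMU is implemented), think of a per-device translation table: every DMA address a device emits is either remapped to a host physical page or rejected outright, which is what keeps one guest's device traffic out of another guest's, or the host's, memory.

    # Toy model of IOMMU-style DMA remapping: each device gets its own
    # translation table from device/bus addresses to host physical pages.
    # Any DMA to an unmapped page is blocked instead of silently corrupting
    # another guest's (or the host's) memory.

    PAGE_SIZE = 4096

    class IOMMU:
        def __init__(self):
            # device id -> {device page number: host physical page number}
            self.tables = {}

        def map(self, device, dev_page, host_page):
            self.tables.setdefault(device, {})[dev_page] = host_page

        def translate(self, device, dma_addr):
            page, offset = divmod(dma_addr, PAGE_SIZE)
            mapping = self.tables.get(device, {})
            if page not in mapping:
                raise PermissionError(
                    f"{device}: DMA to unmapped address {dma_addr:#x} blocked")
            return mapping[page] * PAGE_SIZE + offset

    iommu = IOMMU()
    # The hypervisor maps one page of guest A's memory for the NIC assigned to it.
    iommu.map("nic-guest-a", dev_page=0, host_page=0x1234)

    print(hex(iommu.translate("nic-guest-a", 0x0010)))   # -> 0x1234010
    try:
        iommu.translate("nic-guest-a", 0x2000)           # outside its mapping
    except PermissionError as e:
        print(e)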

For many devices this isn't actually a problem, as they are already "virtualized", for example RAID arrays in which you allocate a LUN to a specific host. But what if you have a multiport network adapter or HBA and wish to map a specific port to a specific guest OS? You can buy multiple adapters, but that approach is limited by the number of expansion slots available.

What is often done with x86 systems in these situations is that the ports are bonded together, paravirtualized and allocated to however many guest operating systems there are, a process called multiplexing; the traffic is then usually VLAN or VSAN tagged. This is far from perfect: not only do you incur the performance penalty of paravirtualization and lose flexibility, there are also added costs, since you need more advanced infrastructure to support these setups. Although VLAN tagging is common as muck these days, VSAN tagging is still a relatively new technology and, last I heard, only implemented on top-of-the-range Cisco fibre directors.
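
For the curious, here is a deliberately simplified Python sketch of that software multiplexing (all names hypothetical): frames from several guests share one physical port, picking up a VLAN tag on the way out and being demultiplexed by that tag on the way back in. The real work happens in the hypervisor's virtual switch, which is exactly where the extra overhead comes from.

    # Toy model of software multiplexing: several guests share one physical
    # port, and the hypervisor's virtual switch tags each guest's frames
    # with its VLAN ID so they can be told apart again on receive.

    class VirtualSwitch:
        def __init__(self):
            self.vlan_of_guest = {}   # guest name -> VLAN ID

        def attach(self, guest, vlan_id):
            self.vlan_of_guest[guest] = vlan_id

        def transmit(self, guest, frame):
            # Outbound: tag the frame with the guest's VLAN before the wire.
            vlan = self.vlan_of_guest[guest]
            return {"vlan": vlan, "payload": frame}

        def receive(self, tagged_frame):
            # Inbound: use the tag to find which guest the frame belongs to.
            for guest, vlan in self.vlan_of_guest.items():
                if vlan == tagged_frame["vlan"]:
                    return guest, tagged_frame["payload"]
            raise ValueError("frame for unknown VLAN dropped")

    switch = VirtualSwitch()
    switch.attach("guest-a", vlan_id=100)
    switch.attach("guest-b", vlan_id=200)

    wire_frame = switch.transmit("guest-a", b"hello")
    print(wire_frame)                   # tagged with VLAN 100
    print(switch.receive(wire_frame))   # delivered back to guest-a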

So what am I going to do when it becomes too expensive to produce my beloved high-end RISC systems? Am I going to have to deal with inelegant systems to continue to provide my customers with virtualization? Thankfully, AMD and Intel have both come up with solutions.

Enter AMD-Vi and VT-d.

Not long after Mr. Durden made the comment that prompted me to start thinking about this situation, Peter Bright published a guide to I/O virtualization on Ars Technica in which he describes the new IOMMUs being developed for the x86 platform: AMD's AMD-Vi and Intel's VT-d. Basically, these provide what POWER- and SPARC-based systems have offered for years now: the ability to present a guest OS with hardware I/O virtualization. Not only will this allow big-iron UNIX vendors to produce high-end systems using commodity hardware, it will allow one-to-one hardware access to graphics cards, letting guest operating systems take full advantage of the latest 3D capabilities. As long as the guest OS has the required drivers, you can ditch that dual-boot setup and start up a virtual machine for all your gaming needs.

Some of you may have noticed there is a slight problem, to say the least, with this kind of setup. If hardware access is one-to-one, aren't we back in the same memory-range-corruption boat as before? What if two guest operating systems attempt to access the same device at the same time? To avoid the problems that could arise from one-to-one hardware access while still retaining multiplexing capabilities, devices on the PCIe bus need to present the same set of functions multiple times. Although this sounds like a lot of work, something like it has already been implemented: PCIe devices already support multiple functions, but so far this has been used to expose different hardware capabilities. With the adoption of IOMMUs by both AMD and Intel, it will be possible to develop devices that are virtualization aware, presenting several guest operating systems with exactly the same set of functions.
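
Here is a hedged, purely conceptual Python sketch of that idea (not modelled on any specific adapter, nor on the actual PCIe multi-function machinery): a virtualization-aware device exposes a physical function plus several identical virtual functions, and each virtual function can be handed to a different guest while keeping its own private state.

    # Conceptual model of a virtualization-aware PCIe device: one physical
    # function (PF) spawns several identical virtual functions (VFs), each
    # of which can be assigned to a different guest with its own state.

    class VirtualFunction:
        def __init__(self, index):
            self.index = index
            self.owner = None      # guest this VF is assigned to
            self.tx_count = 0      # per-VF state, isolated from other VFs

        def send(self, frame):
            self.tx_count += 1
            return f"VF{self.index} ({self.owner}) sent {len(frame)} bytes"

    class PhysicalFunction:
        def __init__(self, num_vfs):
            # Every VF presents exactly the same set of functions to its guest.
            self.vfs = [VirtualFunction(i) for i in range(num_vfs)]

        def assign(self, index, guest):
            self.vfs[index].owner = guest
            return self.vfs[index]

    nic = PhysicalFunction(num_vfs=4)
    vf_a = nic.assign(0, "guest-a")
    vf_b = nic.assign(1, "guest-b")

    print(vf_a.send(b"ping"))   # guest A drives its VF directly
    print(vf_b.send(b"pong"))   # guest B's traffic never touches A's state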

…and finally.

Although it could be some time yet before we see the demise of high-end custom ISAs, the writing is on the wall. I'm going to miss these systems, if only for their novelty value, but I'll leave you with another quote from Mr. Durden:

…the processor is not the main value proposition for their systems. The actual system, and its integration is.

Amen to that, brother!

For further reading on virtualization, check out the following links:

About the author:
I’m a systems engineer and administrator mostly dealing with UNIX and Linux systems for the financial sector.
