Friday, June 4, 2010

How Hardware Virtualization Works (Part 4)

This is the fourth and last in a series of posts about how hardware virtualization works. Catch it from Part 1 to understand the context.

Drown It in Silicon

In the previous discussion I might have led you to believe that paravirtualization is widely used in mainframes (IBM zSeries and clones). Sorry. It is used, but in many cases another technique is used, alone or in combination with paravirtualization.

Consider the example of reading the real time clock. All that has to happen is that a silly little offset is added. It is perfectly possible to build hardware that adds an offset all by itself, without any "help" from software. So that's what they did. (See figure below.)

They embedded nearly the whole shooting match directly into silicon. This implies that the bag o' bits I've been glibly referring to becomes part of the hardware architecture: Now it's hardware that has to reach in and know where the clock offset resides. Not everything is as trivial as adding an offset, of course; what happens with the memory mapping gets, to me anyway, a tad scary in its complexity. But, of course, it can be made to work.
Nobody else is willing to invest a pound or so of silicon into doing this. Yet.
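
To make the clock example concrete, here is a minimal Python sketch of what the hardware does on every clock read. The names (`GuestState`, `clock_offset`, `read_virtual_clock`) are invented for this illustration and do not come from any real architecture. In real hardware this is an adder in silicon fed from the per-guest bag o' bits, not software.

```python
from dataclasses import dataclass


@dataclass
class GuestState:
    """Hypothetical stand-in for one guest's 'bag o' bits'."""
    clock_offset: int  # this guest's offset from the real clock, in ticks


def read_virtual_clock(real_clock_ticks: int, guest: GuestState) -> int:
    """Model of the hardware path: the real clock plus the guest's offset.

    No hypervisor code runs here. The hardware itself knows where the
    offset lives in the guest state and adds it on the way out.
    """
    return real_clock_ticks + guest.clock_offset


# Two guests read the same real clock and see different times.
guest_a = GuestState(clock_offset=0)
guest_b = GuestState(clock_offset=-3600)
real_now = 1_000_000
print(read_virtual_clock(real_now, guest_a))
print(read_virtual_clock(real_now, guest_b))
```

The point of the sketch is only that the guest state has become an input to the hardware, which is why its layout has to be part of the architecture.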

As Moore's Law keeps providing us with more and more transistors, perhaps at some point the industry will tire of providing even more cores, and spend some of those transistors on something that might actually be immediately usable.

A Bit About Input and Output

One reason for all this mainframe talk is that it provides an existence proof: Mainframes have been virtualizing IO basically forever, allowing different virtual machines to think they completely own their own IO devices when in fact they're shared. And, of course, it is strongly supported in yet more hardware. A virtual machine can issue an IO operation, have it directed to its address for an IO device (which may not be the "real" address), get the operation performed, and receive a completion interrupt, or an error, all without involving a hypervisor, at full hardware efficiency. So it can be done.

But until very recently, it could not be readily done with PCI and PCIe (PCI Express) IO. Both the IO interface and the IO devices need hardware support for this to work. As a result, on commodity and RISC systems IO operations have been done interpretively, by the hypervisor. This obviously increases overhead significantly. Paravirtualization can clearly help here: Just ask the hypervisor to go do the IO directly.
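
The difference between the two approaches can be sketched with a toy model. Everything here is an assumption made for illustration: the `Hypervisor` class, the three-register "device," and the idea of counting hypervisor entries as a stand-in for overhead. Real devices and hypervisors differ in every detail; the sketch only shows why one request is cheaper than several intercepted register accesses.

```python
class Hypervisor:
    """Toy hypervisor that counts how often a guest forces an entry into it."""

    def __init__(self):
        self.entries = 0   # number of times control passed to the hypervisor
        self.disk = {}     # block number -> data; stands in for a real device
        self._regs = {}    # registers of the emulated (hypothetical) device

    def trap_register_write(self, reg, value):
        """Interpretive path: each guest write to a device register traps."""
        self.entries += 1
        self._regs[reg] = value
        if reg == "command" and value == "write":
            self.disk[self._regs["block"]] = self._regs["data"]

    def hypercall_write(self, block, data):
        """Paravirtual path: the guest asks for the whole operation at once."""
        self.entries += 1
        self.disk[block] = data


def emulated_write(hv, block, data):
    # An unmodified guest driver pokes the device registers one at a time.
    hv.trap_register_write("block", block)
    hv.trap_register_write("data", data)
    hv.trap_register_write("command", "write")


def paravirtual_write(hv, block, data):
    # A paravirtualized guest driver just asks the hypervisor to do the IO.
    hv.hypercall_write(block, data)


interpretive = Hypervisor()
emulated_write(interpretive, 7, b"hello")
paravirtual = Hypervisor()
paravirtual_write(paravirtual, 7, b"hello")
print(interpretive.entries, paravirtual.entries)
```

Both paths leave the same data on the "disk"; what changes is how many times the guest had to drop into the hypervisor to get it there.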

However, even with paravirtualization this requires the hypervisor to have its own IO driver set, separate from that of the guest operating systems. This is a redundancy that adds significant bulk to a hypervisor and isn't as reliable as one would like, for the simple reason that no IO driver is ever as reliable as one would like. And reliability is very strongly desired in a hypervisor. Errors within it can bring down all the guest systems running under it.

Another thing that can help is direct assignment of devices to guest systems. This gives a guest virtual machine sole ownership of a physical device. Together with hardware support that maps and isolates IO addresses, so a virtual machine can only access the devices it owns, this provides full speed operation using the guest operating system drivers, with no hypervisor involvement. However, it means you do need dedicated devices for each virtual machine, something that clearly inhibits scaling: Imagine 15 virtual servers, all wanting their own physical network card. This support is also not an industry standard. What we want is some way for a single device to act like multiple virtual devices.
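
Here is a rough sketch of direct assignment with isolation, again in Python. The `IOMap` class and its methods are hypothetical names for this illustration; in a real system the ownership check is done by hardware on every access, not by software.

```python
class IsolationFault(Exception):
    """Raised when a virtual machine touches a device it does not own."""


class IOMap:
    """Toy model of the mapping hardware: each device has exactly one owner."""

    def __init__(self):
        self._owner = {}  # device id -> owning virtual machine id

    def assign(self, device, vm):
        # Direct assignment is exclusive: a device cannot be shared.
        if device in self._owner:
            raise ValueError(f"{device} already belongs to {self._owner[device]}")
        self._owner[device] = vm

    def access(self, vm, device):
        # The check that keeps one guest away from another guest's devices.
        if self._owner.get(device) != vm:
            raise IsolationFault(f"{vm} does not own {device}")
        return f"{vm} -> {device}"


io_map = IOMap()
io_map.assign("nic0", "vm1")
print(io_map.access("vm1", "nic0"))

# The scaling problem: every additional guest needs another physical device.
for n in range(2, 16):
    io_map.assign(f"nic{n - 1}", f"vm{n}")
```

The loop at the end is the 15-server case: the model only works because 15 separate "cards" were handed out, one per guest.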

Enter the PCI SIG. It has recently released a collection – yes, a collection – of specifications to deal with this issue. I'm not going to attempt to cover them all here. The net effect, however, is that they allow industry-standard creation of IO devices with internal logic that makes them appear as if they are several, separate, "virtual" devices (the SR-IOV and MR-IOV specifications); and add features supporting that concept, such as multiple different IO addresses for each device.
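
The single-device-as-many idea can be sketched as follows. This is a toy model, not the SR-IOV programming interface: the class names, the address format, and the `wire` list are all made up for illustration. It shows only the shape of the concept, namely one physical device whose internal logic presents several separately addressed virtual devices.

```python
class VirtualDevice:
    """One of the 'virtual' devices a guest sees as its own."""

    def __init__(self, parent, index):
        self.parent = parent
        # Each virtual device gets its own IO address (format is invented).
        self.address = f"{parent.address}.v{index}"

    def send(self, payload):
        self.parent._transmit(self.address, payload)


class PhysicalDevice:
    """A single card that appears as several separate devices."""

    def __init__(self, address, num_virtual):
        self.address = address
        self.wire = []  # what actually leaves the one physical port
        self.virtual_devices = [
            VirtualDevice(self, i) for i in range(num_virtual)
        ]

    def _transmit(self, source, payload):
        # Internal device logic multiplexes every virtual device onto one port.
        self.wire.append((source, payload))


card = PhysicalDevice("nic0", num_virtual=15)
for vm_number, vdev in enumerate(card.virtual_devices, start=1):
    vdev.send(f"hello from vm{vm_number}")
print(len(card.virtual_devices), "virtual devices,", len(card.wire), "frames on one wire")
```

Combined with address isolation, each guest could be handed one of these virtual devices directly, so the 15 virtual servers no longer need 15 physical cards.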

A key point here is that this requires support by the IO device vendors. It cannot be done just by a purveyor of servers and server chipsets. So its adoption will be gated by how soon those vendors roll this technology out, how good a job they do, and how much of a premium they choose to charge for it. I am not especially sanguine about this. We have done too good a job beating a low-cost mantra into too many IO vendors for them to be ready to jump on anything like this, which increases cost without directly improving their marketing numbers (GBs stored, bandwidth, etc.).


There is a joke, or a deep truth, expressed by the computer pioneer David Wheeler, co-inventor of the subroutine, as "All problems in computer science can be solved by another level of indirection."

Virtualization is not going to prove that false. It is effectively a layer of indirection or abstraction added between physical hardware and the systems running on it. By providing that layer, virtualization enables a collection of benefits that were recognized long ago, benefits that are now being exploited by cloud computing. In fact, virtualization is so often embedded in cloud computing discussions that many have argued, vehemently, that without virtualization you do not have cloud computing. As explained previously, I don't agree with that statement, especially when "virtualization" is used to mean "hardware virtualization," as it usually is.

However, there is no denying that the technology of virtualization makes cloud computing tremendously more economical and manageable.

Virtualization is not magic. It is not even all that complicated in its essence. (Of course its details, like the details of nearly anything, can be mind-boggling.) And despite what might first appear to be the case, it is also efficient; resources are not wasted by using it. There is still a hole to plug in IO virtualization, but solutions there are developing gradually if not necessarily expeditiously.

There are many other aspects of this topic that have not been touched on here, such as where the hypervisor actually resides (on the bare metal? Inside an operating system?), the role virtualization can play when migrating between hardware architectures, and the deep relationship that can, and will, exist between virtualization and security. But hopefully this discussion has provided enough background to enable some of you to cut through the marketing hype and the thicket of details that accompany most discussions of this topic. Good luck.
