Monday, May 24, 2010

How Hardware Virtualization Works (Part 1)


Zilch. Nada. Nothing. Rien.

That's the best approximation to the intrinsic overhead for computer hardware virtualization, with the most modern hardware and adequate resources. Judging from comments and discussions I've seen, there are many people who don't understand this. It is possible to find many explanations of hardware virtualization all over the Internet and, of course, in computer science courses. Apparently, though, they don't stick, or aren't approachable enough. So I'll try to explain in this multi-part series of posts how this trick is pulled off.

This discussion is actually a paper that has been published as a single piece in the proceedings of CloudViews – Cloud Computing Conference 2009, the 2nd Cloud Computing International Conference, held May 20-21 in Porto, Portugal. I was planning to attend and discuss it as a talk, but unfortunately other things intervened and I could not attend.

Before talking about how hardware virtualization works, let's put it in context with cloud computing and other forms of virtualization.

Virtualization and Cloud Computing

Virtualization is not a mathematical prerequisite for cloud computing; there are cloud providers who do serve up whole physical servers on demand. However, it is very common, for two reasons:

First, it is an economic requirement. Cloud installations without virtualization are like corporate IT shops prior to virtualization; there, the average utilization of commodity and RISC/UNIX servers is about 12%. (While this seems insanely low, there is a lot of data supporting that number.) If a cloud provider could only hope for 12% utilization at best, even with every server in use, it would have to charge more than competitors who use virtualization. That can be a valid business model, with advantages (like somewhat greater consistency in what's provided) and customers who prefer it, but the majority of vendors have opted for the lower-price route.
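The price effect can be made concrete with a little arithmetic. The sketch below is a hypothetical helper, not anything from the paper: it divides the hourly cost of a server by its utilization to get the cost of each hour of work actually delivered. The $1.00/hour figure and the 48% consolidated utilization are assumptions chosen only for illustration.

```python
def cost_per_useful_hour(cost_per_server_hour: float, utilization: float) -> float:
    """Cost of one hour of delivered work when the server is busy only
    `utilization` of the time (hypothetical helper for illustration)."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return cost_per_server_hour / utilization


# Assumed hourly cost of owning and running one physical server.
HOURLY_COST = 1.00

# 12% is the typical unconsolidated figure; 48% is an assumed consolidated one.
for util in (0.12, 0.48):
    print(f"{util:.0%} utilized: ${cost_per_useful_hour(HOURLY_COST, util):.2f} per useful hour")
```

Under these assumptions every useful hour at 12% utilization carries about $8.33 of paid-for capacity versus about $2.08 at 48%, so the unconsolidated provider must recover roughly four times as much per unit of work.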

Second, it is a management requirement. One of the key things virtualization does is reduce a running computer system to a big bag of bits, which can then be treated like any other bag o' bits. Examples: It can be filed, or archived; it can be restarted after being filed or archived; it can be moved to a different physical machine; and it can be used as a template to make clones, additional instances of the same running system, thus directly supporting one of the key features of cloud computing: elasticity, expansion on demand.
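To make the "bag of bits" idea concrete, here is a toy Python sketch. `ToyMachine`, `snapshot`, and `restore` are hypothetical names, and a two-field dataclass is a drastic simplification of what a hypervisor actually saves (all CPU registers, all guest memory, device state). The point is only that once the whole state is a byte string, filing, restarting, moving, and cloning all become ordinary operations on data.

```python
import pickle
from dataclasses import dataclass, field


@dataclass
class ToyMachine:
    """Hypothetical stand-in for a VM's complete state."""
    registers: dict[str, int] = field(default_factory=dict)
    memory: bytearray = field(default_factory=lambda: bytearray(16))


def snapshot(vm: ToyMachine) -> bytes:
    """Reduce the machine's state to a bag of bits."""
    state = {"registers": vm.registers, "memory": bytes(vm.memory)}
    return pickle.dumps(state)


def restore(image: bytes) -> ToyMachine:
    """Turn the bag of bits back into an independent machine."""
    state = pickle.loads(image)
    return ToyMachine(registers=state["registers"], memory=bytearray(state["memory"]))


vm = ToyMachine(registers={"pc": 0x100}, memory=bytearray(b"hello, world!!!!"))
archive = {"web-01@2010-05-24": snapshot(vm)}  # "filed or archived"

# Restarting, moving, and cloning are all just restore() on the same bytes.
clone_a = restore(archive["web-01@2010-05-24"])
clone_b = restore(archive["web-01@2010-05-24"])
clone_a.registers["pc"] += 4  # clones then diverge independently
print(clone_a.registers["pc"], clone_b.registers["pc"], vm == clone_b)
```

Each `restore` builds fresh objects from the same bytes, so a clone can run on without disturbing the archived image or its siblings.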

Notice that I claimed the above advantages for virtualization in general, not just the hardware virtualization that creates a virtual computer. Virtual computers, or "virtual machines," are used by Amazon AWS and other providers of Infrastructure as a Service (IaaS); they lease you your own complete virtual computers, on which you can load and run essentially anything you want.

In contrast, systems like Google App Engine and Microsoft Azure provide you with a complete, isolated, virtual programming platform – a Platform as a Service (PaaS). This removes some of the pain of use, like licensing, configuring and maintaining your own copy of an operating system, possibly a database system, and so on. However, it restricts you to using their platform, with their choice of programming languages and services.

In addition, there are virtualization technologies that target a point intermediate between IaaS and PaaS, such as the containers implemented in Oracle Solaris, or the WPARs of IBM AIX. These provide independent virtual copies of the operating system within one actual instantiation of the operating system.

The advantages of virtualization apply to all the variations discussed above. And if you feel like stretching your brain, imagine using all of them at the same time. It's perfectly possible: .NET running within a container running on a virtual machine.

Here, however, I will only be discussing hardware virtualization, the implementation of virtual machines as done by VMware and many others. Also, within that area, I am only going to touch lightly on virtualization of input/output functions, primarily to keep this article a reasonable length.

So, on we go to the techniques used to virtualize processors and memory. See the next post, part 2 of this series. (Link to be added when that is posted.)


Anonymous said...

Nice blog. Virtualization + elasticity + automation are the core aspects of cloud computing. This is a good introduction to virtualization in the context of cloud computing. I look forward to the next part of the series.

bane said...

I'm not disputing that hardware virtualisation has virtually no cost. However, I wonder how much of the low usage on conventional machines is due to the way OSes have been designed for historical usage (i.e., licensing mechanisms that assume they own the whole machine rather than serving a per-user-group slice of it, difficulty in checkpointing/migrating a subset of the OS, lack of rigorous QoS at the OS level, etc.)? The vague reason for wondering about this is that I don't want to administer every aspect of an OS; what I want is to be able to select a different "component" to the standard configuration in those few cases where it matters to me.

This is orthogonal to how hardware may facilitate things.

Irfan said...

Good Article.
Looking forward to Part 2.

Anonymous said...

Like your blog! New to virtualization. Could you explain how it is that a virtualized machine is more efficient in using the processing power of the microprocessor if there is no parallelism and the operating system or hypervisor can still only send one instruction at a time? I guess I am trying to figure out how a virtualized environment is able to utilize more capacity of the processor than a non-virtualized processor. Thanks!

Greg Pfister said...


To answer your question:

One virtual machine will not better utilize a single processor (or system).

The idea is that you consolidate several virtual machines onto one physical machine.

Typical commercial server utilization is around 12%, so 4 virtual servers, for example, will together use around 48% of a physical machine (plus small virtualization overhead).

This works even if each of those has no parallelism in the app or the OS. The parallelism is across the multiple virtual machines that were consolidated together on one physical system.
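Consolidation itself is a packing problem. The sketch below uses first-fit decreasing, a standard bin-packing heuristic chosen here for illustration (real placement tools also weigh memory, I/O, and affinity rules), to place virtual machines with given average CPU loads onto as few physical hosts as possible without exceeding an assumed utilization ceiling. The function name and the 80% cap are hypothetical.

```python
def consolidate(loads: list[float], cap: float = 1.0) -> list[list[float]]:
    """Pack per-VM average utilizations (fractions of one host) onto hosts
    using first-fit decreasing, never letting a host's total exceed `cap`."""
    hosts: list[list[float]] = []  # VMs placed on each host
    totals: list[float] = []       # summed utilization of each host
    for load in sorted(loads, reverse=True):
        if load > cap:
            raise ValueError(f"a VM with load {load} cannot fit under cap {cap}")
        for i, total in enumerate(totals):
            if total + load <= cap:  # first host with enough headroom
                hosts[i].append(load)
                totals[i] += load
                break
        else:  # no existing host fits, so bring up another one
            hosts.append([load])
            totals.append(load)
    return hosts


# Four servers at the typical 12% fit on one host, using about 48% of it.
print(len(consolidate([0.12] * 4)), round(sum(consolidate([0.12] * 4)[0]), 2))
# Twenty such servers: 3 hosts if run to 100%, 4 if capped at an assumed 80%.
print(len(consolidate([0.12] * 20)), len(consolidate([0.12] * 20, cap=0.8)))
```

The last line echoes a point made later in the comments: squeezing twenty machines onto three hosts instead of four saves only one more box, and it does so by leaving almost no headroom on any of them.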

Anonymous said...

Perhaps one of the things that's not entirely clear here to those without a lot of sysadmin experience is that in larger installations, servers are typically partitioned at least as much by role (in relation to various administrative [both engineering and administrative] and security concerns) as by server load.

"A new project gets its own server" is a pretty typical procedure in many organizations, for example, even if a dozen of the smaller projects could be run off just one server. This can be due to ownership and budget issues, compartmentalization between projects, or many other reasons.

I personally run five hosts on my home Internet connection, since it helps greatly in dealing with security issues. (I don't have to worry about having other shell users on my mail server, for example, and I can split applications based on whether they have a greater need to come up unattended or be resident on an encrypted filesystem.) Being able to do this on one box instead of five saves me a considerable amount in hardware and power costs.

Raj said...

Hi Greg,

Is there some study done from an I/O throughput or network latency perspective on a standard set of applications running on multiple servers, as against running in multiple guest operating systems on a VM? Can you please share some details on this?

Greg Pfister said...

Hi, Raj.

The pickings are quite lean, I'm afraid. I know of no standard benchmarks, and in fact very few public measurements at all at this point. Mostly there are statements in white papers about "near native" performance for SR-IOV (see last part of the sequence of posts). This may well be true, and I think it certainly can be, but the proof's not there.

But here are a couple of links to one-off trials that show about a 2X performance difference between SR-IOV on and off:

And here's one with more cases, but since it's about IBM System i SAN volume controllers, it's pretty much a niche:

If anybody else knows of others, I'd be obliged for links, myself!

Thanks for commenting,


Anonymous said...

I've done a few rough performance checks of disk and network I/O from time to time on some of my virtual machines. Xen disk I/O and KVM's network and disk virtio run at something close enough to native speed that it's not been worth checking it further. Surprisingly (to me, anyway), even KVM's emulated IDE disk I/O was very fast. (Virtio for disks under KVM is unfortunately not yet stable, and can lead to data loss--see the Ubuntu and KVM bug databases for details.) Note that my tests were going through a lot of OS code (e.g., filesystems and so on), and so that overhead was quite possibly drowning out any differences.

However: I suspect that in most uses of virtualization, performance makes very little difference, so long as it's not truly awful. In the corporate world, at least, servers tend to be quite under-utilized.

If you are running an application pushing a server to its limits, you might just as well use native mode anyway, since moving more stuff on to an already fully-loaded server doesn't make much sense.

Greg Pfister said...

Good to hear about your experiences, Curt.

I'd point to two cases where virtual IO efficiency does matter, though --

- Clown computing. That's where you pack so many virtual machines in a single physical server that... you get the idea. People will do that.

- Database servers on the back end. You might do that native, but:
(a) doing it virtual will present a more uniform management interface, given that everything else is likely virtual;
(b) it may be convenient to be able to use VMotion, moving the VM to another machine, to avoid planned outages (one of the most frequent kinds of outage).


Anonymous said...

Ah, yes; I'd thought about wanting to use a single VM on a host just for the management side of things, but couldn't think offhand of any particular management things that would be helpful.

As for trying to get as many hosts as possible on a server (i.e., to optimize your resource utilisation as much as possible), well, "clown" computing is a good term for it. There are two issues there, of course. The first is that the extra saving from consolidating twenty machines onto three rather than four is pretty minimal. The second is that I've dealt with situations where you're trying to run "near the edge" of maximum performance, and doing that is usually rather insane because in most cases the failure modes are catastrophic rather than gradual. As you know, it's like delaying one train by a minute on a busy subway: everything behind it backs up, and recovery can take hours. (The classic situation I've seen is having a service run so close to maximum memory utilization that a burst in the offered load drives the host into swap, often "wedging" the host for tens of minutes.)
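Why running "near the edge" degrades so abruptly can be illustrated with the textbook M/M/1 queue, a simplifying model swapped in here rather than anything the commenter measured. With a single server and random arrivals, mean response time is the service time divided by (1 − ρ), where ρ is utilization. The hypothetical helper below prints how many service times a request spends in the system as utilization climbs.

```python
def mm1_response_time(service_time: float, utilization: float) -> float:
    """Mean time in system for an M/M/1 queue: S / (1 - rho)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("an M/M/1 queue is only stable for utilization in [0, 1)")
    return service_time / (1.0 - utilization)


# Response time in units of the bare service time, at rising utilization.
for rho in (0.50, 0.80, 0.90, 0.95, 0.99):
    print(f"utilization {rho:.0%}: {mm1_response_time(1.0, rho):.0f}x service time")
```

In this model going from 90% to 99% utilization multiplies response time by ten, which is why a small burst of offered load can tip a nearly saturated host into the kind of backup described above.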

Greg Pfister said...


FYI, Intel just published a comparison of native & virtual servers running SQL stress tests in the most recent issue of their "Intel Software Dispatch" mag. Here's the URL I end up at. Not sure if it will work for you, but you can Google the mag:

Greg Pfister said...

And, speak of the devil, we now have SPECvirt_sc2010. It takes an interesting approach.

Anonymous said...

The Intel comparison looks as if it probably involved very little disk I/O. The machines had 128 GB of RAM, and the database size was only 27 GB.

server virtualization said...

My name is Matt and I work for Dell. There are a lot of great comments happening on this blog. Thank you so much for the information.
