Monday, December 6, 2010

The Varieties of Virtualization

There appear to be many people for whom the term virtualization exclusively means the implementation of virtual machines à la VMware's products, Microsoft's Hyper-V, and so on. That's certainly a very important and common case, enough so that I covered various ways to do it in a separate series of posts; but it's scarcely the only form of virtualization in use.

There's a hint that this is so in the gaggle of other situations where the word virtualization is used, such as desktop virtualization, application virtualization, user virtualization (I like that one; I wonder what it's like to be a virtual user), and, of course, the Java Virtual Machine (JVM). Talking about the last of these as a true case of virtualization may cause some head-scratching; I think most people consign it to a different plane of existence than things like VMware.

This turns out not to be the case. They're not only all in the same (boringly mundane) plane, they relate to one another hierarchically. I see five levels to that hierarchy right now, anyway; I wouldn't claim this is the last word.

A key to understanding this is to adopt an appropriate definition of virtualization. Mine is that virtualization is the creation of isolated, idealized platforms on which computing services are provided. Anything providing that, whether it's hardware, software, or a mixture, is virtualization. The adjectives in front of "platform" could have qualifiers: Maybe it's not quite idealized in all cases, and isolation is never total. But lack of qualification is the intent.

Most types of virtualization allow hosting of several platforms on one physical or software resource, but that's not part of my definition because it's not universal; it could be just one, or a single platform could be created spanning multiple physical resources. It's also necessary to not always dwell all that heavily on boundaries between hardware and software. But that's starting to get ahead of the discussion. Let's go through the levels, starting at the bottom.

I'll relate this to cloud computing's IaaS/PaaS/SaaS levels later.

Level 1: Hardware Partitioning

Some hardware is designed like a brick of chocolate that can be broken apart along various predefined fault lines, each piece a fully functional computer. Sun Microsystems (Oracle, now) famously did this with its .com workhorse, the Enterprise 10000 (UE10000). That system had multiple boards plugged into a memory-bus backplane, each board with processor(s), memory, and IO. Firmware let you set registers allowing or disallowing inter-board memory traffic, cache coherence and IO traffic, allowing you to create partitions of the whole machine built with any number of whole boards. The register setting, etc., is set up so that no code running on any of the processors can alter it or, usually, even tell it's there; a privileged console accesses them, under command of an operator, and that's it. HP, IBM and others have provided similar capabilities in large systems, often with the processors, memory, and IO in separate units, numbers of each assigned to different partitions.
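To make the board-and-register idea concrete, here is a minimal Python sketch. It is a toy model, not the E10000's actual firmware interface: the class `PartitionedMachine` and its methods `console_assign` and `traffic_allowed` are names invented for illustration. The only points it tries to capture are that partitions are built from whole boards, that only the console changes the assignment, and that traffic is permitted only between boards in the same partition.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionedMachine:
    """Toy model of a backplane whose boards are grouped into partitions."""
    board_count: int
    # board index -> partition name; changed only through console_assign()
    _owner: dict = field(default_factory=dict)

    def console_assign(self, partition: str, boards: list[int]) -> None:
        """Stand-in for the privileged console: give whole boards to a partition."""
        for b in boards:
            if not 0 <= b < self.board_count:
                raise ValueError(f"no such board: {b}")
            if b in self._owner:
                raise ValueError(f"board {b} already belongs to {self._owner[b]}")
        for b in boards:
            self._owner[b] = partition

    def traffic_allowed(self, src: int, dst: int) -> bool:
        """Memory, coherence, and IO traffic pass only within one partition."""
        owner = self._owner.get(src)
        return owner is not None and owner == self._owner.get(dst)


machine = PartitionedMachine(board_count=4)
machine.console_assign("prod", [0, 1])
machine.console_assign("test", [2])

print(machine.traffic_allowed(0, 1))  # True: both boards are in "prod"
print(machine.traffic_allowed(1, 2))  # False: "prod" and "test" are isolated
print(machine.traffic_allowed(3, 3))  # False: board 3 is in no partition
```

Code running "on" a board has no method here that changes `_owner`, which mirrors the point that software inside a partition cannot alter the partitioning.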

Hardware partitioning has the big advantage that even hardware failures (for the most part) simply cannot propagate among partitions. With appropriate electrical design, you can even power-cycle one partition without affecting others. Software failures are of course also totally isolated within partitions (as long as one isn't performing a service for another, but that issue is on another plane of abstraction).

The big negative of hardware partitioning is that you usually cannot have very many of them. Even a single chip now contains multiple processors, so partitioning even by separate chips provides far coarser granularity than is generally desirable. In fact, it's common to assign just a fraction of one CPU, and that can't be done without bending the notion of a hardware-isolated, power-cycle-able partition to the breaking point. In addition, there is always some hardware in common across the partitions. For example, power supplies are usually shared, and whatever interconnects all the parts is shared; failure of that shared hardware causes all partitions to fail. (For more complete high availability, you need multiple completely separate physical computers, not under the same sprinkler head, preferably located on different tectonic plates, etc. depending on your personal level of paranoia.)

Despite its negatives, hardware partitioning is fairly simple to implement, useful, and still used. It or something like it, I speculate, is effectively what will be used for initial "virtualization" of GPUs when that starts appearing.

Level 2: Virtual Machines

This is the level of VMware and its kissin' cousins. All the hardware is shared en masse, and a special layer of software, a hypervisor, creates the illusion of multiple completely separate hardware platforms. Each runs its own copy of an operating system and any applications above that, and (ideally) none even knows that the others exist. I've previously written about how this trick can be performed without degrading performance to any significant degree, so won't go into it here.

The good news here is that you can create as many virtual machines as you like, independent of the number of physical processors and other physical resources – at least until you run out of resources. The hypervisor usually contains a scheduler that time-slices the processors among the virtual machines, so sub-processor allocation is available. With the right hardware, IO can also be fractionally allocated (again, see my prior posts).
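Sub-processor allocation by time-slicing can be sketched in a few lines of Python. This is only a simple weighted round-robin over one simulated processor, under the assumption that shares are small integers; the function name `time_slice` and the VM names are made up for the example, and real hypervisor schedulers are considerably more elaborate.

```python
from collections import Counter
from itertools import cycle


def time_slice(shares: dict[str, int], total_slices: int) -> Counter:
    """Hand out CPU time slices to VMs in proportion to integer shares.

    Each VM appears in the rotation once per share, so a VM holding 1 share
    out of 4 ends up with about a quarter of one physical processor.
    """
    rotation = [vm for vm, share in shares.items() for _ in range(share)]
    if not rotation:
        raise ValueError("at least one VM needs a positive share")
    used = Counter()
    # Walk the rotation repeatedly until all slices have been handed out.
    for _, vm in zip(range(total_slices), cycle(rotation)):
        used[vm] += 1
    return used


usage = time_slice({"vm_a": 2, "vm_b": 1, "vm_c": 1}, total_slices=1000)
for vm, slices in sorted(usage.items()):
    print(f"{vm}: {slices / 1000:.0%} of one processor")
```

With these shares, vm_a receives half of the processor and the other two a quarter each, which is exactly the kind of fractional assignment that hardware partitioning cannot offer.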

The bad news is that you generally get much less hardware fault isolation than with hardware partitioning; if the hardware croaks, well, it's one basket and those eggs are scrambled. Very sophisticated hypervisors can help with that when there is appropriate hardware support (mainframe customers do get something for their money). In addition, and this is certainly obvious after it's stated: If you put N virtual machines on one physical machine, you are now faced with all the management pain of managing all N copies of the operating system and its applications.

This is the level often used in so-called desktop virtualization. In that paradigm, individuals don't own hardware, their own PC. Instead, they "own" a block of bits back on a server farm that happens to be the description of a virtual machine, and can request that their virtual machine be run from whatever terminal device happens to be handy. It might actually run back on the server, or might run on a local machine after downloading. Many users absolutely loathe this; they want to own and control their own hardware. Administrators like it, a lot, since it lets them own, and control, the hardware.

Level 3: Containers

This level was, as far as I know, originally developed by Sun Microsystems (Oracle), so I'll use their name for it: Containers. IBM (in AIX) and probably others also provide it, under different names.

With containers, you have one copy of the operating system code, but it provides environments, containers, which act like separate copies of the OS. In Unix/Linux terms, each container has its own file system root (including IO), process tree, shared segment naming space, and so on. So applications run as if they were running on their own copy of the operating system – but they are actually sharing one copy of the OS code, with common but separate OS data structures, etc.; this provides significant resource sharing that helps the efficiency of this level.

This is quite useful if you have applications or middleware that were written under the assumption that they were going to run on their own separate server, and as a result, for example, all use the same name for a temporary file. Were they run on the same OS, they would clobber each other in the common /tmp directory; in separate containers, they each have their own /tmp. More such applications exist than one would like to believe; the most quoted case is the Apache web server, but my information on that may be out of date and it may have been changed by now. Or not, since I'm not sure what the motivation to change would be.
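The temporary-file collision is easy to reproduce in miniature. The Python sketch below does not use any real container mechanism; it merely gives a hypothetical application (`run_app`, with the made-up file name `app.tmp`) either a shared root directory or a private one, which is the part of the container illusion that matters here.

```python
import tempfile
from pathlib import Path


def run_app(name: str, root: Path) -> Path:
    """Hypothetical app that always writes scratch data to <root>/tmp/app.tmp."""
    scratch = root / "tmp" / "app.tmp"
    scratch.parent.mkdir(parents=True, exist_ok=True)
    scratch.write_text(f"state of {name}")
    return scratch


with tempfile.TemporaryDirectory() as base:
    base = Path(base)

    # One shared OS image: both apps see the same root, hence the same /tmp.
    shared_root = base / "shared"
    first = run_app("app1", shared_root)
    run_app("app2", shared_root)
    shared_result = first.read_text()  # app1's file now holds app2's data

    # Two containers: each app gets its own root, hence its own /tmp.
    c1 = run_app("app1", base / "container1")
    c2 = run_app("app2", base / "container2")
    isolated_result = (c1.read_text(), c2.read_text())

print(shared_result)    # state of app2
print(isolated_result)  # ('state of app1', 'state of app2')
```

Neither application had to change; only the root it was handed differs, which is why containers suit software written as if it owned the whole server.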

I suspect container technology was originally developed in the Full Moon cluster single-system-image project, which needs similar capabilities. See my much earlier post about single-system-image if you want more information on such things.

In addition, there's just one real operating system to manage in this case, so management headaches are somewhat lessened. You do have to manage all those containers, so it isn't an N:1 advantage, but I've heard customers say this is a significant management savings.

A perhaps less obvious example of containerization is the multiuser BASIC systems that flooded the computer education system several decades back. There was one copy of the BASIC interpreter, run on a small minicomputer and used simultaneously by many students, each of whom had their own logon ID and wrote their own code. And each of whom could botch things up for everybody else with the wrong code that soaked up the CPU. (This happened regularly in the "computer lab" I supervised for a while.) I locate this in the container level rather than higher in the stack because the BASIC interpreter really was the OS: It ran on the bare metal, with no supervisor code below it.

Of course, fault isolation at this level is even less than in the prior cases. Now if the OS crashes, all the containers go down. (Or if the wrong thing is done in BASIC…) In comparison, an OS crash in a virtual machine is isolated to that virtual machine.

Level 4: Software Virtual Machines

We've reached the JVM level. It's also the .NET level, the Lisp level, the now more usual BASIC level, and even the CICS level (and so on): the level of more-or-less programming-language based independent computing environments. Obviously, multiple of these can be run as applications under a single operating system image, each providing a separate environment for the execution of applications. At least this can be done in theory, and in many cases in practice; some environments were implemented as if they owned the computer they run on.

What you get out of this is, of course, a more standard programming environment that can be portable – run on multiple computer architectures – as well as extensions to a machine environment that provide services simplifying application development. Those extensions are usually the key reason this level is used. There's also a bit of fault tolerance, since if one of those dies of a fault in its support or application code, it need not always affect others, assuming a competent operating system implementation.

Fault isolation at this level is mostly software only; if one JVM (say) crashes, or the code running on it crashes, it usually doesn't affect others. Sophisticated hardware / firmware / OS can inject the ability to keep many of the software VMs up if a failure occurred that only affected one of them. (Mainframe again.)

Level 5: Multitenant / Multiuser Environment

Many applications allow multiple users to log in, all to the same application, with their own profiles, data collections, etc. They are legion. Examples include web-based email, Facebook, World of Warcraft, and so on. Each user sees his or her own data, and thinks he / she is doing things isolated from others except at those points where interaction is expected. They see their own virtual system – a very specific, particularized system running just one application, but a system apparently isolated from all others in any event.

The advantages here? Well, people pay to use them (or put up with advertising to use them). Aside from that, there is potentially massive sharing of resources, and, concomitantly, care must be taken in the software and system architecture to avoid massive sharing of faults.

All Together Now

Yes. You can have all of these levels of virtualization active simultaneously in one system: A hardware partition running a hypervisor creating a virtual machine that hosts an operating system with containers that each run several programming environments executing multi-user applications.

It's possible. There may be circumstances where it appears warranted. I don't think I'd want to manage it, myself. Imagining performance tuning on a 5-layer virtualization cake makes me shudder. I once had a television system that had two volume controls in series: A cable set-top box had its volume control, feeding an audio system with its own. Just those two levels drove me nuts until I hit upon a setting of one of them that let the other, alone, span the range I wanted.

Virtualization and Cloud Computing

These levels relate to the usual IaaS/PaaS/SaaS (Infrastructure / Platform / Software as a Service) distinctions discussed in cloud computing circles, but are at a finer granularity than those.

IaaS relates to the bottom two layers: hardware partitioning and virtual machines. Those two levels, particularly virtual machines, make it possible to serve up raw computing infrastructure (machines) in a way that can utilize the underlying hardware far more efficiently than handing customers whole computers that they aren't going to use 100% of the time. As I've pointed out elsewhere, it is not a logical necessity that a cloud use this or some other form of virtualization; but in many situations, it is an economic necessity.

Software virtual machines are what PaaS serves up. There's a fairly close correspondence between the two concepts.

SaaS is, of course, a Multiuser environment. It may, however, be delivered by using software virtual machines under it.

Containers are a mix of IaaS and PaaS. They don't provide pure hardware, but a plain OS is made available, and that can certainly be considered a software platform. It is, however, a fairly barren environment compared with what software virtual machines provide.


This post has been brought to you by my poor head, which aches every time I encounter yet another discussion over whether and how various forms of cloud computing do or do not use virtualization. Hopefully it may help clear up some of that confusion.

Oh, yes, and the obvious conclusion: There's more than one kind of virtualization, out there, folks.


Paul A. Clayton said...

According to BSD jails preceded Sun's Container mechanism.

It seems it should be possible for the shared part of the OS of container-style virtualization to be rather isolated. With Itanium's page group protections, it would even be possible to guarantee significant isolation of components.

An OS itself tends to present a virtualized environment to the applications and the users (virtual memory, time sharing, etc.).

DanielVS said...

Hi Greg, I remembered one of your previous posts about virtualization and the wrong idea that virtualization means less performance just when I was reading a nice article at Phoronix:

Notice the C-Ray and COMP benchmarks. That's exactly what you said: (well-written) kernels that don't (unnecessarily) need restricted instructions should run as well as on bare metal, well, because they are actually there.

Nice post btw!


Greg Pfister said...

Thanks for the pointer to that article, Paul. Impressive table. Glad I don't have to keep it current.

Yes, it does show BSD Jails prior to Sun Containers, so I was wrong to say Sun invented it. Of course if you throw chroot into the mix, as that article does, it's the real "first" (they say 1982). Chroot's pretty weak, so that's debatable.

Isolation - sure, it's quite possible to get increased isolation with increased work. But OSs are big and complicated, and even the best do crash. Hypervisors are much smaller, hence easier to get "right." So I'd still say OS-level virtualization tends to be less fault isolated than lower levels -- but this depends a lot on the implementation, and may wash out in some circumstances.

Curt Sampson said...

Containers are not always easier to manage than hypervisor-level virtualization. Sure, you have only one kernel, but that's a disadvantage when you want to run different versions or flavours of the OS as you now have to deal with, e.g., using the kernel version for release 6 even when you want to run release 7 on one VM.

Greg Pfister said...

Hi, Curt.

Yes, I agree. Years ago, when IBM had no container-like things and Sun (RIP) did, I used to say just that to customers in the briefing center in Austin.

They smiled politely and went off to buy Sun gear. They "knew" that having fewer copies of the OS to manage was better. And they probably assumed we said that because we didn't have it.

This happened often enough that IBM did its version on AIX. After that, we said the same thing but added that some customers disagree, so we did it to let them have the choice -- and make the statement more believable, actually.


us vpn said...

Thank you for sharing this in-depth guide on virtualization.

Robin Rizvi said...

Thanks for writing such good articles
