Linux Virtualization – A Sysadmin’s Survey of the Scene
The VMware Question
I’ve done quite a bit of work with VMware ESX over the last few years, and even though I have some serious purist concerns about the management toolset, I have to admit it’s the product to beat in the enterprise these days. VMware is currently the state of the art for manageability and beats the pants off nearly everyone else out there right now.
Yes, you can run 10-50 virtual machines without any fancy toolset to manage them. I’ve done as much at home using nothing more than the venerable qemu and some shell scripts. However, I’ve also worked in environments with thousands of virtualized hosts. When things get that scaled up, you need tools with serious horsepower to keep things running smoothly. Questions like “What host is that VM on?”, “What guests were running on that physical server when it crashed?”, and “How can we shut down all the VMs during our maintenance window?” become harder and harder to answer without a good management toolset. So, before we continue, dig the perspective I’m coming from. Picture 4,000 guests running on 150 beefy physical hosts connected to 2-3 SANs across 3 data centers. That’s not all that uncommon anymore; plenty of hosting companies run even larger environments. Right now they pretty much all run VMware, and that’s not going to change until someone gets at least close to that level of manageability.
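The home-lab shell-script rig I mean is nothing fancier than a little wrapper like this; it’s just a sketch, and the guest names, image paths, and memory size are all invented (QEMU is overridable so you can dry-run it with QEMU=echo):

```shell
#!/bin/sh
# Sketch of a tiny qemu launcher: headless guest with a VNC console and a
# pidfile so other scripts can find it later. Paths/names are hypothetical.
QEMU="${QEMU:-qemu-system-x86_64}"
IMGDIR="${IMGDIR:-/srv/vm}"

start_vm() {
    # $1 = guest name, $2 = VNC display number
    "$QEMU" \
        -m 512 \
        -hda "$IMGDIR/$1.qcow2" \
        -vnc ":$2" \
        -daemonize \
        -pidfile "/tmp/qemu-$1.pid"
}

# Example: start_vm web01 1 && start_vm db01 2
```

A handful of wrappers like that, plus a flat file mapping guests to hosts, gets you surprisingly far — right up until the fleet hits triple digits.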
The VMware Gold Standard
First, let’s talk about what VMware gets right so we can see where everyone else falls short. VMware’s marketing-speak is so thick, and they rename their products so often, that it’ll make your head spin. So keep in mind: I’m just going to keep it real and speak from the sysadmin point of view, not the CIO Magazine angle. Here’s what makes VMware so attractive from where I sit:
- Live Migration (vMotion)
- Live storage migration (Storage vMotion, or whatever they’re calling it this week)
- Good SAN integration with lots of HBA drivers and very easy to add datastores from the SAN
- Easy to set up multipathing for various SAN types
- The “shares” concept for slicing CPU time on the physical box down to priority levels on the guests (think mainframe domains)
- Easy to administer one machine (vSphere client) or many machines in a cluster (vCenter)
- Distributed resource management (DRS) allows you to move busy VMs off to balance the load
- Supports high availability clustering for guests (HA) and log-shipping-disk-and-memory-writes fault tolerant clones (FT). The latter is something I don’t think anyone else does just yet.
- Allows you to over-commit both memory (via transparent page sharing and the balloon driver in the vmware-tools guest additions) and disk (using “thin provisioning”)
- Allows easy and integrated access to guest consoles across a large number of clustered machines (vCenter)
- Allows easy “evacuation” of hosts. Guests can spread themselves over the other nodes or all migrate to other hosts without a lot of administrative fuss. This allows you to do hardware maintenance on a host machine without taking downtime for the guests.
- Customers in hosted environments can get access to a web-based console for just their own guests, letting them reboot or view stats without getting support involved.
- Some nice performance statistics are gathered both per-host and per-guest.
- VMware is very efficient. In my testing the hypervisor eats only about 1-3% of the host’s CPU; the VMs actually get the rest. In some rare cases guests can even perform better than bare metal (e.g., when short bursty I/O lets the dual buffer caches of the guest OS and the hypervisor help out).
Why I Want to See Them Fail (VMware)
Want reasons besides the fact that they’re a big evil corporation making very expensive, non-free, proprietary software with very expensive support contracts? Well, let’s see:
- They refuse to make a Linux-native or open-source client (vSphere client). It doesn’t work at all under Wine either (in fact WineHQ rates it GARBAGE, and I agree). Want to see the console of a guest from Linux? Forget it. The closest you can get is to run the client inside Windows in a desktop virtualization app like VirtualBox or VMware Workstation for Linux. I’ve also done it via SeamlessRDP to a “real” Windows server. Don’t even leave a comment saying you can use the web console to view guest consoles. You can’t, period. The web console has about 40% of the functionality of the fat client (and not the functionality big environments actually use); it’s good for turning guests on and off, and that’s about it. It has a LONG way to go, and if they do beef it up, I’m afraid they’ll use Java or ActiveX to make it slow and clumsy.
- They are removing the Linux service console. Yes, you can still get a Redhat-a-like service console for ESX 4.0, but not for ESXi 4.0, and they’re planning to move all future releases toward a “bare metal hypervisor” (aka ESXi). Say goodbye to the service console and hello to what they call the “vMA”. The latter is a not-as-cool pre-packaged VM appliance that remotely runs commands on your ESXi boxes. Did you like “vmware-cmd” and “esxcfg-mpio”? Well, now you can federate your ESXi servers to this appliance and run the same tools from there against all the servers in your environment. The only problem is that the vMA kind of sucks and includes a lot of kludgy Perl scripts, not to mention it’s missing things you might want to do, or script up, directly on the host machine (it’s not a 100% functional replacement for ESX). The bottom line is that it’s not as Unixy anymore; they’re moving toward a sort of domain-specific operating system (ESXi). I know I’m not the only one who will miss the ESX version when they can it. I’m friends with a couple of ex-VMware support folks who told me they hated getting called on ESXi because it tied their hands. Customers often didn’t even know about the vMA, and the support folks had to wait while some clueless MCSE fumbled through setting it up, wasting time that could have been spent troubleshooting had they been on ESX.
Red Hat’s Latest – RHEV
Red Hat has been making noise lately about its RHEL-based virtualization product offerings, and I’ve been wondering when they’d add something to the mix to compete with VMware’s vCenter. I really hoped they’d do it right. The story so far has been that, to manage a large cluster of virtual machine host servers remotely from your sysadmin workstation, you either VNC or XDMCP to the box and run virt-manager, or you use the command-line tools. Anyone who has seen the level of consolidation and configuration options vCenter offers VMware admins would choke, roll their eyes, and/or laugh at those options. I’m a self-confessed Unix bigot and even I know that “option” is a joke. virt-manager is extremely limited and can only manage one server at a time.
Okay, so enter RHEV. Ready for this? The console runs on Windows only. Seriously! So you put up with a much less mature virtualization platform and you’re stuck with Windows to manage it anyhow. I’ve never run it with thousands of machines, but even with a few it was buggy, exhibiting interface lockups while delivering maybe 60% of what vCenter can do. The one real advantage of having a true Unix-like platform to run on gets basically nullified by Red Hat pulling this stunt. Do us all a favor, Red Hat: sell your KVM development unit to someone with a clue. KVM has some real potential, but it gets lost in the suck of RHEV.
XenServer – Now 0wn3d by Citrix!
Well, I had high hopes for Xen back in its “college days” before it got scooped up by Citrix. Now it’s a bizarre hybrid of an RPM-based distro (though they claim to be moving to a Debian base), a monstrous web-application platform (which isn’t all bad), and a whole lot of abstraction from the metal. My experience with the platform is about a year old, and I wasn’t at all impressed. The web GUI had several serious issues, like losing track of registered VMs when moving them around. There was also a lot of brain-damaged Java-programmerish crap under the hood: tons of XML files to track VM configuration, naming, and location. Very little was the traditional I-can-read-it-and-edit-it-just-fine-without-an-XML-viewer kind of text file or key-value pairs (a la an INI file). That, plus the fact that the virtual hard disks get big unreadable hashed names, made popping the hood on XenServer a real mess.
Xen – In the Buff on SuSE and Wrappered on Oracle VM Server
Well, SuSE 11 was the last time I played with Xen “in the raw”. Novell would like to sell you this thing called “Orchestrator” to give you something more than just a virt-manager interface for your Xen guests. I watched a demo of Orchestrator by the Novell folks and was not at all impressed; half the functionality was something they said you’d basically have to script yourself. Well, news flash, Novell: I don’t need Orchestrator to write scripts to manage Xen. It may have changed since I last saw it, but IMHO, as a long-time old-school sysadmin, it added very little value.
So you want to script the management of Xen yourself? It can be done. The problem is that almost all the CLI Xen tools are scripts themselves and are prone to hanging when the going gets tough. I ran a fairly large Xen environment for a while and had a ton of problems with having to restart ‘xend’ to get the CLI tools unstuck. When they hang and you have cron-driven scripts depending on them, you tend to get a train wreck, or at best a non-functional script. It’s also very easy to step on your virtual machines when using shared storage: there’s no locking mechanism to prevent you from starting the same guest on two separate boxes over NFS or a clustered filesystem on a SAN. You have to use scripts and lock files to overcome this, and if you don’t, you end up with badly corrupted guests. Additionally, the qcow2 format was very badly behaved when I last used Xen. Crashing SuSE 11 virtual servers resulted in more than a few corrupt qcow images; I had one sparse file claiming to be 110TB on a 2TB LUN.
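The lock-file discipline I mean looks roughly like this. It’s just a sketch — the shared path and guest name are invented — and it uses a lock *directory* because `mkdir` is atomic even over NFS, where plain lock files can race:

```shell
#!/bin/sh
# Sketch: refuse to start a Xen guest if another host already holds its lock.
# XM is overridable (XM=echo) for dry runs; paths are hypothetical.
XM="${XM:-xm}"
LOCKROOT="${LOCKROOT:-/mnt/shared/locks}"   # NFS/cluster-fs path all hosts see

safe_start() {
    guest="$1"
    # mkdir either atomically creates the lock or fails because it exists
    if mkdir "$LOCKROOT/$guest.lock" 2>/dev/null; then
        "$XM" create "/etc/xen/$guest.cfg"
    else
        echo "refusing to start $guest: lock held, probably running elsewhere" >&2
        return 1
    fi
}

# A matching shutdown wrapper should "xm shutdown" the guest, then rmdir the lock.
```

Crude, but it beats fsck’ing a guest that got mounted read-write on two hosts at once.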
What about OVM? Well, if you want Oracle support, I guess you could brave it. I tried it once and found it awful. Not only does it have a complicated three-tier setup, it’s also unstable as heck; it crashed several times before I gave up and looked elsewhere. The GUI is web-based, but it’s about as intuitive as a broken Rubik’s cube. You can download it for free after signing away your life to the Oracle Network. I didn’t spend much time on it after those first few terrible impressions.
Xen has potential, but until the CLI tools are more reliable it’s not worth it. The whole rig is a big hassle. That was my opinion about a year ago, anyhow.
Virtualbox – Now 0wn3d by Oracle!
Well, no, it’s not an enterprise VM server. If Oracle went down that path, it’d compete with OVM, their absolutely horrible Xen-based offering. That said, VirtualBox has a few really good things going for it:
- It’s fast and friendly. The management interface is just as good as VMware Workstation’s, IMHO.
- It does support live migration, though most folks don’t know that.
- It has a few projects like VboxWeb that might really bear fruit in large environments.
- It doesn’t use stupid naming conventions for its hard disk images; it names them after the machine they were created for.
- There’s a decent set of CLI tools that come with it.
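That live migration bullet is what VirtualBox calls “teleporting”, driven entirely from the CLI. From memory, the dance looks roughly like this — host names and the port are invented, and you should verify the flags against your VBoxManage version:

```shell
#!/bin/sh
# Sketch of VirtualBox teleporting. VBOXMANAGE=echo gives a dry run;
# guest/host names and the port number are made up.
VBOXMANAGE="${VBOXMANAGE:-VBoxManage}"

# Run on the TARGET host first: arm the guest as a teleport receiver.
receive_vm() {
    "$VBOXMANAGE" modifyvm "$1" --teleporter on --teleporterport 6000
    "$VBOXMANAGE" startvm "$1" --type headless   # sits waiting for the sender
}

# Then on the SOURCE host: push the running guest across the wire.
send_vm() {
    "$VBOXMANAGE" controlvm "$1" teleport --host "$2" --port 6000
}

# Example: receive_vm web01 (on the target), then send_vm web01 target.example.com
```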
I have some serious doubts about where Oracle will let the product go. It’s also only half-heartedly open source: they withhold the RDP and USB functionality unless you pay. As a workstation application, it’s pretty darn good. As an enterprise virtualization platform, it might even beat Xen, but it’s nowhere near VMware.
KVM and QEMU
Let’s keep this short and sweet. Fabrice Bellard is a genius, and his work on Qemu (which KVM builds on for hardware-accelerated virtualization under Linux) is outstanding. As a standalone Unix-friendly virtualization tool it’s impressive in its performance and flexibility. However, outside of RHEV (which is currently awful), there aren’t any enterprise tools for managing large numbers of Qemu or KVM boxes. I also haven’t seen any way to do what VMware and XenServer can do with “shares” of CPU and memory across multiple machines. There is duplicate-page sharing now (KSM), but that’s a long way from VMware’s huge feature set. I have the most hope for the Qemu family, but I really wish there were some great Unix-friendly open-source management tools for it beyond the smattering of immature, single-maintainer web-based efforts.
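KSM, for what it’s worth, is pleasantly Unixy: it’s driven entirely through sysfs. A quick sketch (needs root and a kernel built with KSM support, so treat it as illustrative):

```shell
#!/bin/sh
# Sketch: enable KSM and see how much it's deduplicating across guests.
# Requires root and a KSM-enabled kernel (2.6.32 or later).
KSM=/sys/kernel/mm/ksm
if [ -w "$KSM/run" ]; then
    echo 1 > "$KSM/run"              # start the page scanner
    cat "$KSM/pages_sharing"         # rough count of deduplicated pages
else
    echo "KSM not available (or not root) on this box" >&2
fi
```

Compare that to digging shares and reservations out of vCenter — it’s a much smaller hammer, but at least you can script it.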
Proxmox VE – the Small Dark Knight
Then there’s Proxmox VE, a real underdog with serious potential. It supports accelerated KVM and also offers operating-system virtualization in the form of OpenVZ, both manageable via its web interface. It also has clustering and built-in snapshot backups (the latter in a form VMware doesn’t offer out of the box). It lacks a lot of features VMware has, such as “fault tolerance” (log shipping), DRS (load balancing), Site Recovery Manager (a DR tool suite), and the whole “shares” thing. However, considering it’s based on free-as-in-free open-source software and works darn well at what it does do, I’d say it has great potential. I’ve introduced it to a lot of folks who were surprised at how robust it was once they tried it.
Virtuozzo and OpenVZ
Virtuozzo is operating-system virtualization. It’s limited in that you can only host like-on-like virtual machines: Linux can only host Linux, and Windows can only host Windows. I’ve tried it quite a bit in lab environments, though never in production, and I was impressed by the consolidation ratios you can get. If you have small machines that don’t do much, you can pack a TON of them onto one physical box. It also has a terrific web GUI that works great in open-source browsers, an impressive level of resource management and sharing capabilities, and a far better individual-VM management web interface than VMware’s. It also has a lot of chargeback features for hosting companies (their primary users).
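For the open-source sibling, OpenVZ, standing up a container is a few CLI calls. A sketch — the container ID, template name, and addresses are all invented, and VZCTL is overridable for a dry run:

```shell
#!/bin/sh
# Sketch of provisioning an OpenVZ container. VZCTL=echo gives a dry run;
# the CTID, OS template, and IP are hypothetical.
VZCTL="${VZCTL:-vzctl}"

provision_ct() {
    ctid="$1"
    "$VZCTL" create "$ctid" --ostemplate centos-5-x86 --config basic
    "$VZCTL" set "$ctid" --hostname "web$ctid" --ipadd "10.0.0.$ctid" --save
    "$VZCTL" set "$ctid" --cpuunits 1000 --save   # coarse CPU weighting
    "$VZCTL" start "$ctid"
}

# Example: provision_ct 101
```

Seconds per container, no block devices to carve up — which is exactly how you get those huge consolidation ratios.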
I have a friend who worked for a VERY large hosting company and used it on a daily basis, and his anecdotes were not as rosy. He told me that customers were often able to crash not only their own container but the physical host server too, causing him major, painful outages. I didn’t like hearing that one bit. To be fair, I have definitely seen VMware and even Qemu crash hard, and I once saw VMware crash and corrupt its VMs (in the 3.x days), which was painful. Still, I wouldn’t take such stories about Virtuozzo lightly. Another negative was the pricing: the folks at Parallels are quite proud of the product, and it isn’t much cheaper than VMware. You’d think they’d want the business *shrug*.
There’s a nice panoply of choices out there now, but nobody is really giving VMware a run for their money outside of niche areas like OS virtualization. I’d love to see something like Proxmox take off and give VMware some headaches. I’d also like to see a much more Unix-friendly focus from the big boys. We aren’t all MCSEs and lobotomy recipients out here in sysadmin land, and a few decent Unix tools on the CLI and native-GUI front would be well received from the non-VMware players. I know it’s all about market share, but that doesn’t excuse moves like the Windows-only management stunt RHEV pulled (*disgusted*). Here’s hoping the future for the free, open, and clueful VM platforms is brighter.