Linux Virtualization – A Sysadmin’s Survey of the Scene

The VMware Question

I’ve done quite a bit of work with VMware ESX over the last few years, and even though I have some serious purist concerns about the management toolset, I have to admit that it’s the product to beat in the enterprise these days. VMware is currently the state of the art for manageability and beats the pants off almost everyone else out there right now.

Yes, you can run 10-50 virtual machines without any fancy toolset to manage them. I’ve done as much at home using nothing but the venerable QEMU and shell scripts. However, I’ve also worked in environments with thousands of virtualized hosts. When things scale up that far, you need tools with some serious horsepower to keep everything running smoothly. Questions like “What host is that VM on?”, “What guests were running on that physical server when it crashed?” and “How can we shut down all the VMs during our maintenance window?” become harder and harder to answer without a good management toolset. So, before we continue, dig the perspective I’m coming from. Picture 4000 guests running on 150 beefy physical hosts connected to 2-3 SANs across 3 data centers. This is not all that uncommon anymore, and plenty of hosting companies run even larger environments. Right now they pretty much all run VMware, and that’s not going to change until someone gets at least close to VMware’s level of manageability.
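To make those questions concrete, here is a minimal sketch of the bookkeeping a management layer has to do. Everything in it is hypothetical: the `Inventory` class and the host and guest names are made up. It answers only the current-state version of each question. Knowing what ran on a host when it crashed would also require timestamped history, which this toy does not keep.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Inventory:
    """Hypothetical in-memory map of guest -> physical host.

    A real tool would refresh this from the hypervisors' own APIs
    instead of relying on someone to update it by hand.
    """

    guest_to_host: Dict[str, str] = field(default_factory=dict)

    def place(self, guest: str, host: str) -> None:
        # Record where a guest lives (call again after a migration).
        self.guest_to_host[guest] = host

    def host_of(self, guest: str) -> Optional[str]:
        # "What host is that VM on?"
        return self.guest_to_host.get(guest)

    def guests_on(self, host: str) -> List[str]:
        # "What guests are running on that physical server?"
        return sorted(g for g, h in self.guest_to_host.items() if h == host)

    def shutdown_plan(self, hosts: Set[str]) -> Dict[str, List[str]]:
        # "Which VMs go down in this maintenance window?", grouped by host.
        plan: Dict[str, List[str]] = defaultdict(list)
        for guest, host in sorted(self.guest_to_host.items()):
            if host in hosts:
                plan[host].append(guest)
        return dict(plan)


if __name__ == "__main__":
    inv = Inventory()
    inv.place("web01", "esx-a")
    inv.place("db01", "esx-b")
    inv.place("web02", "esx-a")
    print(inv.host_of("db01"))            # esx-b
    print(inv.guests_on("esx-a"))         # ['web01', 'web02']
    print(inv.shutdown_plan({"esx-a"}))   # {'esx-a': ['web01', 'web02']}
```

With a few dozen guests this fits in your head. With 4000 guests that migrate on their own, keeping a structure like this accurate is exactly the job a vCenter-class tool takes off your hands.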

The VMware Gold Standard

Well, first let’s talk about what VMware gets right so we can see where everyone else falls short. VMware’s marketing-speak is so thick, and they rename their products so often, that it’ll make your head spin. So keep in mind that I’m just going to keep it real and speak from the sysadmin point of view, not the CIO Magazine angle. Here’s what makes VMware so attractive from where I sit:

  • Live Migration (vMotion)
  • Live storage migration (Storage vMotion, or whatever they’re calling it this week)
  • Good SAN integration, with lots of HBA drivers, and it’s very easy to add datastores from the SAN
  • Easy to set up multipathing for various SAN types
  • The “shares” concept for slicing CPU time on the physical box down to priority levels on the guests (think mainframe domains)
  • Easy to administer one machine (vSphere client) or many machines in a cluster (vCenter)
  • Distributed resource management (DRS) moves VMs off busy hosts to balance the load
  • Supports high availability clustering for guests (HA) and log-shipping-disk-and-memory-writes fault-tolerant clones (FT). The latter is something I don’t think anyone else does just yet.
  • Allows you to over-commit both memory (using transparent page sharing plus the balloon driver in their VMware Tools guest additions) and disk (using “thin provisioning”)
  • Allows easy and integrated access to guest consoles across a large number of clustered machines (vCenter)
  • Allows easy “evacuation” of hosts. Guests can spread themselves over the other nodes or all migrate to other hosts without a lot of administrative fuss. This allows you to do hardware maintenance on a host machine without taking downtime for the guests.
  • Customers in hosted environments can get access to a web-based console for just their own VMs, allowing them to reboot or view stats without getting support involved.
  • Some nice performance statistics are gathered both per-host and per-guest.
  • VMware is very efficient. In my testing, the hypervisor takes only about 1-3% of the host’s CPU; the VMs actually get the rest. In some rare cases they can even perform better than on bare metal (for example, short bursty I/O where the dual buffer caches of the guest OS and the hypervisor help out).
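The “shares” idea from the list above boils down to proportional allocation under contention. This is a simplified sketch, not VMware’s scheduler. It ignores reservations and limits, and it assumes every guest passed in is busy; idle guests should simply be left out, which hands their slice to the others. The 2000/1000/500 values mirror VMware’s High/Normal/Low CPU share presets per vCPU. The host capacity and VM names are made up.

```python
from typing import Dict


def cpu_entitlements(capacity_mhz: float, shares: Dict[str, int]) -> Dict[str, float]:
    """Split contended host CPU among busy guests in proportion to their shares."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("need at least one positive share value")
    return {vm: capacity_mhz * s / total for vm, s in shares.items()}


if __name__ == "__main__":
    host_mhz = 14_000  # hypothetical host capacity available to guests
    busy = {"prod-db": 2000, "test-web": 1000, "scratch": 500}
    print(cpu_entitlements(host_mhz, busy))
    # {'prod-db': 8000.0, 'test-web': 4000.0, 'scratch': 2000.0}

    # If "scratch" goes idle, the remaining guests split the host 2:1.
    del busy["scratch"]
    print(cpu_entitlements(host_mhz, busy))
```

This shows why shares feel like mainframe-style priority levels. A guest never gets a fixed slice. It gets a ratio that only matters when the box is actually contended.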

Why I Want to See Them Fail (VMware)

Want reasons besides the fact that they are a big evil corporation making very expensive, non-free, proprietary software with very expensive support contracts? Well, let’s see:

  1. They refuse to make a Linux-native or open-source client (the vSphere client is Windows-only). It also doesn’t work at all under Wine (in fact WineHQ rates it GARBAGE, and I agree). Want to see the console of a guest from Linux? Forget it. The closest you can get is running the client inside Windows in a desktop virtualization app like VirtualBox or VMware Workstation for Linux. I’ve also done it via SeamlessRDP to a “real” Windows server. Don’t even leave a comment saying you can use the web console to view guest consoles. You can’t, period. The web console has about 40% of the functionality of the fat client (and not the most-used functionality for big environments); it’s good for turning guests on and off, and that’s about it. The web console has a LONG way to go, and if they do beef it up, I’m afraid they’ll use Java or ActiveX and make it slow and clumsy.
  2. They are removing the Linux service console. Yes, you can still get a Red Hat-alike service console for ESX 4.0, but not for ESXi 4.0, and they plan to move all future releases to a “bare metal hypervisor” (aka ESXi). Say goodbye to the service console and hello to what they call the “vMA”, a not-as-cool prepackaged VM appliance that remotely runs commands on your ESXi boxes. Did you like “vmware-cmd” and “esxcfg-mpio”? Well, now you can federate your ESXi servers to this appliance and run the same tools from there against every server in your environment. The only problem is that the vMA kind of sucks: it includes a lot of kludgy Perl scripts, and it’s missing things you might want to do or script directly on the host machine (it’s not a 100% functional replacement for ESX). The bottom line is that it’s not as Unixy anymore; they are moving toward a sort of domain-specific operating system (ESXi). I know I’m not the only one who will miss ESX when they can it. I’m friends with a couple of ex-VMware support folks who told me they hated getting called about ESXi because it tied their hands. Customers never even knew about the vMA, so the support engineers frequently had to wait while the clueless MCSE fumbled through setting it up, wasting time that could have been spent troubleshooting if they’d been using ESX.

Red Hat’s Latest – RHEV

Red Hat has been making noise lately about its RHEL-based virtualization offerings. I’d been wondering when they would add something to the mix to compete with VMware’s vCenter, and I really hoped they’d do it right. The story so far was that, in order to manage a large cluster of virtual machine host servers remotely from your sysadmin workstation, you had to VNC or XDMCP to the box and run Virtman (virt-manager), or use command line tools. Anyone who has seen the level of consolidation and configuration options that vCenter offers VMware admins would choke, roll their eyes, and/or laugh at those options. I’m a self-confessed Unix bigot, and even I know that “option” is a joke. Virtman is extremely limited and can only manage one server at a time.

Okay, so enter RHEV. Ready for this? The management console runs on Windows only. Seriously! So you put up with a much less mature virtualization platform and you get stuck with Windows to manage it anyhow. I’ve never run it with thousands of machines, but even with a few it was buggy, with interface lockups and only about 60% of what vCenter can do. So the one real advantage of running on a true Unix-like platform basically gets nullified by Red Hat pulling this stunt. Do us all a favor, Red Hat: sell your KVM development unit to someone with a clue. KVM has some real potential, but it gets lost in the suck of RHEV.

XenServer – Now 0wn3d by Citrix!

Well, I had high hopes for Xen back in its “college days”, before XenSource got scooped up by Citrix. Now it’s a bizarre hybrid of an RPM-based distro (though they claim to be moving to a Debian base), a monstrous web-application platform (which isn’t all bad), and a whole lot of abstraction from the metal. My experience with the platform is about a year old, and I wasn’t at all impressed. The web GUI had several serious issues, like losing track of registered VMs when moving them around. It also had a lot of brain-damaged Java-programmer-ish crap under the hood: tons of XML files to track VM configuration, naming, and location. Very little of it was traditional I-can-read-it-and-edit-it-just-fine-without-an-XML-viewer text files or key-value pairs (à la an INI file). That, plus the virtual hard disks having big unreadable hashed names, made popping the hood on XenServer a real mess.

Xen – In the buff on SuSE and Wrappered On Oracle VM Server

Well, SuSE 11 was the last time I played with Xen “in the raw”. Novell would like to sell you a thing called “Orchestrator” to give you something more than just a Virtman interface to manage your Xen guests. I watched a demo of Orchestrator by the Novell folks and was not at all impressed: half the functionality was something they said you’d basically have to script yourself. Well, news flash, Novell: I don’t need Orchestrator to write scripts to manage Xen. It may have changed since I last saw it, but IMHO, as a long-time old-school sysadmin, it added very little value.

So you want to try scripting the management of Xen yourself? Well, it can be done. The problem is that almost all the Xen CLI tools are scripts themselves and are prone to hanging when the going gets tough. I had a fairly large Xen environment for a while and had a ton of problems with having to restart ‘xend’ to get the CLI tools unstuck. When they get stuck and you have cron-driven scripts depending on them, you tend to get a train wreck, or at best a non-functional script. It’s also very easy to step on your virtual machines using shared storage. There isn’t any locking mechanism to prevent you from starting the same guest on two separate boxes using NFS or a clustered filesystem on a SAN, so you have to use scripts and lock files to overcome this. If you don’t, you end up with badly corrupted guests. Additionally, the qcow2 format was very badly behaved when I last used Xen. Crashing SuSE 11 virtual servers resulted in more than a few corrupt qcow images; I had one that was a sparse file claiming to be 110TB on a 2TB LUN.
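Here is a minimal sketch of the lock-file approach described above, written as two hypothetical helpers that a start script and a stop script would call. It relies on `O_CREAT | O_EXCL`, which creates the file atomically on local filesystems and on NFSv3 or later. Older NFS clients don’t honor it reliably. A host that crashes also leaves a stale lock behind, which an admin has to check and clear by hand. The lock directory layout and guest names are assumptions.

```python
import os
import socket
import tempfile


def _lock_path(lock_dir: str, guest: str) -> str:
    return os.path.join(lock_dir, f"{guest}.lock")


def acquire_guest_lock(lock_dir: str, guest: str) -> str:
    """Create <lock_dir>/<guest>.lock atomically, or refuse if it already exists."""
    path = _lock_path(lock_dir, guest)
    try:
        # O_EXCL makes creation fail if another host got there first.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        try:
            with open(path) as f:
                owner = f.read().strip() or "unknown owner"
        except OSError:
            owner = "unknown owner"
        raise RuntimeError(f"refusing to start {guest}: lock held by {owner}") from None
    with os.fdopen(fd, "w") as f:
        # Record who holds the lock so a human can judge whether it is stale.
        f.write(f"{socket.gethostname()} pid={os.getpid()}\n")
    return path


def release_guest_lock(lock_dir: str, guest: str) -> None:
    """Remove the lock once the guest has been shut down cleanly."""
    os.remove(_lock_path(lock_dir, guest))


if __name__ == "__main__":
    shared = tempfile.mkdtemp()  # stand-in for a directory on shared storage
    acquire_guest_lock(shared, "web01")
    try:
        acquire_guest_lock(shared, "web01")  # what a second host would see
    except RuntimeError as err:
        print(err)
    release_guest_lock(shared, "web01")
```

The lock lives for as long as the guest runs, not just for the duration of one script. That is why acquire and release are separate calls rather than a single context manager.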

What about OVM? Well, if you want Oracle support, I guess you could brave it. I tried it once and found it to be awful. Not only does it have a complicated three-tier setup, it’s also unstable as heck. I had it crash several times before I gave up and looked elsewhere. The GUI is web-based, but it’s about as intuitive as a broken Rubik’s cube. You can download it for free after signing your life away to the Oracle Network. I didn’t spend much time on it after those first few terrible impressions.

Xen has potential, but until the CLI tools are more reliable it’s not worth it. The whole rig is a big hassle. That was my opinion about a year ago, anyhow.
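For the hanging-CLI problem described earlier, one defensive habit is to never let a cron job call a management tool without a deadline. This sketch uses Python’s `subprocess.run(..., timeout=...)`, which kills the child and raises `TimeoutExpired` when the deadline passes. The `xm list` call in the comment is only an example of a Xen toolstack command. The function name and the 30-second budget are arbitrary assumptions.

```python
import subprocess
import sys
from typing import List, Optional


def run_with_deadline(cmd: List[str], seconds: float) -> Optional[str]:
    """Run a management CLI command; return its stdout, or None if it hung.

    A non-zero exit still raises CalledProcessError so failures are not hidden.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=seconds, check=True
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed the child at this point.
        return None
    return result.stdout


if __name__ == "__main__":
    # e.g. run_with_deadline(["xm", "list"], 30) on a Xen dom0
    out = run_with_deadline([sys.executable, "-c", "print('ok')"], 30)
    print(out)
    hung = run_with_deadline([sys.executable, "-c", "import time; time.sleep(60)"], 1)
    print("tool hung, skipping this run" if hung is None else hung)
```

It won’t fix a wedged xend, but it turns a train wreck into a skipped run that you can alert on.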

Virtualbox – Now 0wn3d by Oracle!

Well, no, it’s not an enterprise VM server. If it went down that path, it’d compete with OVM, which is Oracle’s absolutely horrible Xen-based offering. However, I would like to say that VirtualBox has a few really good things going for it.

  1. It’s fast and friendly. The management interface is just as good as VMware workstation, IMHO.
  2. It does support live migration, though most folks don’t know that.
  3. It has a few projects like VboxWeb that might really bear fruit in large environments.
  4. It doesn’t use stupid naming conventions for its hard disk images; it names them after the machine they were created for.
  5. It comes with a decent set of CLI tools.
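As an example of scripting against those CLI tools, the sketch below turns the output of `VBoxManage list runningvms` into a dictionary. That command prints one line per VM, with the name in quotes followed by the UUID in braces. The parsing is kept separate from the subprocess call so it can be tested without VirtualBox installed. The function names are hypothetical.

```python
import re
import subprocess
from typing import Dict

# Each line looks like: "My VM" {01234567-89ab-cdef-0123-456789abcdef}
_VM_LINE = re.compile(r'^"(?P<name>.*)" \{(?P<uuid>[0-9a-fA-F-]+)\}$')


def parse_vm_list(text: str) -> Dict[str, str]:
    """Map VM name -> UUID from `VBoxManage list vms|runningvms` output."""
    vms = {}
    for line in text.splitlines():
        m = _VM_LINE.match(line.strip())
        if m:
            vms[m.group("name")] = m.group("uuid")
    return vms


def running_vms() -> Dict[str, str]:
    """Ask the local VirtualBox install which VMs are currently running."""
    out = subprocess.run(
        ["VBoxManage", "list", "runningvms"],
        capture_output=True, text=True, check=True, timeout=60,
    ).stdout
    return parse_vm_list(out)
```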

I have some serious doubts about where Oracle will allow the product to go. It’s also only half-heartedly open source: they keep the RDP and USB functionality away from you unless you buy it. For a workstation application, it’s pretty darn good. As an enterprise virtualization platform it might even be better than Xen, but it’s nowhere near VMware.

KVM and QEMU

Let’s keep this short and sweet. Fabrice Bellard is a genius, and his work on QEMU (and the hacked-up fork of it that KVM uses on Linux) is outstanding. As a standalone, Unix-friendly virtualization tool it’s impressive in its performance and flexibility. However, outside of RHEV (which is currently awful), there aren’t any enterprise tools to manage large numbers of QEMU or KVM boxes. I also haven’t seen any way to do what VMware and XenServer can do with “shares” of CPU and memory across multiple machines. There is duplicate-page sharing now (KSM), but that’s a long way from VMware’s huge feature set. I have the most hope for the QEMU family, but I really wish there were some great Unix-friendly open-source management tools for it beyond the smattering of immature, single-maintainer web-based efforts.
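To show what duplicate-page sharing actually buys you, here is a toy model of same-page merging in which identical 4 KiB pages collapse onto one shared copy. It illustrates the idea only. The real KSM scans just the regions a process marks with madvise(MADV_MERGEABLE), keeps stable and unstable trees of candidate pages, and maps merged pages copy-on-write. None of that is modeled here, and the function name is made up.

```python
from typing import Dict, List, Tuple

PAGE_SIZE = 4096


def merge_identical_pages(pages: List[bytes]) -> Tuple[List[int], int]:
    """Toy same-page merging.

    Returns, for each guest page, the index of the physical copy it maps to,
    plus how many pages were saved by sharing.
    """
    physical: Dict[bytes, int] = {}  # page contents -> physical copy index
    mapping: List[int] = []
    for page in pages:
        if len(page) != PAGE_SIZE:
            raise ValueError("every page must be exactly PAGE_SIZE bytes")
        # Dict lookup compares full contents on a hash match, not the hash alone.
        index = physical.setdefault(page, len(physical))
        mapping.append(index)
    return mapping, len(pages) - len(physical)


if __name__ == "__main__":
    zero = bytes(PAGE_SIZE)           # freshly booted guests have lots of these
    code = b"\x90" * PAGE_SIZE        # stand-in for a shared library page
    guest_pages = [zero, zero, code, zero, code]
    print(merge_identical_pages(guest_pages))  # ([0, 0, 1, 0, 1], 3)
```

Merging pages saves memory, but that is all it does. It says nothing about how CPU or memory gets prioritized between guests, which is the part the “shares” model covers.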

Proxmox VE the Small Dark Knight

There is Proxmox VE, which is a real underdog with serious potential. It supports accelerated KVM and also offers operating system virtualization in the form of OpenVZ, and both are manageable via its web interface. It also has clustering and a built-in snapshot backup capability (the latter in a form VMware doesn’t offer out of the box). It does lack a lot of VMware features, such as “fault tolerance” (log shipping), DRS (load balancing), Site Recovery Manager (a DR tool suite), and the whole “shares” thing. However, considering it’s based on free-as-in-freedom open-source software and works darn good for what it does do, I’d say it’s got great potential. I’ve introduced it to a lot of folks who were surprised at how robust it was once they tried it.

Virtuozzo and OpenVZ

Virtuozzo is operating system virtualization. It’s limited in that you can only host like-on-like virtual machines: Linux can only host Linux, and Windows can only host Windows. I have tried it out quite a bit in lab environments, but never in production. I was impressed with how large the consolidation ratios can get; if you have small machines that don’t do much, you can pack a TON of them onto one physical box. It also has a terrific web GUI that works great in open-source browsers, plus an impressive level of resource management and sharing capabilities. Its individual VM management web interface is much better than VMware’s (by far). It also has a lot of chargeback features for hosting companies (its primary users).

I have a friend who worked for a VERY large hosting company and used it on a daily basis. His anecdotes were not as rosy. He told me that customers were often able to crash not only their own box but the physical host server, too, which caused him major, painful outages. I didn’t like hearing that one bit. Then again, I have definitely seen VMware and even QEMU crash hard, and I’ve seen VMware crash and corrupt its VMs once, too (in the 3.x days). That was painful. Still, I wouldn’t take such stories about Virtuozzo lightly. Another negative was the pricing. The folks at Parallels were quite proud of the product, and the pricing wasn’t much better than VMware’s. You’d think they’d want the business *shrug*.

Conclusion

There’s a nice panoply of choices out there now, but nobody is really giving VMware a run for its money outside of niche areas like OS virtualization. I’d love to see something like Proxmox take off and give VMware some headaches. I’d also like to see a much stronger Unix-friendly focus from the big boys. We aren’t all MCSEs and lobotomy recipients out here in sysadmin land, and a few decent Unix tools on the CLI and native-GUI front would be well received from the non-VMware players. I know it’s about market share, but that doesn’t excuse stunts like Windows-only management for RHEV (*disgusted*). Here’s hoping the future is brighter for the free, open, and clueful VM platforms.

~ by aliver on January 2, 2011.

9 Responses to “Linux Virtualization – A Sysadmin’s Survey of the Scene”

  1. Very nice insight. Thanks for this great article. You may not have heard of Archipel, as it was released after your post: http://archipelproject.org/ The project seems highly promising, though I have not been able to try it yet.

    • The only problem with Archipel is the web interface; calling it a humongous web application would be an understatement. I have never seen anything like it. It will eat your memory like nothing you’ve ever seen before, and there’s still a ton of functionality left to implement… let’s just hope they figure out how to optimize Cappuccino. I like the choice of XMPP for orchestrating VMs and their agent (very easy to hack :)); it’s much, much more flexible than ANYTHING out there, including OpenStack, VMware and XenServer (which I think are all based on AMQP, not sure about VMware though).

  2. I enjoyed your post; however, I think your depiction of RHEV missed a few things. I totally agree that RHEV requiring Windows is a downer, but the software was pretty much complete when Red Hat purchased it. They put it out as an alternative to VMware while they migrated the platform to Java and Red Hat.

    I would ask that you take a look at the roadmap for RHEV 2.3 and 3.0. The path is running on Linux, Java, and PostgreSQL.

    • RHEV is crap. We installed it in place of VMware because it was cheaper, but you get what you pay for. RHEV’s basic features are OK, but it is not enterprise ready and I found it to be buggy. We had an issue booting our blades with the hypervisor installed. I requested tech support from Red Hat and they had no clue. I finally discovered (no thanks to Red Hat) that removing the fiber cables which connected the blades to our SAN allowed them to boot. It’s a bug related to multipathing the fiber cards. The GUI in Windows only works in Internet Explorer, WTF. The interface is not very intuitive either.

  3. In the US we have a thing called capitalism. Some would argue that recent government actions such as the bank bailouts have changed that; however, for the most part the incentive here in America is profit. That is why companies such as Microsoft, VMware, and Citrix dominate the market. Open source Linux will never rule due to the simple fact that no one with talent is willing to work for nothing. There is an old saying: “If you’re good at something, never do it for free.” So far all I see from the Linux scene is people taking something that already exists (and works well) and creating a free, open source version of it. Case in point: OpenOffice.org. In order to gain market share and be competitive, Linux companies will need to innovate, not just come up with alternatives. Going back to the OpenOffice example, it’s years behind Microsoft Office. It’s the same thing with virtualization. You cannot honestly compare a product like KVM to VMware or Citrix and say that KVM is better. Does KVM have thin provisioning? Does KVM have deduplication and disaster recovery applications? These are innovations that VMware and Citrix not only have but do very well. I like Linux/open source for certain things. Untangle is a great product and it saved me a lot of money compared with purchasing the McAfee product; however, I had to sacrifice certain things.

    • Joe, I’m not trying to imply that commercial companies don’t innovate and contribute to advancing technology. Your statement about “taking something that already exists”, however, is a canard. There is a panoply of features and ideas that have come from open source OSes, especially in the security and language development areas. I think most open-source advocates would resent the implication that they are simply knock-off makers. In fact, I would assert that there is probably more opportunism going the opposite way. Look at all the devices and technologies that have integrated GNU libraries or the Linux or BSD kernels. Apple’s OS X, often touted as a glowing bastion of innovation, is largely built on open source underpinnings (FreeBSD, Mach, etc.). I’m not saying that KVM is on par with VMware ESX in terms of features. However, you do seem to underestimate its capabilities. It does in fact support thin provisioning (qcow2 sparse files). As for disaster recovery, that’s easily accommodated with DRBD and easily deployed with prepackaged open source solutions like Proxmox VE, which supports local and geo-clustering via DRBD.

  4. I would just like to say this about that. I have been “playing” with computers and software since the Intel 8008 era. In the beginning we had BBSes and connected to them over 300 baud modems. The first systems were from Digital Research and called CP/M. Then came Microsoft and MS-DOS. Then came Windows 1… and then came… well, I think you get the idea.

    During all this, many, many programmers put out “free” code they had developed on the BBSes. Seems awfully strange that these little gems ended up in the next version of DOS. So, who is/was copying whom? In the beginning, things were “shared” a lot; that’s what the BBSes were originally for. I’m not saying all these people shouldn’t charge for their services; in fact, I would if I could do anything like what they do. But do not say these Linux guys are copying. Some of them probably wrote the original code to begin with. They are now just sharing it … again … via Linux and Open Source.

    • It’s not about who copies whom, it’s about who has the better product. That’s the problem with free code: if it’s truly free, there is no patent. So when you say those gems showed up in DOS and Windows, isn’t that what open source is supposed to be for? Why do you think those Harvard boys got upset when Facebook was created? They wanted to get rich and they felt the idea was stolen from them. This is America. The incentive for profit is what makes our country great. Competition in the market also makes America great, so open source has a place too. To be truly competitive you cannot go on price alone. From a business perspective you have to look at the cost of implementation, support, employee labor, etc. It took my company over 3 months to move our virtual environment over to RHEV. I have installed Citrix Xen, VMware, Microsoft, and VirtualBox in the past, and I know for a fact I could have migrated in less than half the time. Most of the problems we had were bugs and driver issues. For some reason KVM couldn’t migrate machines with more than 1 vdisk? Weird stuff that should not have been an issue. We had to call in Red Hat on more than one occasion, and all they could give us were workarounds for bugs they were already aware of.

  5. Very comprehensive list. I see someone mentioned Archipel, but no one mentioned Ganeti, which sits on top of KVM and Xen and supports DRBD. I haven’t had the chance to test it, but it does seem like a nice project, along with its Django web interface, GanetiWeb.
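The reply to Joe in the thread above points to sparse files as KVM’s answer to thin provisioning. The sketch below shows the underlying effect with a raw sparse image: the file reports its full apparent size, but the filesystem allocates almost nothing until data is written. qcow2 reaches a similar result by allocating clusters on demand inside the image file. The 64 MiB size, the file name, and the helper names are arbitrary. The effect also depends on a filesystem that supports sparse files, as ext4, XFS, and tmpfs do.

```python
import os
import tempfile
from typing import Tuple


def make_sparse_image(path: str, size_bytes: int) -> None:
    """Create a raw disk image with a large apparent size but no data written."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)  # extends the file without writing any data


def apparent_and_allocated(path: str) -> Tuple[int, int]:
    """Return (apparent size, bytes actually allocated on disk)."""
    st = os.stat(path)
    # Python documents st_blocks as a count of 512-byte blocks (Unix only).
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    image = os.path.join(tempfile.mkdtemp(), "guest.img")
    make_sparse_image(image, 64 * 1024 * 1024)
    # Apparent size is 64 MiB; allocation should be near zero on a sparse-capable fs.
    print(apparent_and_allocated(image))
    with open(image, "r+b") as f:
        f.write(os.urandom(1024 * 1024))   # the "guest" writes 1 MiB
        f.flush()
        os.fsync(f.fileno())
    print(apparent_and_allocated(image))   # allocation grows, apparent size does not
```

The same gap between apparent and allocated size explains how a corrupted image, like the one in the Xen section, can claim to be far larger than the LUN it sits on.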
