Huge Unix File Compressor Shootout with Tons of Data/Graphs

•June 22, 2010

It has been a while since I’ve sat down and taken inventory of the compression landscape for the Unix world (and for you new school folks that means Linux, too). I decided to take the classics, the ancients, and the up-and-coming and put them to a few simple tests. I didn’t want to do an exhaustive analysis but rather just a quick test of a typical case. I’m personally interested in three things:

  1. Who has the best compression?
  2. Who is the fastest?
  3. Who has the best overall performance in most cases (a good ratio of compression to speed)? I thought that average bytes saved per second illustrated this best, so that’s the metric I’ll be using.
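Just to be concrete, here’s that metric on made-up numbers (the sizes and time below are purely illustrative, not from my tests):

```shell
# Hypothetical run: "bytes saved per second" = (original - compressed) / time
orig=242000000   # original file size in bytes (made up)
comp=24200000    # compressed size in bytes (made up)
secs=42          # wall-clock seconds spent compressing (made up)
saved=$((orig - comp))
echo "saved $saved bytes at $((saved / secs)) bytes/sec"
```

A tool that compresses slightly worse but much faster can still win on this metric, which is exactly the point of it.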

The Challengers

A few of these are based on the newish LZMA algorithm, which definitely gives some good compression results. Let’s see how it does on the wall-clock side of things, too, eh? Also, you’ll note some oldie-but-goldie tools like arj, zoo, and lha.

How it was Done

I wrote a script in Ruby that will take any file you specify and compress it with a ton of different commands. The script flushes I/O before and after and records the epoch time with microsecond resolution. It took a little time, since the archiver commands take a few different forms: mainly two-argument and one-argument styles. The one-argument style is the “true” Unix style, since it works most smoothly with tar and scripts like zcat. That’s okay, though; it’s how well they do the job that really matters.
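For the curious, here’s a rough shell sketch of what one timed run in the script does (the real script is Ruby; the file name below is made up, and date’s %N is a GNU extension, so the real script uses Ruby’s microsecond clock instead):

```shell
# One timed compression run: sync, stamp, compress, sync, stamp
f=/tmp/bench-input
seq 1 100000 > "$f"                 # stand-in test data
sync                                # flush pending writes before timing
start=$(date +%s.%N)
gzip -9 -c "$f" > "$f.gz"           # one-argument, Unix-style invocation
sync                                # make sure the output hit the disk
end=$(date +%s.%N)
awk -v a="$start" -v b="$end" 'BEGIN { printf "elapsed: %.6f sec\n", b - a }'
```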

My script spits out a CSV spreadsheet and I graphed the results with OpenOffice Calc for your viewing pleasure.

The Reference Platform

Doesn’t really matter, does it? As long as the tests run on the same machine and it’s got modern CPU instructions, right? Well, not exactly. First off, I use NetBSD. No, BSD is no more dead than punk rock. One thing I didn’t test was the parallel versions of bzip2 and gzip: pbzip2 and pigz. These are cool programs, but I wanted a test of the algorithms more than of the parallelism of the code. So, just to be safe, I also turned off SMP in my BIOS while doing these tests. The test machine was a cheesy little Dell 530s with an Intel C2D E8200 @ 2.6GHz. The machine has a modest 3GB of RAM. The hard disk was a speedy little 80GB Intel X25 SSD. My script syncs the file system before and after every test, so file I/O and cache flushes wouldn’t get in the way (or would get in the way as little as possible). Now for the results.

Easy Meat

Most folks understand that some file types compress more easily than others. Things like text, databases, and spreadsheets compress well while MP3s, AVI videos, and encrypted data compress poorly or not at all. Let’s start with an easy test. I’m going to use an uncompressed version of Pkgsrc, the 2010Q1 release to be precise. It’s full of text files, source code, patches, and a lot of repeated metadata entries for directories. It will compress between 80-95% in most cases. Let’s see how things work out in the three categories I outlined earlier.

Maximum Compression - Easy Meat - Compression Levels

You’ll notice that xz is listed twice. That’s because it has an “extreme” flag as well as the -9 flag, so I wanted to try both. I found you typically get about 1% better compression from it for about 20-30% more time spent compressing. Still, it gets great results either way. Here’s what we can learn from the data so far:

  • LZMA based algorithms rule the roost. It looks like xz is the king right now, but there is very little edge over the other LZMA-based compressors like lzip and 7zip.
  • Some of the old guys like arj, zip, and lha all compress just about as well as the venerable gzip. That’s surprising to me. I thought they would trail by a good margin and that didn’t happen at all.
  • LZOP (which is one of my anecdotal favorites) didn’t do well here. It’s clearly not optimized for size.
  • The whole bandpass is about 10%. In other words, there isn’t a huge difference between anyone in this category. However, at gargantuan file sizes you might get some more mileage from that 10%. The test file here is 231 megs.
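For reference, the two xz modes above differ only in a flag; here’s a throwaway example (the sample file is made up, and your ratios will vary with the input):

```shell
# xz -9 vs. the "extreme" variant of the same preset
seq 1 50000 > /tmp/xz-sample
xz -9  -c /tmp/xz-sample > /tmp/xz-sample.max.xz   # maximum preset
xz -9e -c /tmp/xz-sample > /tmp/xz-sample.ext.xz   # extreme: a bit smaller, noticeably slower
ls -l /tmp/xz-sample.max.xz /tmp/xz-sample.ext.xz
```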

Okay, let’s move on while staying in the maximum compression category. How about raw speed? Also, who has the best compressor throughput? In other words, if you take the total savings and divide by the time it took to compress, who compresses “best over time”? Here’s a couple more graphs.

Maximum Compression - Easy Meat - Raw Speed

How fast in maximum compression mode were these archivers?

Maximum Compression - Easy Meat - Throughput

Average bytes saved over seconds - Higher bars mean better performance

Wow, I really was surprised by these results! I thought for sure that the tried-and-true gzip would take all comers in the throughput category. That is not at all what we see here. In fact, gzip was lackluster in this entire category. The real winners here were, surprisingly, the 90’s favorites arj and lha, with an honorable mention to zip. Another one to examine here is bzip2. It’s relatively slow compared to arj and zip, but considering the compression it delivers, there is still some value there. Lastly, we can see that the LZMA crowd is sucking wind hard when it comes to speed.

Maximum Compression on Easy Meat Conclusion

Well, as with many things, it depends on what your needs are. Here’s how things break down:

  • The LZMA-based tools (with a slight edge to xz) have the best compression.
  • Arj and lha strike a great balance. I just wish they didn’t have such DOS-like syntax and worked with stdin/stdout.
  • Gzip does a good job in terms of performance, but is lackluster on the “stupendous compression” front.
  • The “extreme” mode for xz probably isn’t worth it.

Normal Compression – Easy Meat

Well, maximum compression is all fine and well, but let’s examine what the archivers do in their default modes. This is generally the place the authors put the most effort. Again we’ll use a 231MB file full of easily compressed data. Let’s first examine the graphs. First up, let’s just see the compression sizes:

Compression Levels using default settings

Well, nobody really runs away with it here. The bandpass is about 75-91%. Yawn. A mere 16% span between zoo and xz? Yes, that’s right. Maybe later, when we look at other file types, we’ll see a better spread. Let’s move on and see how the throughput numbers compare.

Normal Compression - Easy Meat - Raw Speed

How fast in their default modes were these archivers?

Normal Compression - Easy Meat - Throughput

Holy Smokes! We have a winner!

Well, this is interesting! Here’s where my personal favorite, lzop, runs away with the show! In its default mode, lzop gets triple the performance of its closest neighbor. The spread between the top and bottom of the raw speed category for normal compression is a massive 158 seconds. It’s important to note that the default compression level for lzop is -3, not -1, so it should be able to do better still. Yeah, lzop! I should also point out that down in the lower ranks, gzip kept up with the competition, and considering its good compression ratio, it’s a solid contender in the “normal mode” category.

Light Mode – Because My Time Matters

If you have a lot of files to compress in a hurry, it’s time to look at the various “light” compression options out there. The question I had when I started was “will they get anywhere near the compression of normal mode”. Well, let’s have a look-see, shall we?

Light Compression - Easy Meat - Compression Levels

Same layout as normal and max mode pretty much

Well, looks like there is nothing new to see here. LZMA gets the best compression, followed by the Huffman crowd. LZOP trails again in overall compression ratio. However, it’s important to note that there is only a 9% difference between what it did in 1.47 seconds and what lzip did in a staggering 40.8 seconds. There was a 7% difference between LZOP and bzip2, but a 25 second difference in their times. It might not matter to some folks how long it takes, but it usually does for me, since I’m personally waiting on that archive to finish or that compressor script to get things over with.

Okay, maybe LZOP’s crushing the competition was because it defaults to something closer to everyone’s “light” mode (typically a “-1” flag to Unix-style compressors but the mileage varies for DOS-style). There is one way to find out:

Light Compression - Easy Meat - Throughput

LZOP delivers another brutal victory in throughput

Um. No. Lzop still trounces everyone soundly and goes home to bed while the others are still working. Also, keep in mind that the throughput number only measures compressed bytes, not the total raw size of the file. If we went with raw bytes, the results would be even more dramatic, but less meaningful. Now, if we could just get a BSD-licensed version with a -9 mode that beats xz, it could conquer the compression world in a hurry! As it stands, there is no better choice, or even anything close, when it comes to raw throughput.

Next up – Binary compression tests

Okay, now that we’ve tried some easy stuff, let’s move on to something slightly more difficult and real-world. Binary executables are definitely something we IT folks compress a lot. Almost all software distributions come compressed. As a programmer, I want to know I’m delivering a tight package, and smaller means good things to the end user in most contexts. It’s certainly going to help constrain bandwidth costs on the net, which is of even more importance for free software projects that don’t have much cash in the first place. I chose to use the contents of all binaries in my Pkgsrc (read “ports” for non-NetBSD users). That was 231 megs of nothing but binaries. Here’s the compression and throughput graphs.

231MB of binaries compressed with various archivers like xz, gzip, and lzop

LZMA seems to run the show when it comes to compression levels

Well, clearly the LZMA compression tools are the best at compressing binary files. They consistently do better at light, normal, and heavy levels. The Huffman and Lempel-Ziv based compressors trail by around 10-20%, and then there are a few real outliers like zoo and compress.

Throughput while compressing binaries

Well, after the last bashing LZOP handed out in the Easy Meat category, this comes as no big surprise. It’s interesting to note that LZOP doesn’t have any edge in -9 (heavy) mode. I’d conclude that you should just use xz if you want heavy compression. However, in normal and light modes, LZOP trashes everyone and manages to get at least fair levels of compression while doing it. To cut 231MB in half in under 3.5 seconds on a modest system like the reference system is no small feat.

Compression Edge Cases

What about when a compression tool goes up against something like encrypted data that simply cannot be compressed? Well, then you just want to waste as little time trying as possible. If you think that this is going to be the case for you, ARJ and LZOP win hands down with ZIP trailing a ways back in third place. I tried a 100MB encrypted file and nobody came close to those two in normal mode. I also tried using AVI and MP3 files and the results were the same. No compression, but some waste more time trying than others.

Compression Fuzzy Features

There is certainly a lot more to a compression tool than its ability to compress or its speed. A well-devised and convenient structure and a nice API also help. I think this is one reason that gzip has been around for so long. It provides decent compression, has been around a long time, and has a very nice CLI interface and API (zlib). I also believe that tools that don’t violate the “rule of least surprise” should get some cred. The best example of this is gzip, because it pretty much sets the standard for others to follow (working with stdin/stdout, numbered “level” flags, etc.). However, gzip is really starting to show its age, and considering the amount of software flying around the net, it’s wasting bandwidth. It’s certainly not wasting much from the perspective of the total available out there on the net (most of which goes to video and torrents, statistically, anyhow). However, if you are the guy having to pay the bandwidth bill, then it’s time to look at xz or 7zip. My opinion is that 7z provides a more DOS/Windows-centric approach and xz is the best for Unix variants. I also love the speed of LZOP, and congrats to the authors for a speed demon of a tool. If your goal is to quickly get good compression, look no further than LZOP.
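That stdin/stdout convention is what makes pipelines like this possible (the directory and archive names below are just examples):

```shell
# Compress a tree and list it back, no temporary files anywhere
mkdir -p /tmp/stdio-demo && echo hello > /tmp/stdio-demo/a.txt
tar cf - -C /tmp stdio-demo | gzip -9 > /tmp/stdio-demo.tar.gz   # tar writes to stdout, gzip reads stdin
gzip -dc /tmp/stdio-demo.tar.gz | tar tf -                       # and back the other way
```

Any tool that follows the gzip convention can be dropped into the middle of that pipeline unchanged, which is exactly why the DOS-style two-argument archivers feel so clumsy in scripts.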

You might need some other features like built-in parity archiving or the ability to create self-extracting archives. These are typically things you are going to find in the more commercial tools; Pkzip and RAR have features like these. However, you can get to the same place by using tools such as the PAR parity archiver.

There are also tools that allow you to perform in-place compression of executable files. They allow the executable to sit around in compressed form then dynamically uncompress and run when called. UPX is a great example of this and I believe some of the same folks involved with UPX wrote LZOP.

More Interesting Compression Links

There sure are a lot of horrible reviews when you do a Google search for file compression reviews. Most of them are Windows-centric and try to be “fair” in some arbitrary way by comparing all the fuzzy features and counting compression and speed as just two of about 20 separate and important features. Maybe for some folks that’s true. However, in my opinion, those two features are the key features and everything else must play second fiddle. Sure, it might be nice to have encryption and other features in your archiver, but are you too lazy to install GPG? I take seriously the Unix philosophy of writing a small utility that does its primary job extremely well. That said, there are some good resources out there.

  • Wikipedia has a great writeup with info about the various archivers out there in its comparison of file archivers.
  • Here’s a site which focuses more on the parallel compression tools and has a really great data set for compression, much more comprehensive than what I did.
  • Along the same lines as the last site is the Maximum Compression site. They do some exhaustive compression tests and cover decompression, too. I didn’t focus on decompression because most utilities are quite fast at it and the comparisons are trivial.
  • TechArp has a good review of the stuff out in the Windows world. However, their site is bloated with annoying flashing ads. If you can ignore those, then check it out.

Happy compressing!

Woodworking’s Unix Metaphor – Top 10 Reasons Why I Use Hand Tools for Woodworking and the CLI for Unix

•March 4, 2010
  1. They are much safer than power tools. Cutting off your finger with a band saw or table saw is nearly instant. Your finger will hit the floor before you realize what just happened. Also, you don’t need to constantly wear a dust mask to prevent yourself from getting some nasty disease when working with hand tools. Unix CLI tools give you finer-grained control and involvement. Thus, I’d argue they are safer in many cases than their GUI counterparts.
  2. They don’t make much noise. I’m a night owl, but my neighbors are not. They wouldn’t appreciate the sound of a router at 3:00AM, but a router-plane? No problem. Unix CLI tools have a smaller footprint on a system than their GUI cousins. Think of how much you can do at any time on a remote server via secure shell without being noticed for eating CPU time or showing up in a control panel somewhere and being badgered about it (ala VMware vCenter).
  3. I get sick of technology. I get sick of servers (well, sometimes). I feel a connection with the past, knowing that people 300 years ago were doing the same thing. I want to actually make something that isn’t so ephemeral and was a labor of love. I’m not in a hurry and I don’t want to be rushed by a machine, a boss, or a deadline. Who cares if it takes longer. That’s not the reason why I’m doing it. The CLI tools also connect me with the masters like Dennis Ritchie and Ken Thompson. I like thinking that the useful thing I create for myself or others follows some well honed tradition.
  4. I can fully understand the tool and its capabilities. I know what to expect from it and how to tweak it for my needs. A hand plane or saw only has a few parts. A modern table saw is pretty complicated and can break or misbehave in ways I won’t immediately be able to fix or perhaps grasp. The same is true for CLI versus GUI tools. I know how ‘tr’ or ‘sed’ works. However, your whiz-bang GUI-based Java tool might blow up and simply give me a dialog that says “I’m hosed” — “OK?” How do I address that or fix it?
  5. Pride. Perhaps it’s just elitism, I don’t know. Deadly sin or not, I’m not sure I care. Any meathead can whip out a circular saw and an edge guide to make a straight cut. Can he do it with a backsaw? Anyone can shove a board into a planer, but can they make a perfectly square workpiece with a jack plane, and tune that plane? No, it’s not rocket science or magic, but hand tools take some investment of skill and finesse that can only come with practice. The same is true of CLI versus GUI tools. Sure, you can click your way through making a cluster with the XML-crap tool that ships out with LinuxHA 2.x nowadays, but will it stand up like a HP ServiceGuard cluster with hand-crafted resource scripts? My experience says, “no way”.
  6. The results are one of a kind and truly intrinsically special. Do you think 50 years from now, Sotheby’s will want your fiberboard bookcase from Wal-Mart with peeling vinyl veneer? The art shows in a real labor of love. That’s why folks will want it that much more 100 or 200 years later. Machine-made junk is still Chinese robot-made junk even if it doesn’t fall apart right away. Of course, in IT, this is a touchy one for the manager types out there. Everyone complains about something “custom” in IT. That means they can’t yank you out of your seat and replace you with someone cheaper at their whim. However, they don’t often consider what the real value of that expert’s work was. They usually also don’t consider simply asking you to document your work to a degree that an expert with your same skill could follow it. The focus these days is on interoperability with an emphasis on less skilled folks. However, the truly phenomenal innovations still generally come from the wizards in a cave, not the thousand monkeys. Google, Linux, Facebook, C, and other big-deals-in-IT didn’t come from a sweatshop overseas and were generally framed in initially by one or two talented hardworking people.
  7. I like total control over my work. I take pride in what I make. I don’t want it ruined by applying too much power too fast or flinging oil onto my $30 a board foot exotic woods and ruining the finish. With hand tools, the only one to blame for bad results is me. The same is true with the CLI. I can use a minimal amount of resources on the system by hand-crafting solutions that do only what’s needed.
  8. Hand tools ease my stress instead of causing it. Due to the noise and propensity to blow up in my face if there is a hidden nail in the board, I get nervous when I fire up a 2 1/2 horsepower router or 5 horse table saw. The same is true when I use GUI tools. When the hard drive is cranking away and the GUI is locked up to the point it won’t even repaint the window I start thinking “Great, am I going to lose all my work?” I have a more deterministic attitude when coding a script or running a CLI tool that’s been around for 35 years.
  9. I have to exercise patience. It’s just good for my mental health to not be in an instant gratification mindset all the time.
  10. I’m in good company. I’ve noticed a lot of guys love to brag about their power tools. They have a bazillion watt band saw or a drill press with a laser on it. Who cares? Your tools don’t make you more skilled or instantly give you the benefit of practice and experience. Sure, you have a router dovetail jig. Do you use it? Show me your work. Don’t tell me what kind of crazy tool collection you have. I’m not impressed. It reminds me of people who brag about all the pirated art software they have. You expect me to believe that because you pirated Maya, 3D-studio, and Photoshop that you are an artist? Does the fact that you just purchased a 10-jigahertz CPU make you able to code better or even faster than I can? I still use my 200Mhz SGI Indy sometimes just for the fun and nostalgia of it. The code I write on it still compiles on a supercomputer. I’ve found that others who think this way tend to produce good work rather than simply buying fast tools.

Sysadmin State of the Union on 10Gbit Ethernet and Infiniband

•February 9, 2010

Yes, it’s been out a while. However, now that there are a few fairly mature 10Gbit ethernet NICs and switches, we in the trenches need to know the real-deal, non-marketing skinny. Here’s what I’ve been doing:

  • Testing 10Gbit Cisco Nexus 5000 switches side by side with Arista
  • Testing Mellanox and Intel 10Gbit NICs
  • Lots of storage + 10Gbit experiments

I’ve learned a lot about this critter lately, having been ankle-deep in 10Gbit kit for the last year or so. At my shop, we are still trying to scrape together the cash for a full datacenter overhaul, and brother, let me tell you, it’s an expensive proposition. Spendy it may be, but there are some extremely tangible benefits to going 10Gig. If you know the theory but haven’t touched 10Gig yet, let me give you what I consider to be the most admin-germane observations and facts about it.

  • It really is 10 times faster. It’s not like wireless or USB or some other technology where you know they are lying through their teeth when they claim it’s X-times faster. I have some nice wireless gear but it comes nowhere near to the theoretical max even when I’m in the same room with the AP. I have tested 10Gbit cards and switches using the venerable iperf tool. I can actually test and verify that it’s really and truly 10 times faster, no BS.
  • Today, in 2010, it’s going to cost you about 2000 bucks a port if you go with fiber. You’d need SFP+ modules (or XFP on some older gear); they’re pricey. They use LC-LC fiber connections. There are other, less common fiber and twinaxial formats, too.
  • You can get NICs and switches that use copper twisted pair (10GBASE-T). They eat more power (though not much more compared to a server) and they have considerably higher latency (twice as high in some cases). It’s much cheaper to go this route, however: your per-port costs are more than cut in half.
  • It’s a lot harder to make 10G Ethernet suck (latency- and bandwidth-wise) than Infiniband, since the latter is more picky about its various transport modes (SDP vs IPoIB, etc.). In the end, though, IB is still faster if both are well configured.

Today, Infiniband is cheaper and gives you better latency (and potentially up to 40Gbit). However, I still think 10Gig has some advantages over Infiniband. One is that it’s pretty safe to say it’s going to catch on faster and more pervasively than Infiniband. There are also more vendors to choose from if you go with 10Gig. Being an open-source kinda guy, I also see better support for 10Gig and Ethernet in general versus Infiniband. That last statement doesn’t apply to Mellanox, who has source-available drivers for Linux and even FreeBSD (which makes me happy)!

The Intel 10Gig cards seem to have the most pervasive driver support. Testing with the venerable iperf reveals that they will indeed run at 9.9Gbit/s. The Mellanox cards I tested (ConnectX EN) will do the same, but seemed to be a bit more sensitive to your driver being up-to-date. Here’s what I’d consider using 10Gig for today:

  • Switch interconnects
  • Filer uplinks
  • AoE, FCoE, and iSCSI transport as a cheaper-than-SAN-but-not-quite-as-good stand-in.
  • HPC apps that need low latency (use fiber SFP’s, though)
  • Highly consolidated VMware servers
  • Bridging 10 or 20 Gig Infiniband to 10Gig Ethernet for storage or HPC apps.

Here’s where I wouldn’t:

  • Desktops (too expensive per port and NIC)
  • Work-a-day servers which can be easily clustered (ala webservers)
  • Any application that can use high concurrency to overcome a lack of single-stream bandwidth (simple file and profile servers). You can add more Gig NICs instead.

If vendors can bring the price per port down to a more accessible level, it’ll be just like the move from 100Mbit to Gig. However, what’s stalling that right now is the high power requirements that come along with 10Gig over copper. Some EE will work that out, you can be sure. The sooner the better, too!

10 reasons why (AoE) ATA over Ethernet is awesome

•February 2, 2010
  1. Its performance beats seven shades of snot out of iSCSI when you test on the same hardware and network rig. I tested with Fio, Dbench, and good old “dd”. Every stat is better for AoE when compared with iSCSI and FCoE. I used vblade for AoE, open-iscsi, and open-fcoe.
  2. Automatic and transparent failover when it comes to client network interfaces.
  3. Automatic port aggregation that scales very well without trunking/bonding/channeling or any real effort.
  4. Works at the Ethernet (layer-2) level which keeps it simple, low-latency (especially on 10Gbit networks), and super-easy to setup.
  5. Cross-platform open-source support for Linux, Windows, Mac OS X, and FreeBSD. You can get closed source drivers for VMWare ESX, too.
  6. Block level support (meaning it looks like a disk, not a file system) allows you to combine it with other technologies like Logical Volume Management (LVM) to create very fault tolerant systems. Think about taking two AoE servers and combining LUNs from them both using metaraid and/or LVM. Now you can grow and shrink your volumes dynamically, lose either file server (with RAID-1) completely with no issue, and use whatever file system you like (or is best for your app).
  7. It hauls serious butt when combined with 10Gbit Ethernet (and gains more from it than iSCSI does, too).
  8. You don’t need to buy a TOE NIC, and even with TOE it whoops iSCSI (I tested).
  9. The setup for the AoE server and client software is dead simple and doesn’t involve large numbers of not-immediately-needful configuration files (ala open-iscsi) or changes. You basically say “I want this file (or partition) to be shared out over AoE using virtual slot 2 unit 5” (or whatever). Bam. The clients see it (broadcast at layer-2) and can attach if they have access from the server to do so.
  10. Works with disks (AoE can share out the whole disk or just some partitions) or with files (it just shares out the file as a LUN and acts like it’s a disk). It even deals with sparse files as a backing store (thin provisioning).
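That sparse-file trick from point 10 is easy to see for yourself. Here’s a sketch (the vblade line is left as a comment, and the shelf/slot/interface values are made up for illustration):

```shell
# A 1GB backing file that allocates almost no real blocks up front
truncate -s 1G /tmp/aoe.img                      # apparent size 1GB, blocks ~0
stat -c 'apparent: %s bytes, blocks: %b' /tmp/aoe.img
# vblade 2 5 eth0 /tmp/aoe.img                   # then export it: shelf 2, slot 5, on eth0
```

The file only consumes real disk space as clients write to the LUN, which is what makes thin provisioning with plain files so painless.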

Pretty cool, eh? I took a Dell R905 with 128GB of RAM, four quad-core Opterons, a Dell PERC/6E, and a Sun J4400 with 24 1TB SATA disks, and shared out a bunch of disks via AoE over Intel 10Gbit Ethernet cards, 1Gbit BNX2 Ethernet ports (onboard), and Cisco (1Gbit) and Arista (10Gbit) switches. The clients were Dell R710 Nehalem dual-quads with 48GB of RAM. I ran a lot of benchmarks using XFS, Reiser3.6, and EXT4. I also did some raw ones using just metaraid devices. The speed was pretty outstanding compared to iSCSI; in quite a few cases it was twice as fast, and in some rarer cases it was 10 times the speed of iSCSI. Unless you need the layer-3 routability of IP-based storage, my advice is to skip a rung on the OSI model and stick with AoE. It’s a really worthy protocol with some really unique and valuable features. The automatic way it aggregates your Ethernet bandwidth and “self-heals” when losing an interface or switch is very impressive.

Quick thoughts on Splunk

•February 2, 2010

Here is the skinny on Splunk. It’s a generic analytics tool that most folks use for log consolidation and searching. I’ve spent about two weeks in the lab with it in between other projects. Here’s the executive summary:

  1. It is, indeed, useful. It enables you to find data you otherwise might not have found due to too-much-effort-required-so-why-try syndrome.
  2. With the right server, it’s very fast.
  3. It’s quite pretty and the graphs are actually useful.
  4. You can also log, search, and graph non-syslog data like Windows WMI logs, Cisco logs, web proxy logs, and various security-oriented logs.
  5. It’s dead simple to get it up and running.
  6. It scales really well.

Just so you don’t think I’m cheerleading, I’ll let you in on the things that are not-so-good but still bearable, in my opinion.

  1. If you have any significant log volume at all, you’d better get a beefy server to run the indexing and search servers (can be one box or multiple). I tested it on a 16-way Nehalem box with 48GB of RAM. It stood up well there, but on my original modest VMware guest it was dog-slow. I was doing about 5GB of logs per day.
  2. The “lightweight forwarder” functionality is very useful, but not at all what I’d (I’m a C programmer) call lightweight. It’s often, by far, the most CPU- and RAM-eating process on some otherwise underutilized servers. It likes to chew up 1-2% of the CPU at times where I really have to wonder what the heck it’s doing (i.e., no new logs). Yes, I had indexing turned off on those nodes.
  3. If you rely on their system of using lightweight forwarders rather than just sending remote syslogs you end up kind of in bed with them in a way that would be a little painful should you wish to come back onto the “let’s just use what’s traditional” reservation. However, the lightweight forwarder does add some valuable features like the ability to run arbitrary commands, log them, then mine that data (think ‘ps’, ‘vmstat’, ‘netstat’, and ‘iostat’).
  4. Their product is moderately expensive.

Yes, you could do almost everything Splunk does with egrep, gnuplot, graphviz, and a lot of custom scripts. However, it’s a great tool if you don’t have the time, or just want a nice way to centralize the logs with a minimum of fuss. Overall, I can tell they listened to a few good sysadmins on how to design the tool and make it useful. That’s something very rare when dealing with commercial software these days.
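For instance, a crude stand-in for one Splunk search, failed-login counts per host, looks like this (the log sample below is made up; a real run would read your actual syslog files):

```shell
# Tiny fake auth log standing in for /var/log/auth.log
cat > /tmp/auth.sample <<'EOF'
Jan  1 00:00:01 web1 sshd[100]: Failed password for root
Jan  1 00:00:02 web2 sshd[101]: Failed password for admin
Jan  1 00:00:03 web1 sshd[102]: Failed password for root
EOF
# Count failed logins per host, most offenders first
grep 'Failed password' /tmp/auth.sample | awk '{print $4}' | sort | uniq -c | sort -rn
```

It works, but multiply that by every question you ever want to ask of your logs and the appeal of a pre-built indexed search tool becomes obvious.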

Isilon IQ and S series rock

•December 18, 2009

Just finished about a month of resiliency and performance testing of Isilon’s OneFS-based clustered storage. In our lab we had six IQ9000 nodes, six 5400S nodes, and an accelerator node with dual 10Gbit interfaces. I tested NFS, CIFS, FTP, HTTP, and even got hold of their beta code to test iSCSI. While I can’t share any benchmarks (NDA), I have to say, their gear is awesome. It’s super-bulletproof. You can powerfail drives, nodes, or the back-end Infiniband switches, and the thing just keeps on rocking. The selectable parity can even allow you to take more than one node failure and keep trucking along.

The GUI is not a huge DHTML crap-infested monstrosity like some I could mention (ahem, rhymes with Gun, as in what you want when you use it). It gets the job done and shows you the stats you really care about, like “How full is my cluster in total?” or “How much traffic am I seeing to all nodes?” The command line is also very functional. Most of the commands start with ‘isi’, like “isi stat”, which shows you the cluster status.

One rather dramatic display of simplicity is when you set up a new node. You just connect it to the Infiniband, and within two keystrokes on the node’s front panel you have the new node joining the cluster. The cluster expands the space available and gets you going right away.

Bad Experiences with Sun 7410 Unified Storage Appliance Filers

•June 20, 2009 • 9 Comments

If you want the short version: run away screaming as fast as you can. You will find all kinds of magazine reviews for the unified storage line that includes the 7410 as its flagship. You will see Fishworks developer blogs at Sun telling you that you can get insanely high speeds from these filers, and you will see lots of slick marketing for them on Sun's website. Let me, the guy that's worked with eight of them over a period of about six months (very close to their release), provide probably the only voice of contention you are going to encounter during your googling on these turds. I'll give a rundown of the 7410's features and then we'll cut the crap and talk real.

  1. Use of ZFS as a back-end storage filesystem and all the associated benefits that come with it (storage pools, snapshots, compression, good performance, raid-z (raid6), volume-management-like capabilities, replication, and self-healing)
  2. Use of commodity SATA disk drives. In my case, simple Seagate 1TB disks with no custom firmware or EMC-alike microcode crap to keep you from replacing them with OTS disks.
  3. Multi-path SATA JBODs and LSI SAS controllers that connect to SAS directors on the back end of the JBODs. Sounds great, right?
  4. Use of standard Sun Galaxy class servers as heads. Thus ensuring that as newer servers come out and the "fishworks" filer software is ported to them, you can get better performance.
  5. A GUI even a Windows MCSE could use, offering a lot of very pretty analytics that cover some actual real-world usage scenarios.
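The ZFS feature set in item 1 maps onto a handful of standard ZFS administration commands. A hedged sketch, with made-up pool, disk, and host names (these commands need real disks and a ZFS-capable system, so treat this as an illustrative fragment, not something the 7410 exposes verbatim):

```shell
# Create a raid-z2 pool (double parity, the raid6-like layout) from six disks.
zpool create tank raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0

# Datasets behave like managed volumes; compression is a per-dataset flag.
zfs create tank/home
zfs set compression=on tank/home

# Snapshots are instant and cheap...
zfs snapshot tank/home@before-upgrade

# ...and replication is just shipping a snapshot stream to another box.
zfs send tank/home@before-upgrade | ssh backuphost zfs recv backup/home
```

The appliance wraps all of this in its GUI and CLI, but the underlying capabilities are exactly these.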

Now I must admit that all that sounds great. In fact, it is great. The filers do, in fact, have these features and they do work out of the box. They don't nickel-and-dime you like NetApp or EMC over features like replication or compression, and the price is very competitive compared to NetApp, and especially compared to EMC or HP storage.

Now for the bad news

The 7410 has endemic instability problems and a terrible internal design that will probably ensure they stay that way.

  1. They crash more or less constantly. I'd like to say it was a problem localized to one set of filers we've used; however, it's continuous and chronic and happens to all 7 of our filers. We've filed many novel bugs with Sun, on everything from GUI interface lockups (which nearly always coincide with CLI lockouts and disable your ability to administer the filers) to old-fashioned kernel panics with all kinds of nice zfs calls in the backtrace. These bugs are repeatable and constant.
  2. Their interface is so painfully slow and inefficient it can cause problems of real magnitude. The GUI can lock up the CLI? Check. The GUI is full of CSS, Javascript, and DHTML issues? Check. The CLI hangs and freezes on simple operations like showing configuration of network and storage? Check.
  3. The command line interface is written in Javascript. Form your own opinions on that one.
  4. Cluster join, failover, and rejoin times are FOREVER compared to their competition. The fastest I've ever seen is 4.5 minutes, and that's with a minimum number of disks (48). Add more disks and it's even slower. Not to mention the fact that if the clusters actually succeed in failing over without locking up, you can count yourself very fortunate. Kind of defeats the whole point of having a cluster at all, wouldn't you say, Sun?
  5. Simple operations in the GUI can crash not only an individual filer but the cluster, too. I've had it crash due to simple network reconfiguration or storage rebuild. How about a crash due to stopping replication, or a crash of both filer heads in a cluster while trying to fail over? Yep. I've seen all that, and many, many times.
  6. They had the bright idea they should use the Solaris Express (beta) code instead of the mainstream Solaris 10 codebase.
  7. The whiz-bang analytics are very often simply wrong. I've compared sniffer output and nfsstat results to what it says, and it's as simple as this: it lies.

This product seems to have become a victim of the Solaris 10 mentality that what's been working for the last 40 years for Unix is all wrong and broken. We need XML config files, Javascript-coded core applications, and GUIs these days, right? Wrong. This is an enterprise product, and it's made as if it's going to be run by 5th graders. The marketing wants your manager to believe it's going to allow him to reduce his head count of SAN guys by buying this thing. Sorry, but the gear has to actually stay running before you can do that. When you predicate a storage appliance on XML, Javascript, and other web toys for the core functionality and not just the GUI, you're asking for trouble. These guys should have taken a lesson from NetApp and followed the KISS principle to utilize ZFS and beat them at their own game. Instead, I'm left wondering how I can make excuses not to deploy anything on these boat anchors, which crash so often (often kernel panics, not just interface lockups) that customers are blaming me for data corruption (due to the crashes) and the general instability of the system. Had I anything substantive to do with the selection of these units, I'd have said "No thanks, Sun."