Bad Experiences with Sun 7410 Unified Storage Appliance Filers
If you want the short version: run away screaming as fast as you can. You will find all kinds of magazine reviews for the unified storage line that includes the 7410 as its flagship. You will see Fishworks developer blogs at Sun telling you that you can get insanely high speeds from these filers, and you will see lots of slick marketing for them on Sun’s website. Let me, the guy who has worked with eight of them over a period of about six months (very close to their release), provide probably the only dissenting voice you are going to find during your googling on these turds. I’ll give a rundown of the 7410’s features and then we’ll cut the crap and talk real.
- Use of ZFS as the back-end storage filesystem, with all the benefits that come with it: storage pools, snapshots, compression, good performance, RAID-Z/RAID-Z2 (single- and double-parity, roughly RAID-5/6), volume-management-style capabilities, replication, and self-healing. (There’s a quick sketch of these primitives right after this list.)
- Use of commodity SATA disk drives. In my case, plain Seagate 1 TB disks with no custom firmware or EMC-style microcode crap to keep you from replacing them with off-the-shelf disks.
- Multi-path SATA JBODs, with LSI SAS controllers in the heads connecting to SAS expanders on the back end of the JBODs. Sounds great, right?
- Use of standard Sun Galaxy-class servers as heads, thus ensuring that as newer servers come out and the “fishworks” filer software is ported to them, you can get better performance.
- A GUI even a Windows MCSE could use, offering a lot of very pretty analytics that cover some actual real-world usage scenarios.
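If you haven’t lived with ZFS, here’s a rough idea of the primitives that GUI is wrapping. This is just a sketch of plain ZFS on an ordinary Solaris box, not how you drive the appliance itself (that’s all done through its browser UI or CLI), and the device names below are placeholders, not anything from a real 7410.

```python
#!/usr/bin/env python3
"""Sketch of the ZFS primitives the 7410's GUI is built around.

Needs the zpool/zfs tools on the box; device names are placeholders.
"""
import subprocess

def zfs(*args):
    """Print and run a zpool/zfs command, raising if it fails."""
    print("+", " ".join(args))
    subprocess.run(args, check=True)

# Double-parity pool (raidz2 is the RAID-6-like layout) across six placeholder disks.
zfs("zpool", "create", "tank", "raidz2",
    "c0t0d0", "c0t1d0", "c0t2d0", "c0t3d0", "c0t4d0", "c0t5d0")

# A share-style filesystem with compression turned on.
zfs("zfs", "create", "tank/projects")
zfs("zfs", "set", "compression=on", "tank/projects")

# Point-in-time snapshot -- the building block the appliance's snapshot
# (and snapshot-based replication) features ride on.
zfs("zfs", "snapshot", "tank/projects@before-upgrade")
```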
Now I must admit that all of that sounds great. In fact, it is great. The filers do, in fact, have these features and they do work out of the box. They don’t nickel-and-dime you like NetApp or EMC over features like replication or compression, and the price is very competitive compared to NetApp, and especially compared to EMC or HP storage.
Now for the bad news
The 7410 has endemic instability problems and a terrible internal design that will probably ensure they stay that way.
- They crash more or less constantly. I’d like to say the problem was localized to one set of filers we’ve used; however, it’s continuous and chronic and happens to all seven of our filers. We’ve filed many novel bugs with Sun, covering everything from GUI lockups (which nearly always coincide with CLI lockouts and leave you unable to administer the filer at all) to old-fashioned kernel panics with all kinds of nice ZFS calls in the backtrace. These bugs are repeatable and constant.
- Cluster join, failover, and rejoin times take FOREVER compared to the competition. The fastest failover I’ve ever seen is 4.5 minutes, and that’s with a minimal number of disks (48). Add more disks and it’s even slower. That’s not to mention that if the cluster actually manages to fail over without locking up, you can count yourself very fortunate. Kind of defeats the whole point of having a cluster at all, wouldn’t you say, Sun?
- Simple operations in the GUI can crash not just an individual filer but the whole cluster. I’ve had it crash over a simple network reconfiguration or a storage rebuild. How about a crash from stopping replication, or both filer heads in a cluster crashing while trying to fail over? Yep. I’ve seen all of that, many, many times.
- They had the bright idea of building on the Solaris Express (beta) code instead of the mainstream Solaris 10 codebase.
- The whiz-bang analytics are very often simply wrong. I’ve compared sniffer output and nfsstat results to what they report, and it’s as simple as this: they lie.
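For what it’s worth, that cross-check is easy to reproduce. Below is a rough sketch (mine, not Sun’s) that samples nfsstat counters on an NFS client over a fixed window to get an independent calls-per-second figure you can hold up next to the analytics screen. It assumes Solaris-style `nfsstat -c` output; the parsing will need adjusting on other systems.

```python
#!/usr/bin/env python3
"""Rough cross-check of the appliance's NFS ops/sec analytics.

Run it on an NFS client talking to the filer and compare the rate it prints
against what the 7410's analytics screen claims for the same window. The
parsing assumes Solaris-style `nfsstat -c` output (a "Client nfs:" section
followed by a line of counters); this is a sketch, not a tool.
"""
import subprocess
import time

def client_nfs_calls():
    """Return the total NFS call count reported by `nfsstat -c`."""
    out = subprocess.run(["nfsstat", "-c"], capture_output=True,
                         text=True, check=True).stdout
    lines = out.splitlines()
    for i, line in enumerate(lines):
        if line.strip().startswith("Client nfs:"):
            # The first numeric line after the section header holds the counts;
            # the first column is the total call count.
            for counts in lines[i + 1:]:
                fields = counts.split()
                if fields and fields[0].isdigit():
                    return int(fields[0])
    raise RuntimeError("couldn't find the Client nfs: counters; adjust the parsing")

INTERVAL = 60  # seconds; long enough to smooth out bursts
before = client_nfs_calls()
time.sleep(INTERVAL)
after = client_nfs_calls()
print(f"independent estimate: {(after - before) / INTERVAL:.1f} NFS calls/sec")
```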