
The reaction of other storage vendors to our 50% storage efficiency guarantee sometimes surprises me, with comments like "It's unfair to compare RAID-DP vs RAID-10" or "The terms and conditions mean nobody will ever benefit". It's not often that I see all of these arguments presented in one place, and it's even more surprising to me when they appear in a local IT journal. The article is titled "NetApp storage efficiency guarantee debunked", and it can be found at the following URL:

 

http://www.itnews.com.au/News/98263,netapp-storage-efficiency-guarantee-debunked.aspx

 

Although I was pleased to find that I was allowed to insert a brief comment, I couldn't write as much as I would have liked, so I thought I'd talk more about it on this blog.

 

The title “NetApp Storage Efficiency Guarantee Debunked” certainly grabbed my attention, and caused a few guys in the office to say "what the ...". This is all good though, because an attention-grabbing title gets people to read the article and find out how we can help to reduce the costs of implementing server virtualization. However, as my lovely wife keeps reminding me, I'm more than a little pedantic, so I can't resist commenting on the word "debunked".

 

By most definitions, to debunk something is

 

"To discredit, or expose to ridicule the falsehood or the exaggerated claims of something" - http://en.wiktionary.org/wiki/debunk

 

EMC’s Darren McCullum tried his best to discredit our storage efficiency guarantee, but I don't think he really succeeded. There certainly isn't any falsehood to the guarantee: it's well documented, and the terms and conditions are public. Nor are the claims of space savings exaggerated; if anything, from the experience of the customers I've been dealing with, they're conservative.

 

Even though Darren didn't really succeed at debunking anything, he was pretty critical of our value proposition, and I really feel that those criticisms deserved the kind of detailed rebuttal that is hard to do within the context of a news article.

 

Firstly, Darren argues that the 50% number is inflated because the guarantee requires comparison against RAID-10. Alex McDonald covered that issue really well here http://blogs.netapp.com/shadeofblue/2008/10/after-the-virtu.html, but I'd like to add my $A0.02 worth.

 

Most array vendors state that their arrays have five nines of availability; it's pretty much an absolute requirement for an enterprise class array. But one thing that often gets glossed over is that that number usually covers only the array controllers and the disk enclosures. If a single RAID group fails due to a double disk failure, but the controllers are still up and running, that still counts as uptime for the array. In a pre-VMware world, the failure of a single RAID group may have been unfortunate, causing downtime on a single server, but now the failure of a single LUN affects ten or twenty times as many virtual servers, which might well be considered a disaster, especially when one of the justifications for virtualisation is to improve server uptime.

So what are the chances of a double disk failure? Whenever I address a large group of IT people, I ask how many have been affected by a double disk failure in the last few years. In general about 5 percent of the attendees put their hand up, and I strongly suspect there are a few more who want to avoid the embarrassment. This correlates pretty well with a report published by IBM on best practices for Exchange. In this report, smarter people than me, using some sophisticated mathematical models I don't entirely understand, predict the chances of losing data in a typical RAID-5 configuration to be about 6% over a five year period, as shown in the following table

 

[Table from the IBM report: predicted probability of data loss for typical RAID-5 configurations over a five year period]

 

I’ll repeat this, because it's important. If you have 42 disks configured in 7+1 RAID-5 groups, the chance of one of those RAID groups failing within 5 years is over 5%. Every year there is more than a 1% chance that one of your RAID groups will fail, and the scary thing is that when that LUN fails, all the VMware Datastores that rely on that RAID group will also fail.
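For the pedants like me who want to check that arithmetic, here is a rough back-of-the-envelope sketch in Python. It assumes a constant, independent annual failure probability for the whole pool, which is a simplification of my own rather than a reproduction of the IBM model, but it shows how "a bit over 1% per year" compounds to roughly 5% over five years.

# Back-of-the-envelope only: assume a constant annual chance that at least
# one RAID-5 group in the pool suffers a double disk failure, then compound
# it over several years. My simplification, not the IBM model itself.

def cumulative_failure_probability(p_annual, years):
    """Chance of at least one failure over `years`, assuming independent,
    identically distributed yearly failure probabilities."""
    return 1 - (1 - p_annual) ** years

# A flat 1% per year gives about 4.9% over five years; anything a little
# above 1% pushes the five year figure past the 5% mark.
print(cumulative_failure_probability(0.01, 5))  # ~0.049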

 

What amazes me is that nobody I know of would ever purchase a storage array that has less than 99% availability. Correspondingly, I don't see why anyone should tolerate an array that has less than 99% availability at the LUN or VMware Datastore level. With the exception of NetApp (and IBM when they sell N-Series), who recommend RAID-DP for all workloads, nobody else to the best of my knowledge can demonstrate that they provide five nines of availability all the way down to the Datastore.

 

Using RAID-5 for a large VMware implementation just seems dangerous to me, which leaves RAID-10 as the only viable alternative to RAID-DP for VMware workloads, and that's just one of the reasons we insist on comparing our savings against RAID-10. The other reason is that on a spindle-for-spindle basis RAID-DP + WAFL outperforms RAID-10 for almost every random I/O workload, and completely blows standard RAID-6 approaches away. That, incidentally, is why we don't do the comparison against other vendors' RAID-6 implementations. Remember, the terms and conditions require that we are compared against storage with equivalent performance and reliability characteristics.

 

Having said that, the question of whether I think you should use RAID-10 for VMware on non-NetApp storage is pretty much a moot point; most recommendations seen on vendor forums such as

 

http://forums.hds.com/index.php?showtopic=476
and
http://communities.vmware.com/docs/DOC-2660;jsessionid=33341777E4EF008BCE57DE04818E33EC

 

suggest that RAID-10 should be reserved for special cases that require the highest levels of performance and availability. Again, I think that the shared nature of VMware Datastores means that they all deserve the highest possible reliability. Having said that, let's assume that I'm overly cautious and the other vendors are right: that RAID-5 is just fine and dandy for most VMware platforms, and that RAID-5 should be the basis of comparison for the guarantee.

Let's say that it is an even match between 8+1 RAID-5 vs. 16+2 RAID-DP; what would we guarantee then? The answer to that question can be found in the 35% guarantee for V-Series in front of third party arrays. Differences in RAID efficiencies are not considered with the V-Series guarantee, because the V-Series uses whatever RAID protection is provided by the backing array. We simply work with what we are given, and then make it more efficient via deduplication and fine grained, no cost, thin provisioning.
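For anyone who wants to see why 8+1 RAID-5 and 16+2 RAID-DP are so close on raw capacity, while RAID-10 gives up half its spindles to mirroring, here is a simple sketch of the spindle arithmetic. It only covers RAID overhead; the guarantees themselves layer deduplication, thin provisioning and snapshot reserves on top of this, so this is not the guarantee calculation, just the parity maths.

# Spindle arithmetic only: ignores spares, right-sizing, dedup and thin
# provisioning. It just shows the fraction of raw capacity left for data.

def usable_fraction(data_disks, redundancy_disks):
    return data_disks / (data_disks + redundancy_disks)

layouts = {
    "RAID-10 (mirrored pair)": usable_fraction(1, 1),
    "RAID-5 (8+1)": usable_fraction(8, 1),
    "RAID-DP (16+2)": usable_fraction(16, 2),
}

for name, fraction in layouts.items():
    print(f"{name}: {fraction:.0%} of raw capacity usable for data")

# RAID-10 -> 50%, RAID-5 (8+1) -> ~89%, RAID-DP (16+2) -> ~89%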

 

This is one area where I think Darren misunderstands exactly what the 35% guarantee is about. In a comment about the guarantee he says

 

"that among the conditions is a requirement that the customer is migrating from one of the most conservative RAID (redundant array of independent disks) configurations - RAID 10".

 

People I’ve spoken to who know Darren say he’s a good guy, so I’ll assume that he’s working from bad information, maybe from internal EMC sources, or that he didn't thoroughly read our documentation. Either way this is the kind of misinformation that I see far too often from companies like EMC. For various reasons that kind of stuff really irks me, and makes me lose respect for companies who otherwise do pretty good engineering.

The thing is that while the RAID-DP vs RAID-10 requirement is true of the 50% guarantee, I think many people, after reading the article, will incorrectly conclude that it also applies to the V-Series 35% guarantee. This is something I'd like to clarify.

 

To give a worked example, let's say that your existing storage vendor says you need to buy an additional 10TB of RAW storage for your array to support your VMware environment. That 10TB of RAW storage might then be carved up using RAID-5 or RAID-10 or MetaLUNs or Hypers or V-RAID or whatever; it doesn't matter. What does matter is that if you provisioned that storage, configured in the same way, to VMware through the V-Series gateway, you would only need to buy an additional 6.5TB of RAW storage rather than 10TB. Another way to think about this is that just by implementing V-Series for your VMware environment, you get an additional 35% discount on the price of RAW disk from your existing vendor for the vast majority of the storage. You'll also use 35% less power, space, cooling capacity and so on. What's more, if your current vendor thinks they've got you locked in for disk upgrades and is charging you like a wounded bull for every additional TB, you can always buy the capacity directly from us and attach NetApp shelves to your V-Series gateway. That ought to make them sharpen their pencils considerably!
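If you want to sanity-check the 10TB versus 6.5TB figures above, the headline arithmetic is simply a flat 35% reduction on the raw capacity you would otherwise buy. This is a sketch of that calculation only, assuming the saving applies to the whole increment; the formal sizing process and conditions are in the guarantee documentation.

# Headline arithmetic only: apply the guaranteed saving to the raw capacity
# the incumbent vendor says you need. The real guarantee has documented
# terms and a sizing process; this just reproduces the percentages.

def raw_capacity_needed(vendor_sized_tb, guaranteed_saving):
    return vendor_sized_tb * (1 - guaranteed_saving)

print(raw_capacity_needed(10.0, 0.35))  # 6.5 TB behind a V-Series gateway (35% guarantee)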

 

If I were looking at ways of reducing my data centre costs, I think that V-Series in front of your existing arrays would have to come under the category of “low hanging fruit”. No wonder the other vendors are doing whatever they can to stop this getting momentum in the market.

OK, now that I’ve covered the RAID issue, I’d like to address the next section of the article, which goes on to highlight Darren’s other issues with our storage guarantee. The assertion he makes is that our terms and conditions don’t cover enough use cases; specifically:

"The NetApp guarantee also insists that no more than ten per cent of the total data set being de-duplicated can be graphics, XML, database data, Microsoft Exchange data or encrypted data"

True enough, data deduplication is a large part of the storage guarantee, and there are classes of data that either can't be deduplicated, such as encrypted files, or shouldn't be deduplicated because they are already "spindle bound" (e.g. Exchange). He then goes on to say

"This effectively excludes from the guarantee some of the most data-intensive areas of any organisation ..."

If by “data-intensive” he means data sets with large amounts of latency-sensitive I/O, then I’d agree with him: we do exclude some of the most I/O intensive loads, though these are usually a fraction of the total storage capacity of the datacentre. If the data was really I/O intensive it probably wouldn’t be sitting on RAID-5, would it? By many organisations’ criteria, it might not even be a suitable candidate for virtualisation. He then continues with

"..and leaves little else but file servers applicable."

Here is one statement where I must respectfully, but violently, disagree. Firstly, there are greater benefits in consolidating file servers onto a platform capable of running as a high end NAS. Secondly, most Intel servers in enterprise data centres aren’t running high I/O throughput applications, and if they are, then there’s a pretty good chance that they’re not currently virtualised. A quick Google on “what to virtualise” turns up a lot of websites that confirm that the primary benefit of server virtualization is server consolidation. I particularly like the following quote from

http://www.itmanagement.com/faq/server-virtualization/

“The most often talked about benefit of virtualization. If applications running on separate computers do not utilize the computing resources of their computers, they can be consolidated onto a smaller number of servers using virtualization technology”

Or to quote someone closer to Darren’s home and mine, Suncorp's business technology hosting manager, Tim Harlow, in http://www.zdnet.com.au/news/software/soa/Server-virtualisation-gives-Suncorp-hot-flushes/0,130061733,339286861-1,00.htm

“Suncorp's decision on which servers to virtualise has little to do with the mission criticality of the applications they support, according to Harlow. 'It's only about the I/O and resource demands on the server -- that's what drives you.'”

Outside of Exchange, database, and file servers, most datacentres have many single-purpose utility servers running things like:

 

  • Active Directory servers
  • DNS servers
  • POP gateways
  • Time servers
  • Domain controllers
  • Web servers
  • Legacy application servers

 

and the list goes on. These are the servers that are most often underutilized, and the ones which benefit the most from server virtualization.

While there are many good reasons to virtualise your infrastructure that have nothing to do with consolidation, including increased availability, easier provisioning, and easier management, consolidation of underutilized servers is still where most of the hard dollar savings are to be found. From the discussions I’ve had with Australian IT organisations, server consolidation is still the prime reason server virtualization projects get approved; the other stuff is great, but not nearly as compelling in hard dollar terms. So when Darren states that

"The conditions you have to meet for the guarantee are so onerous they fly in the face of the reasons you would virtualise in the first place"

I would have to assume that in his current role at EMC he is overly focused on virtualising high transaction throughput applications.

That pretty much wraps up my response to the original article; hopefully I've been able to clear up some misconceptions out there about our storage guarantees. I'd be interested in any comments you might have.

 

John Martin
ANZ Consulting Systems Engineer
Data Protection and Retention

 

Before I sign off, there are a few final points I'd like to make.

 

Data Intensive Workloads


If you are looking for storage efficiency for high I/O throughput environments, you should notice that every benchmark we do uses RAID-DP, versus most competitors’ benchmarks that use RAID-10. Even then, if you divide the number of IOPS by the number of spindles, NetApp almost always comes out in front. With our extremely efficient way of doing writes via WAFL, and the benefits of random read acceleration with our Performance Accelerator Modules, our leadership in this area is likely to continue.

 

Snapshot Space


One thing that we don’t make a big deal about with the storage guarantee is that it includes space for array based snapshots, something that, to the best of my knowledge, nobody else does when sizing VMware storage. This provides the basis for a whole range of advanced data protection strategies such as SRM. Not only do you get better storage for less money with NetApp, but you also have elegant ways of solving backup and D/R problems, which are some of the most challenging areas of server virtualization.

 

Slimy Behavior


Jared Floyd from Permabit made a number of assertions in a comment on the article this post addresses. I thought I'd address some of the bolder statements here. His comments are in italics; my responses follow.

"This is pretty slimy behavior on behalf of NetApp, but I can't say that I'm surprised given that this was a marketing stunt to begin with."

OK, so we have great technology that gives real customer benefits to a large number of our users, we’re prepared to stand behind it with a guarantee with reasonable terms and conditions (same as any guarantee), and we publicise it. You say marketing stunt; I say highlighting unique customer value. If a customer doesn’t like the terms and conditions, or doesn’t think they’ll apply to their environment, then they don’t have to buy our kit, or they can buy it without the guarantee and still get compelling value. It’s pretty much a no-lose proposition for the customer, so how exactly is this slimy?

"It's unlikely that a vendor is going to offer a guarantee program where they might have to pay up"

Um, yes, that’s true of pretty much any guarantee or warranty. I doubt that Sony offers a 2 year guarantee (or whatever it is) on their electronics because they think the product will fail before then; in fact they probably think it will last at least 5 to 10 years. NetApp is no different here. Where we are different is that while many vendors make claims about how efficient their storage is, we actually stand behind our claims of efficiency with a legal remedy. Permabit and many others make claims on their websites about how much they can save, and yet, as far as I'm aware, they provide no guarantees. Why?

"VM system images are causing an explosion in networked storage requirements; they let you make much more effective use of your server resources but lead to storing the same OS files over and over again"

Yes! This is exactly the problem that NetApp’s dedup for FAS addresses particularly well. The redundant data is only stored once, and this is a big part of our storage guarantees. For some strange reason it seems that you think we have no solution for this, and then criticize us for that lack. By the way, exactly how does Permabit (or anyone else other than NetApp and VMware) help customers solve this problem?

"NetApp is severely behind in the dedupe game"

Say again, over, I don't think I heard you correctly ...

 

  • We’re the only company offering deduplication for data residing on primary storage, including VMware.
  • We are the only company providing compelling deduplication savings for all forms of server virtualisation.
  • As far as I know we have more customers running block level deduplication than the rest of the market combined, and I strongly suspect that we have deduplicated more data than the rest of the dedup vendors combined.
  • We have developed and provide two deduplication technologies (for FAS and VTL), each of which addresses a different use case. Our customers love our dedup technology, and as a result our leadership extends every month.
  • We are the only company offering this kind of guarantee on space savings.

 

How exactly is this being “behind”?

"Our Permabit Enterprise Archive product was designed from the ground up to support dedupe"

Jolly good for you, well done, I’m happy for you. However, from a quick look at your website, it appears that you’ve built an entire product around one feature and then used it to push the archive barrow. You might be interested in something I wrote on archive in my blog on the NetApp communities forum: http://communities.netapp.com/people/martinj/blog/2009/02/10/why-archive. I’d be more than happy to debate the point there, and if you promise to raise the tone of your comments, then I’ll promise to do the same.
