If you've come here from Isilon's "guarantee" website, please be aware that they've got the figures wrong for NetApp's space efficiency.
Very wrong. _The whole series on NetApp's space efficiency is here,_ and it clearly shows that NetApp's space efficiency is much higher than Isilon's miscalculated claim.

!http://blogs.netapp.com/.a/6a00d8341ca27e53ef01157088b78c970b-pi|height=149|width=170|align=right|alt=guarantee! If you really want to save money and be storage efficient, you need a storage system that gives you more space than you bought, using advanced data deduplication, double-parity RAID and a number of other space-saving technologies. *Here, try a real guarantee.* _Accept no substitutes!_

*Thanks for visiting, and enjoy the blog.*
Part 1: *Three Men Make a Tiger*
Part 2: *Space is Mind Bogglingly Big*
Scientists have puzzled for years over the question: why does space appear to have only three dimensions? Some speculate that space has more, with tiny curled-up dimensions that can't be seen, up to eleven at last count.
Back here in IT land, we don't even have three. Poor old disks and tapes have to make do with one dimension: lines of bits laid out on a rusty glass platter or a strip of Mylar.
!http://upload.wikimedia.org/wikipedia/commons/5/55/8-cell-simple.gif|height=154|width=154|align=right|alt=Image:8-cell-simple.gif! But users of this one-dimensional space have come up with a solution: map another dimension over the top of the single stream of bits. Structures such as directories and filenames, volumes and LUNs give our data a *second dimension*.
The third dimension? (OK, this is all for effect, but bear with me.) We get a third by qualifying the second: assign a type to each file, identifying it as a spreadsheet, a document or a database.
Here at NetApp, we've discovered the fourth dimension.
h3. Non-Duplication
There's a lot of buzz in the industry about deduplication, but what many fail to understand is that it's possible to avoid duplicating data in the first place. This is what a colleague of mine, Mike Riley, called our *non-duplication technologies*. They add a fourth dimension to the data.
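To make the idea of non-duplication concrete, here is a minimal sketch assuming a toy redirect-on-write design. The `ToyVolume` class, its methods and the block counts are all hypothetical, and it is a simplification rather than NetApp's WAFL implementation. The point it shows: a snapshot copies a table of pointers, not the data, so ten snapshots of a 100-block volume with one block changed per day need 110 physical blocks rather than 1,100.

```python
"""Toy model of pointer-based snapshots (hypothetical; not WAFL)."""
from dataclasses import dataclass, field


@dataclass
class ToyVolume:
    # Physical block store: block id -> contents, shared by everything.
    blocks: dict[int, bytes] = field(default_factory=dict)
    # Active file system: logical block number -> physical block id.
    active: dict[int, int] = field(default_factory=dict)
    # Each snapshot is just a frozen copy of the pointer map.
    snapshots: list[dict[int, int]] = field(default_factory=list)
    next_id: int = 0

    def write(self, lbn: int, data: bytes) -> None:
        # Redirect-on-write: new data always lands in a fresh physical
        # block, so blocks referenced by snapshots are never overwritten.
        self.blocks[self.next_id] = data
        self.active[lbn] = self.next_id
        self.next_id += 1

    def snapshot(self) -> None:
        # Copy the pointers only; no data blocks are duplicated.
        self.snapshots.append(dict(self.active))

    def logical_blocks(self) -> int:
        # Blocks a user "sees" across the active file system and all snapshots.
        return len(self.active) + sum(len(s) for s in self.snapshots)

    def physical_blocks(self) -> int:
        # Distinct physical blocks still referenced by anything.
        live = set(self.active.values())
        for snap in self.snapshots:
            live.update(snap.values())
        return len(live)


vol = ToyVolume()
for lbn in range(100):          # a 100-block volume
    vol.write(lbn, b"original")
for day in range(10):           # ten daily snapshots...
    vol.snapshot()
    vol.write(day, b"changed")  # ...with one block changed each day
print(vol.logical_blocks(), vol.physical_blocks())  # prints: 1100 110
```

In this model a writable clone would simply be another copied pointer map that accepts its own writes.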
An example. NetApp snapshots are famous for two reasons.
Here's the latest set of figures from the NetApp AutoSupport data discussed in my previous blog entries. (Sorry, it's a picture. Inserting tables is next to impossible.)
UPDATE: corrected 83TB to 83PB.
!http://blogs.netapp.com/shadeofblue/WindowsLiveWriter/MyTesseractBeatsYourCube_CDC4/image%7B0%7D_thumb%5B1%5D.png|height=142|width=457!

*The used snapshot space is less than 3% of the total usable space.* This shows how conservative NetApp's recommendations for snapshot reserve are: these systems reserved 13.4PB, far more than the 2.4PB of space actually committed. For a fuller description, Val Bercovici details the flexibility of NetApp's snapshot reservations.
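The reserve arithmetic can be checked with the two figures quoted above. This is a minimal sketch, and it assumes the 2.4PB of committed space is the "used snapshot space" that the 3% figure refers to; the implied lower bound on usable capacity follows only from that assumption.

```python
# Figures quoted in the text, in PB.
reserved_pb = 13.4   # space set aside as snapshot reserve
committed_pb = 2.4   # space actually committed to snapshots

# Share of the snapshot reserve actually in use.
utilisation = committed_pb / reserved_pb
print(f"Snapshot reserve in use: {utilisation:.0%}")       # prints: Snapshot reserve in use: 18%

# Assumption: if 2.4PB is under 3% of usable space, usable space exceeds this.
min_usable_pb = committed_pb / 0.03
print(f"Implied usable capacity: > {min_usable_pb:.0f}PB")  # prints: Implied usable capacity: > 80PB
```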
How much data does that represent? Multiplying the number of snapshots by the size of the volumes that hold them gives a whopping 1,511PB of non-duplicated data.
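The multiplication behind that figure can be sketched as below. The per-volume sizes and snapshot counts are invented for illustration, since the actual per-volume data isn't given here; like the calculation in the text, the sketch counts each snapshot as a full logical image of its volume.

```python
# Hypothetical per-volume records: (volume size in TB, snapshots retained).
volumes = [
    (2.0, 255),
    (0.5, 40),
    (10.0, 14),
]


def represented_tb(vols: list[tuple[float, int]]) -> float:
    """Logical data visible through snapshots: sum of size x snapshot count."""
    return sum(size_tb * snaps for size_tb, snaps in vols)


print(represented_tb(volumes))  # prints: 670.0
```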
h3. Effective Load Factor
How should we describe non-duplicated data that looks like a much bigger data set than it is? I like to think of it in terms of load factor. The worst case is to compare it with raw disk space: the data we are storing is well in excess of 1,500PB, on systems with a total capacity of 217PB.
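Load factor here is simply the data presented divided by the capacity that holds it. A minimal sketch; the function name `load_factor` is hypothetical and the inputs are the two figures quoted above.

```python
def load_factor(logical_pb: float, physical_pb: float) -> float:
    """Effective load factor: data presented / capacity that stores it."""
    return logical_pb / physical_pb


# 1,511PB of snapshot-represented data on 217PB of total capacity.
print(f"{load_factor(1511, 217):.2f} to 1")  # prints: 6.96 to 1
```

With only those two numbers the ratio comes out just under 7 to 1, so a figure of over 7 to 1 depends on the total data stored being somewhat more than the 1,511PB of snapshot data alone (more than about 1,519PB).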
*That gives an effective load factor of over 7 to 1 for these systems.* Considering just the snapshot data, it's a load factor of 700 to 1.

h3. Bottom Line
It really doesn't make sense to measure usable storage with a micrometer. With fourth-dimension, non-duplicating technologies such as our SnapVault, SnapMirror, thin provisioning, Snapshots and FlexClones, the effective size of your data cube has increased dramatically. And all without hurting performance.
It takes just a few percentage points of the total disk space to enable this truly smart technology.
Worth it? You bet. NetApp customers would find it really hard to go back to the dark ages of storage. Snapshots are so natural for NetApp users that many don't think about this extra dimension, and those unfamiliar with virtualized storage often fail to grasp the difference between NetApp systems and a traditional SAN.
h3. What's Not Shown: Deduplication
All the savings above come from non-duplication technologies. Not shown are the further savings from deduplication, which increase the effective load factor still more. By how much I can't say: deduplication results depend on the data, and the data set I'm working with has no deduplication statistics.
But it's substantial: with VMware, for example, deduplication can deliver 80 to 95% space savings.
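Deduplication, unlike non-duplication, finds duplicate blocks after they have been written. Below is a minimal sketch of fixed-block deduplication, assuming 4KB blocks and SHA-256 fingerprints; it is not NetApp's implementation, and the ten "VM images" are synthetic, built so that most of their blocks are identical. The hypothetical helper `savings_to_ratio` shows how a percentage saving translates into load factor: 80% savings is 5 to 1, and 95% is 20 to 1.

```python
import hashlib

BLOCK = 4096  # assumed block size in bytes


def dedupe_savings(data: bytes, block_size: int = BLOCK) -> float:
    """Fraction of space saved by storing each distinct block only once."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks:
        return 0.0
    # Fingerprint each block; identical blocks share one fingerprint.
    unique = {hashlib.sha256(b).digest() for b in blocks}
    return 1 - len(unique) / len(blocks)


def savings_to_ratio(savings: float) -> float:
    """Convert a fractional saving into a load factor (0.9 -> 10 to 1)."""
    return 1 / (1 - savings)


# Ten synthetic VM images: nine shared "OS" blocks plus one unique block each.
os_image = b"".join(bytes([n]) * BLOCK for n in range(9))
vms = b"".join(os_image + bytes([100 + v]) * BLOCK for v in range(10))

s = dedupe_savings(vms)
print(f"{s:.0%} saved, {savings_to_ratio(s):.1f} to 1")  # prints: 81% saved, 5.3 to 1
```

Real savings depend entirely on how much of the data is actually identical.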
h3. Dr Dedupe
If deduplication on primary storage interests you, visit my colleague Dr Dedupe's blog for more insight into this technology. Interestingly, he was rated in the top 10 most valuable vendor storage blogs over at Storage Monkeys.
I (of course!) was at #11, just outside the list. Next year's Oscars, perhaps.

h3. Sunny at ShadeOfBlue Towers
Apologies for the lateness of this part of the series. The weather was so good this weekend that a few days in the sun took top priority. We see the sun so rarely here.