
The State of Deduplication - 2012

Posted by lfreeman in Ask Dr Dedupe on Oct 26, 2012 1:31:01 PM

Today marks the final blog for Ask Dr Dedupe.  After 5 years and a hundred or so posts, deduplication has become commonplace. Meanwhile, there are many other emerging and interesting data storage technologies to explore. As such, take a look at my new blog, “About Data Storage,” where I’ll be discussing all aspects of emerging data storage technologies.


So, for this, my final blog, I thought it would be interesting to take a look at the current state of array-based deduplication in the industry, by both the incumbents and the startups.  First, I’ll summarize six dominant storage vendors that provide over 80% of all networked enterprise storage to the world today:



Dell acquired Ocarina in 2010 and adapted their technology into a pseudo-dedupe offering.  I use the word pseudo because I still have doubts about whether or not image compression constitutes true deduplication.  But they advertise it as such, so I’ll give them the benefit of the doubt.  The Dell DR4000 is Dell’s only storage array with any type of deduplication capability.

Dedupe Grade: D
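
For readers wondering why I separate compression from "true" deduplication in the Dell entry: compression shrinks the bytes inside a single object, while deduplication stores each repeated block only once and keeps pointers to it. Here is a minimal sketch of the latter. It assumes fixed-size 4 KiB blocks and SHA-256 fingerprints, and the names `dedupe_blocks` and `rehydrate` are hypothetical. It illustrates the general idea only and is not a description of any vendor's implementation.

```python
import hashlib


def dedupe_blocks(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and keep one copy of each unique block.

    Returns (store, recipe): store maps fingerprint -> block contents,
    recipe is the ordered list of fingerprints needed to rebuild the data.
    Block size and hash choice are illustrative assumptions.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        # Only the first occurrence of a block is stored.
        store.setdefault(fingerprint, block)
        recipe.append(fingerprint)
    return store, recipe


def rehydrate(store, recipe) -> bytes:
    """Rebuild the original data by following the recipe of fingerprints."""
    return b"".join(store[fp] for fp in recipe)


# Three logical blocks, two of them identical.
data = b"A" * 4096 + b"B" * 4096 + b"A" * 4096
store, recipe = dedupe_blocks(data)
print(len(recipe), len(store))  # 3 2
```

Three logical blocks are written but only two are physically stored, and `rehydrate(store, recipe)` returns the original bytes.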



EMC’s product line is a conglomeration of four architectures, so we’ll take each one separately:


  1. Symmetrix V-Max – no deduplication currently offered.
  2. Isilon – no deduplication currently offered.  In an interesting side note, one Isilon blogger pondered whether deduplication is just a fad (like rock and roll?).
  3. VNX – deduplication of static files only.  This seems to be a half-hearted attempt at deduplication for primary or archival data.  It is performance-intensive and not recommended for busy systems.
  4. Data Domain – full deduplication offered.  As any old-timer like me might remember, Data Domain invented the concept of deduplication-embedded storage arrays.  Data Domain products are wholly focused on D2D backups, and they are the market leader in the space.  Extra credit granted to EMC for Data Domain's contribution.

Dedupe Grade: C


Hitachi Data Systems

It’s a bit difficult to keep up with the revolving door of storage arrays offered by HDS, but I’ll take a shot here:


  1. Virtual Storage Platform – No mention of deduplication.
  2. Unified Storage VM – No mention of deduplication.
  3. Unified Storage VM 100 – No mention of deduplication.
  4. Content Platform – No mention of deduplication.
  5. NAS Platform Family (BlueArc) – No mention of deduplication.
  6. Adaptable Modular Storage 2000 Family – No mention of deduplication.


Dedupe is apparently not spoken at HDS.

Dedupe Grade: F



HP announced StoreOnce deduplication in 2010.  In the announcement, they promised dedupe everywhere, with no need to ever rehydrate data once it’s stored (hence the name StoreOnce).  Unfortunately, this idea didn’t even sound good on paper, and with sparse indexing as the cornerstone of HP’s deduplication, it was impossible.  Predictably, HP has relegated deduplication to D2D backup appliances only.  HP’s rising star, 3PAR Utility Storage, shows no indication of ever bringing deduplication into its portfolio.  Despite the early hype about dedupe everywhere, it appears that this idea remains locked in the minds of HP.  Because they promoted false dedupe hopes, HP receives a one-grade-point deduction.

Dedupe Grade: D



IBM has taken a hybrid approach to deduplication and, to their credit, has published a detailed document describing their dedupe strategy.  For D2D backup, IBM includes deduplication with the Diligent ProtecTIER backup appliance.  For NAS, the N-Series appliance, OEM’d from NetApp, naturally contains all the dedupe attributes of NetApp, discussed below.  For primary storage, IBM acquired Storwize in 2010 and apparently decided that compression is their preferred route to efficiency in primary storage.  The midrange V7000 includes Storwize real-time compression but no deduplication.  IBM’s high-end storage arrays (SONAS, XIV, and DS8000) include neither compression nor deduplication.

Dedupe Grade: C



NetApp continues to be the only major storage provider offering fully featured deduplication across its entire line of general purpose storage systems.  All FAS and V-Series arrays share the same deduplication architecture. Dedupe’d data can be efficiently moved between arrays, either by transferring deduplicated data intact or by automatic re-deduplication after transfer.  Third party SAN systems, including the ones mentioned above, can be dedupe-enabled with the V-Series Open Storage Controller.  NetApp deduplication operates seamlessly regardless of storage protocol, application, or media type.

Dedupe Grade: A


So, for the big storage incumbents, it appears that the final course has been set with regard to deduplication.  NetApp has it; Dell, EMC, HP and IBM sort of have it; and HDS doesn’t have it at all.


Next, let’s look at the plethora of emerging storage array startups.  Together, these companies hold only a tiny market share, but of course all are vying to be the next big thing in storage.  Here, the story is a little different, as these vendors all offer deduplication (or have plans to include it).  I won’t attempt to rank these companies, since viability can be fleeting for any startup.  Instead, I’ve included a link that best describes each company’s commitment to deduplication.


GreenBytes – VDI-focused storage

Nimbus Data – Flash-based storage

Pure Storage – Flash-based storage

Skyera – Flash-based storage

SolidFire – Flash-based storage

StorSimple – Cloud-integrated storage (recently acquired by Microsoft)

Tegile – Virtual storage

Tintri – VM-aware storage

Violin Memory – Flash-based storage

Whiptail – Flash-based storage


These 10 storage array startups all include deduplication in their portfolio.  This should be a wake-up call to any incumbent storage vendor that doesn’t, or that has limited capabilities.  History tells us that some of these startups will fold, some will be acquired (already happening), and others will rise to the top.  In the next generation of storage technology, deduplication will be requisite, not merely desirable.


In short, the state of deduplication is that it has proven itself to be viable in the data storage industry.  Now it’s up to all data storage vendors to prove that they can deliver it.


Signing Off-



