
Erasing misconceptions around RAID & Erasure Codes

Posted by valb in Exposed on Dec 11, 2012 7:01:57 PM

** The following is a GUEST POST by Peter Corbett, Vice President & Chief Architect of NetApp (aka "Father of RAID-DP") **

 

There has been a great deal of interest in recent years in erasure codes.  The class of erasure coding algorithms that people normally associate with the term includes algorithms that add a parameterized amount of computed redundancy to clear-text data, such as Reed-Solomon coding, and algorithms that scramble all data into a number of different chunks, none of which is clear text.  In both cases, all data can be recovered if and only if m out of the n distributed chunks can be recovered.  It is worth noting that, strictly speaking, RAID 4, RAID 5, RAID 6 and RAID-DP are also erasure coding algorithms by definition.  However, that is neither here nor there – XOR parity-based schemes have different properties than the "new" algorithms being used in some systems, which are what people actually have in mind when they talk about "erasure codes".

[Figure 1: ErasureCodes-Figure1.png (credit: http://nisl.wayne.edu/Papers/Tech/dsn-2009.pdf, page 2)]
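To make the "m out of n" property concrete, here is a minimal Python sketch of the simplest member of the family: m data chunks plus a single XOR parity chunk, so any m of the m+1 pieces are enough to recover the data.  It is purely illustrative (the chunking and names are not from any particular product); the general Reed-Solomon case adds more parity pieces in the same spirit.

from functools import reduce

def encode(data: bytes, m: int):
    """Split data into m equal chunks and append one XOR parity chunk."""
    size = -(-len(data) // m)                                # ceiling division
    chunks = [data[i * size:(i + 1) * size].ljust(size, b"\0") for i in range(m)]
    parity = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)
    return chunks + [parity]

def recover(chunks, lost: int):
    """Rebuild the chunk at a *known* lost index from the survivors."""
    survivors = [c for i, c in enumerate(chunks) if i != lost]
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), survivors)

# Any single piece -- data or parity -- can be lost and rebuilt:
pieces = encode(b"erasure codes recover known losses", m=4)
assert recover(pieces, lost=2) == pieces[2]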

 

Use cases for each

Erasure codes are being used for deep stores, for distributed data stores and for very scalable stores.  They are commonly used in RAIN systems, where the code covers disk, node and connectivity failures, requiring data reconstruction after a node failure (instead of HA takeover, which is much, much faster).  These erasure codes are more computationally intensive on both encode (write) and decode (reconstruct) than XOR parity-based schemes like RAID-DP.  In fact, one of the big motivations for developing RAID-DP was that the "industry-standard" Reed-Solomon code for dual-parity RAID-6, as well as the less widely used information dispersal algorithms that protect against more than one failure, are more computationally intensive than RAID-DP.  XOR-based RAID algorithms are also very well suited to sequential access in memory, as they can work with large word sizes.  Many of the complex erasure codes are based on Galois field arithmetic that works practically only on small (e.g. 4-, 8- or 16-bit) quantities, although there are techniques for parallelizing it in hardware, on GPUs, or using the SSE* instructions on Intel architecture processors.

[Figure 2: XORRDP-Figure2.png (credit: http://nisl.wayne.edu/Papers/Tech/dsn-2009.pdf, page 2)]
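The computational contrast above can be seen in miniature below.  This sketch compares a byte-at-a-time multiply in GF(2^8), the kind of symbol arithmetic Reed-Solomon-style codes are built on, against plain XOR over a wide machine word, which is all a parity scheme needs.  The 0x11d reduction polynomial is one commonly used in Reed-Solomon implementations; everything else here is an illustrative toy, not an optimized codec.

def gf256_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reduced modulo x^8 + x^4 + x^3 + x^2 + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
    return result

def rs_style_parity(blocks, coeffs):
    """Parity computed byte by byte, weighting each block by a GF(2^8) coefficient."""
    out = bytearray(len(blocks[0]))
    for blk, c in zip(blocks, coeffs):
        for i, byte in enumerate(blk):
            out[i] ^= gf256_mul(byte, c)
    return bytes(out)

def xor_parity(blocks):
    """Parity computed a whole machine word at a time, as plain XOR parity allows."""
    acc = 0
    for blk in blocks:
        acc ^= int.from_bytes(blk, "little")
    return acc.to_bytes(len(blocks[0]), "little")

blocks = [b"\x11\x22\x33\x44\x55\x66\x77\x88"] * 3
print(rs_style_parity(blocks, coeffs=[1, 2, 3]).hex())
print(xor_parity(blocks).hex())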

 

 

Erasing Limitations

Basically, an erasure code only works when you know what you’ve lost.  It provides sufficient redundancy to recover from a defined number of losses (failures).  If you don’t know what you’ve lost, you need an error detection and correction code, which requires a higher level of redundancy.
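A small sketch of that distinction, using the same toy single-parity layout as above: a known loss (an erasure) can be rebuilt from parity, but a silently corrupted chunk at an unknown position can only be detected as a mismatch, not located.

from functools import reduce

def xor_all(chunks):
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)

data_chunks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_all(data_chunks)

# Erasure: we know chunk 1 is gone, so XOR of the survivors rebuilds it.
rebuilt = xor_all([data_chunks[0], data_chunks[2], parity])
assert rebuilt == data_chunks[1]

# Silent error: chunk 1 is corrupted in place.  The parity check says
# *something* is wrong, but not which chunk to rebuild -- locating the
# error takes extra redundancy (checksums, or a stronger correcting code).
corrupted = [data_chunks[0], b"XBBB", data_chunks[2]]
assert xor_all(corrupted) != parity            # detected, but not located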

 

For disks, the known-loss failure modes that we can see are whole-disk failures, sector media failures, and platter- or head-scoped failures.  All of these require reconstruction from redundant information, which in our case is dual parity (row and diagonal parity covering all double failures).  There are other failures that can invoke reconstruction as well, e.g. loss of connectivity to a disk shelf.
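For illustration, here is a toy encoder in the spirit of the published row-diagonal parity (RDP) construction: a prime p, p-1 data disks, one row parity disk and one diagonal parity disk, with p-1 rows per stripe.  It is a sketch of the layout idea only, not NetApp's actual RAID-DP implementation, and it omits the double-failure reconstruction chain.

def rdp_encode(data, p=5):
    """data: (p-1) data disks, each a list of (p-1) equal-length blocks."""
    rows, cols = p - 1, p - 1
    assert len(data) == cols and all(len(d) == rows for d in data)

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    zero = bytes(len(data[0][0]))

    # Row parity: plain XOR across each row of the data disks.
    row_parity = [zero] * rows
    for r in range(rows):
        for c in range(cols):
            row_parity[r] = xor(row_parity[r], data[c][r])

    # Diagonal parity over the data disks plus the row parity column.
    # Block (r, c) sits on diagonal (r + c) mod p; the diagonal numbered
    # p-1 is deliberately left unstored (the "missing" diagonal).
    diag_parity = [zero] * rows
    for r in range(rows):
        for c in range(cols + 1):                  # include row parity column
            d = (r + c) % p
            if d == p - 1:
                continue
            blk = row_parity[r] if c == cols else data[c][r]
            diag_parity[d] = xor(diag_parity[d], blk)

    return row_parity, diag_parity

Reconstruction after two simultaneous column losses then alternates between diagonals (each of which is missing at most one block) and rows, which is what lets dual parity cover all double failures.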

 

However, there are other modes of disk failure that are more insidious and that require some way to determine which data is bad within a set of data the disk subsystem claims is good, i.e. silent errors.  To cover those cases, we add metadata and checksums to each block of data.  If we detect a problem there, we can then use parity to reconstruct.  So, just saying that you have an erasure code that protects against more than two losses is not sufficient to claim superior protection of data.  You have to have worked through all the failure modes and made sure you can protect against each of them.  We've hardened ONTAP over nearly 20 years to provide a very high level of resiliency against all modes of disk failure, alone and in combination.
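A hedged sketch of that idea: attach a checksum to each block so that silent corruption becomes a detectable, and therefore locatable, failure, then fall back on parity to rebuild the bad block.  The CRC and the single-parity layout below are illustrative stand-ins, not ONTAP's actual block format or RAID-DP itself.

import zlib
from functools import reduce

def xor_all(blocks):
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def write_stripe(blocks):
    """Store each block with its checksum, plus one XOR parity block."""
    return [(b, zlib.crc32(b)) for b in blocks], xor_all(blocks)

def read_block(stored, parity, idx):
    """Verify the checksum; on mismatch, rebuild the block from its peers."""
    data, crc = stored[idx]
    if zlib.crc32(data) == crc:
        return data
    peers = [b for i, (b, _) in enumerate(stored) if i != idx]
    return xor_all(peers + [parity])               # reconstruct the bad block

stored, parity = write_stripe([b"riff", b"raid", b"rdpx"])
stored[1] = (b"!aid", stored[1][1])                # simulate a silent media error
assert read_block(stored, parity, 1) == b"raid"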

 

That said, we are very aware of the trend toward larger disks and the impact that has on reconstruction times and on the probability of higher-order multiple failures.  While we can't disclose plans and roadmaps publicly, we do have a good understanding of this area that will allow us to continue to meet the reliability needs of future FAS and E-Series systems as well as future disks and solid-state media.

