NetApp’s foray into Flash has been a bit controversial. Never one to bow to peer pressure, NetApp has forged a different path from other enterprise Flash providers – focusing on software intelligence to maximize the efficiency of Flash. NetApp’s Virtual Storage Tier, or VST, and its use of Flash Cache, Flash Pools, and Flash Accel have all contained breakthrough ideas that put more smarts into otherwise dumb memory modules.
In this blog post, I’ll describe some of the inner workings of NetApp Flash Cache technology. Before I begin, I should mention that some people have told me that this isn’t important, and that Flash is just a commodity that no one cares about, like disk drives. Well, call me old-fashioned, but I like to know how stuff works. Aside from the widely discussed software intelligence in VST, what sort of tricks does our Flash Cache design contain? If you are like me and enjoy peeking inside in search of interesting design elements, then read on.
A Glimpse Inside the Cache
Flash Cache was NetApp’s first entry into Flash, introduced in 2009. Flash Cache modules are 3/4-length PCIe cards that tuck nicely into FAS and V-Series storage controllers. First-generation modules contained 16GB of DRAM (these were marketed as the Performance Acceleration Module), followed by second-generation modules with 256GB of Flash, third-generation modules containing 512GB of Flash, and finally the current fourth-generation module loaded with, you guessed it, 1TB of Flash. A maximum of 8 Flash Cache modules can be installed in our biggest FAS or V-Series controllers, for a maximum per-controller Flash capacity of 8TB.
In all generations of Flash Cache, data flow is managed by an onboard FPGA. This FPGA controls all communication between system main memory and the Flash memory located on the Flash Cache board. The FPGA was designed purely for speed, and it provides some elements of Flash magic. For example:
- Each write to Flash requires an erasure, and erase cycles are very slow, so it’s best to have as many running in parallel as possible in order to maintain throughput. To accomplish this, the Flash Cache FPGA intelligently interleaves writes across multiple write queues, resulting in balanced Flash erase, write, and read cycles.
- The Flash Cache FPGA supports multiple memory interfaces, with each interface going several banks deep. When one flash bank on an interface is busy (i.e. undergoing an erasure cycle), the FPGA can issue a command to another bank on the same interface. This prevents stalls when too many requests bunch up on too few banks of Flash memory.
- Within Flash Cache, the FPGA does not read from flash cells individually, but rather in groups that are striped across multiple Flash banks, a technique that cuts read latency by a factor of roughly eight.
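To make the first two bullets concrete, here is a toy model of bank-interleaved write scheduling. NetApp has not published the FPGA’s actual algorithm, so everything here is an illustrative assumption: the `Bank` class, the `schedule_writes` round-robin policy, and the erase-cycle count are mine, chosen only to show why spreading writes across banks hides slow erase cycles.

```python
# Toy model of bank-interleaved writes (illustrative only; not NetApp's
# actual FPGA logic). A write makes its bank "busy" for several ticks,
# standing in for the slow erase-then-program cycle of NAND Flash.
from collections import deque

class Bank:
    def __init__(self, bank_id, erase_cycles=5):
        self.bank_id = bank_id
        self.erase_cycles = erase_cycles  # hypothetical erase duration
        self.busy_until = 0               # tick when the bank is free again
        self.written = []

    def is_busy(self, tick):
        return tick < self.busy_until

    def write(self, block, tick):
        # The bank is unavailable while its erase/program cycle runs.
        self.busy_until = tick + self.erase_cycles
        self.written.append(block)

def schedule_writes(blocks, banks):
    """Issue each block to the first idle bank, round-robin style.

    With enough banks, new writes almost always find an idle bank, so
    throughput is not gated by any single bank's erase cycle.
    """
    pending = deque(blocks)
    tick, rr = 0, 0
    while pending:
        for i in range(len(banks)):
            bank = banks[(rr + i) % len(banks)]
            if not bank.is_busy(tick):
                bank.write(pending.popleft(), tick)
                rr = (rr + i + 1) % len(banks)
                break
        else:
            tick += 1  # every bank is mid-erase: stall one tick
    return tick
```

Running eight writes against four banks, the scheduler lands the first four writes on distinct banks in the same tick, then stalls only until the first erase window closes; with one bank, each write would wait out a full erase cycle serially.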
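The striped-read benefit in the last bullet is easy to see with a little arithmetic. The sketch below is a back-of-the-envelope model, not a measurement: the per-chunk latency constant and function names are hypothetical, and it simply contrasts reading N chunks serially from one bank with reading them concurrently from N banks.

```python
# Back-of-the-envelope model of striped reads (illustrative numbers only).
CHUNK_READ_US = 50  # hypothetical per-chunk Flash read latency, microseconds

def sequential_read_latency(n_chunks):
    # One bank: chunks are read one after another.
    return n_chunks * CHUNK_READ_US

def striped_read_latency(n_chunks, n_banks):
    # Chunks striped across banks are read concurrently; total latency
    # is set by the most heavily loaded bank.
    chunks_per_bank = -(-n_chunks // n_banks)  # ceiling division
    return chunks_per_bank * CHUNK_READ_US
```

For a group of eight chunks striped across eight banks, the striped read finishes in one chunk-time instead of eight, which is the roughly eightfold latency reduction claimed above.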
On the surface, all Flash looks alike. Under the surface, however, design decisions can make all the difference. Using the techniques described above, NetApp systems employing Flash Cache have demonstrated throughput in excess of 250,000 IOPS with sub-1ms latency (using the SPC-1 SAN benchmark). Flash Cache has also shown that 15K RPM high-performance FC/SAS HDDs can be replaced with far fewer 7.2K RPM high-capacity SATA HDDs without sacrificing performance (using the SPECsfs NAS benchmark).
For general information on Flash Cache, click here.
For more information on SPC-1 benchmark Flash Cache numbers, click here.
For more information on Flash Cache/HDD reduction as demonstrated in SPECsfs benchmark data, click here.
Data Storage Matters,