Storage tiering has been around since the dawn of computing. Data center storage architects have always tried to store and retrieve data as quickly as possible for their brethren, but due to financial constraints, not all data can be accessed at breakneck speeds. The economic realities of IT led to the concept of storage tiering: different types of storage devices (with different speeds and costs) could be used to store data in “tiers”, depending on how fast people needed it.
One of the earliest attempts at storage tiering, back in the 1970s, was IBM’s System Managed Storage (SMS) and its idea that data should be located in one of three places:
- Online disk uncompressed (fast but expensive)
- Online disk compressed (a little slower but a little cheaper)
- Offline tape (slowest and cheapest of them all)
IBM called this concept “hierarchical” storage and had some success with SMS on its mainframes. In the 1980s, a company called Epoch Systems tried the same concept for Unix servers, but with a few twists. First, they came up with the idea of a migration manager that would leave a small “stub” file on the fast disk drive, pointing to the file’s migrated location on a slower, cheaper device. Second, the files would be migrated to a new and exciting technology, Magneto-Optical (MO) jukeboxes, which held the promise of greatly expanded capacity, costs much lower than disk drives, and performance far greater than tape drives. Unfortunately, these promises were never fulfilled, and after Epoch was acquired by EMC in 1993, the idea of Hierarchical Storage Management (HSM) faded into oblivion.
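To make the stub idea concrete, here is a minimal in-memory sketch of how a migration manager of this kind could behave. It is not Epoch’s implementation; the class names (`Stub`, `MigrationManager`) and the `mo-volume/N` location format are hypothetical, and plain dictionaries stand in for the fast disk and the MO jukebox.

```python
from dataclasses import dataclass


@dataclass
class Stub:
    """Small placeholder left on fast disk; points at the migrated copy (hypothetical)."""
    archive_location: str
    size: int


class MigrationManager:
    """Hypothetical, in-memory model of an HSM migration manager."""

    def __init__(self):
        self.fast_tier = {}  # path -> file bytes, or a Stub once migrated
        self.slow_tier = {}  # archive location -> file bytes (stands in for MO/tape)
        self._next_id = 0

    def write(self, path, data):
        # New files always land on the fast tier.
        self.fast_tier[path] = data

    def migrate(self, path):
        # Move file contents to the slow tier and leave only a stub behind.
        data = self.fast_tier[path]
        if isinstance(data, Stub):
            return  # already migrated
        location = f"mo-volume/{self._next_id}"
        self._next_id += 1
        self.slow_tier[location] = data
        self.fast_tier[path] = Stub(location, len(data))

    def read(self, path):
        entry = self.fast_tier[path]
        if isinstance(entry, Stub):
            # Transparent recall: fetch from the slow tier and restore the file
            # to fast disk (the slow-tier copy is left in place).
            data = self.slow_tier[entry.archive_location]
            self.fast_tier[path] = data
            return data
        return entry
```

The application only ever calls `read()`; whether the bytes come from fast disk or are recalled from the slower device is hidden behind the stub, which is what made the approach attractive in the first place.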
Enter the 2000s, and an emerging disk drive technology: SATA. SATA drives were based on PC-grade ATA drive technology but were proving to be quite cheap and fairly reliable. Slower than their 15K RPM Fibre Channel cousins, SATA drives offered much higher capacities and therefore sold at a much lower cost per GB. As a result, the next incarnation of storage tiering meant that you could buy fast (Tier One) storage arrays, based on 15K RPM Fibre Channel drives, for your really important applications. Next, you’d buy some slower (Tier Two) storage arrays, based on 7.2K RPM SATA drives, for your not-so-important applications. Finally, you’d buy a (Tier Three) tape library or VTL to house your backups.
This is how most people accomplished storage tiering for the next decade, with slight variations. For instance, I’ve talked to some customers who had as many as six tiers once they added their remote offices and disaster recovery sites; these were very large users with very large storage requirements who could justify breaking the main three tiers into sub-tiers. Whether you categorized your storage into three or six tiers, a tier has historically been defined as a collection of storage silos with particular cost and performance attributes that made them appropriate for certain workloads.
Recent developments, however, have changed this decade-old paradigm. Evolving storage array intelligence allows “hot” data to be placed automatically, without human intervention and without the need for dedicated storage silos. This intelligence came none too soon, as the emergence of Flash-based PCI cards and SSDs demanded yet another tier, sometimes called Tier Zero. Flash outperforms the fastest traditional disk drives, but its cost is high enough that only the most demanding applications can justify it. A good tiering mechanism was sorely needed, one that could use Flash for just the high-performance “hot spots” without breaking the bank.
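The core of automated hot-data placement can be illustrated with a simple frequency count: sample I/O over a window, then keep the most-accessed units on Flash up to its capacity. This is a generic sketch under that assumption, not EMC’s or NetApp’s actual algorithm; the function name `place_hot_extents` and the notion of an “extent ID” are hypothetical.

```python
from collections import Counter


def place_hot_extents(access_log, flash_capacity_extents):
    """Return the set of extent IDs to keep on the Flash (Tier Zero) tier.

    access_log: iterable of extent IDs, one entry per I/O observed in a
        sampling window (hypothetical input format).
    flash_capacity_extents: how many extents fit on the Flash tier.
    """
    counts = Counter(access_log)
    # The most frequently accessed extents win the limited Flash capacity;
    # everything else stays on (or is demoted to) slower disk.
    hottest = counts.most_common(flash_capacity_extents)
    return {extent for extent, _ in hottest}


# Extents 4 and 1 see the most I/O in this window, so they go to Flash.
hot = place_hot_extents([1, 1, 1, 2, 2, 3, 4, 4, 4, 4], flash_capacity_extents=2)
```

Real arrays differ mainly in how often this decision is made and how large an “extent” is, which is exactly where the two vendors’ designs diverge.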
EMC re-entered the storage tiering market in 2009 with the announcement of Fully Automated Storage Tiering, or FAST. NetApp also entered the tiering market in 2009 with the announcement of its Performance Acceleration Module, or PAM, which eventually became known as FlashCache, part of the NetApp Virtual Storage Tier (VST). Released within months of each other, the EMC and NetApp solutions both addressed automated storage tiering, but in wholly different ways. Below is a chart that compares major design aspects of the two technologies.
When comparing the design approaches of EMC and NetApp, there are many clear advantages to the VST design; in fact, some of the EMC design choices are just plain puzzling.
- Placement of new data. EMC uses a “store high” approach, placing new data on the highest tier, which guarantees unnecessary performance overhead as lightly used data is later demoted to a lower tier.
- Configuration requirement. Not all EMC LUNs can be tiered immediately; LUNs that are not already configured in LUN pools must first be migrated.
- Tiering granularity. EMC tiering is done in large 1GB slices, roughly 250,000 times larger than NetApp’s 4K blocks (262,144 times in binary units). Tiering in large chunks is easier to accomplish, but a less efficient use of space.
- Tiering regularity and overhead. EMC’s nightly batch processing cannot run during performance-intensive workloads, which is exactly when tiering should be happening.
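The space-efficiency point about granularity can be checked with a little arithmetic. The sketch below takes the 1GB slice and 4K block sizes cited above (interpreted as binary GiB and KiB) and assumes a hypothetical worst case in which 1,000 hot 4K blocks are scattered so that each falls in a different slice; the helper `flash_needed` is illustrative, not a vendor API.

```python
GiB = 1024 ** 3
KiB = 1024

SLICE = 1 * GiB  # tiering unit cited for EMC FAST
BLOCK = 4 * KiB  # tiering unit cited for NetApp

# How many 4K blocks fit in one 1GB slice.
ratio = SLICE // BLOCK


def flash_needed(hot_offsets, unit):
    """Bytes of Flash consumed if every tiering unit containing a hot byte is promoted."""
    units = {offset // unit for offset in hot_offsets}
    return len(units) * unit


# Hypothetical worst case: 1,000 hot blocks, one at the start of each of 1,000 slices.
hot_offsets = [i * GiB for i in range(1000)]

flash_with_slices = flash_needed(hot_offsets, SLICE)  # 1,000 GiB of Flash
flash_with_blocks = flash_needed(hot_offsets, BLOCK)  # 4,000 KiB of Flash
```

In this scenario the hot data itself is only about 4MB, yet promoting it at 1GB granularity would occupy roughly 1TB of Flash. Real workloads are usually less extreme, since hot data tends to cluster, but the example shows why coarse slices waste expensive capacity.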
I generally advise comparing all vendor solutions and making your own choices based on your specific criteria. Given the significant differences in design between EMC and NetApp storage tiering approaches, this advice remains.
Data Storage Matters,