There are many ways to make your IT infrastructure more efficient and effective, and storage (or data) tiering is often proposed as one of them. It seems a reasonable technique for IT departments to implement, so why has establishing storage tiering been such a struggle?
As a concept introduced by IBM in the early 1980s, storage tiering was known as hierarchical storage management (HSM). In the 1990s, it was called information lifecycle management (ILM). Now, in its latest incarnation, it is referred to as automated storage tiering (AST), with the emphasis on automated, a concept introduced by Compellent in 2005.
Although the name has evolved, the fundamental principle is the same: Storage tiering moves data from high-performing, very expensive media to high-capacity, less-expensive media, with the goal of maximizing the utilization of the storage infrastructure. It is an outdated and operationally inefficient methodology, and whatever its current name, storage tiering is a project that IT organizations never seem to get done.
The problem is that most storage tiering solutions available today are overly complex, cumbersome to implement, and still based on an antiquated methodology. See figure 1.
The basic premise of storage tiering is about storing data in the right place, and at the right time and price, to support the enterprise. It’s about the efficient utilization of the storage infrastructure.
Underpinning the premise of storage tiering are three basic assumptions:
• The value of data decreases over time; according to some estimates, data not accessed within 90 days will almost never be accessed again.
• It is estimated that less than 20% of all data needs to be on high-performance (and therefore expensive) media.
• It’s a reasonable strategy to move as much data as possible to high-capacity drives (SATA) as soon as possible as a way to better utilize the storage infrastructure in the data center.
So how do today’s automated tiering solutions attempt to address these three points? Let’s start by focusing on the word "automated"; it sounds right, and it’s very appealing—but does it work? It’s easy to equate “automated tiering” with “intelligent tiering”; but before IT departments can “automate” anything, a tremendous amount of work needs to be done. The storage architect or administrator has to collect and analyze the data and design the correct workflows so that the system can automate them—in other words, the storage architect has to do the heavy lifting.
For instance, consider the following questions that need to be answered in order to architect an appropriate solution:
1. How big should tier 1 be? How big should tiers 2, 3, and 4 be?
2. How do I determine what data is hot, warm, or cold? Some vendors' implementations of automated storage tiering (EMC's, for example) require a good understanding of application workloads, additional software, and detailed planning and sizing of the different tiers of storage.
3. What kind of data should go in tier 1? How do I classify my data?
4. How long does it take auto-tiering software to migrate data to another tier? In some instances it takes 3 days to promote data and 12 days to demote it—really?
5. When is critical data promoted to tier 1? Keep in mind that data migrations or relocations can affect system performance; depending on the vendor, they can take hours to days.
6. When is cold data moved to tiers 2 and 3?
7. Is the data migration process manual, automatic, or scheduled?
8. How granular is the data migration? Do I need to move a whole LUN, or can I move a sub-LUN?
9. How do I know whether I have the right data migration policies, thresholds, or time windows for data movement? Ongoing monitoring and calibration will be required.
10. Can I use data efficiency features like deduplication and thin provisioning in my tier 1 storage layer?
11. Do I need different tiering solutions for NAS and for SAN?
12. How many new tools and management end points will these solutions add to my environment?
13. And perhaps the most important question of all: How much will the new solutions cost? How many licenses will I have to purchase?
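To see how much up-front design these questions imply, consider a minimal sketch of the classification logic an administrator would have to encode before anything could be "automated." The thresholds and tier names below are hypothetical illustrations (the 90-day figure echoes the rule of thumb cited above), not any vendor's actual policy engine:

```python
from datetime import datetime, timedelta

# Hypothetical thresholds the administrator must choose, justify, and keep
# tuned over time; no vendor defaults are implied here.
HOT_WINDOW = timedelta(days=7)     # accessed within a week  -> tier 1
WARM_WINDOW = timedelta(days=90)   # the "90-day" rule of thumb -> tier 2

def classify(last_access: datetime, now: datetime) -> str:
    """Assign a storage tier based solely on access recency."""
    age = now - last_access
    if age <= HOT_WINDOW:
        return "tier1"   # high-performance, expensive media
    if age <= WARM_WINDOW:
        return "tier2"   # mid-range media
    return "tier3"       # high-capacity SATA

now = datetime(2012, 1, 1)
print(classify(now - timedelta(days=2), now))    # tier1
print(classify(now - timedelta(days=30), now))   # tier2
print(classify(now - timedelta(days=400), now))  # tier3
```

Even this toy version exposes the problem: every constant is a policy decision that must be validated against real workloads and recalibrated as they change.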
Architecting the right solution depends on making sure that these questions get answered correctly and that you’ve collected and analyzed the correct data. In the end, even if your data and analysis are correct, the actual implementation of the solution may be too complex. EMC’s Fully Automated Storage Tiering (FAST), for example, is a collection of things, not a specific capability: FAST on Symmetrix is different from FAST on CLARiiON and from FAST on Celerra. FAST is an umbrella term for a group of point technologies that behave differently on every platform. In other words, successfully implementing AST requires painstaking work up front and constant care and feeding in the operations phase. It assumes predictable workloads, and it leaves little room for flexibility.
At NetApp, we’ve decided to take an approach that’s different from the traditional storage tiering approach. As NetApp CEO Tom Georgens said in a phone call with analysts, “The entire concept of tiering is dying. The simple fact of the matter is, tiering is a way to manage migration of data between the Fibre Channel-based system and SATA-based systems. With the advent of Flash, basically these systems are going to go to large amounts of Flash, and that will be dynamic with SATA behind them.... The whole concept of tiered storage is going to go away.” Reinforcing the view that Flash is central to simplifying and changing the storage tiering paradigm, Jeremy Burton, EVP of Product Operations and Marketing at EMC, said recently (Oracle OpenWorld 2012 keynote) that “A little bit of Flash goes a long way,” and that, according to EMC’s findings, customers who deploy about 1% of their storage capacity in Flash can serve over 50% of their IOPS from it.
We couldn’t agree more: Flash is making the concept of storage tiering irrelevant.
So what is the NetApp approach?
NetApp® Virtual Storage Tier (VST) is a self-managing, data-driven service layer for storage infrastructure. VST provides real-time assessment of workload priorities and optimizes I/O requests for cost and performance without the need for complex data classification and movement. VST is a simple and elegant approach to a perennial problem that IT organizations would like to check off their to-do lists.
VST promotes hot data without the data movement or migration overhead associated with other approaches to automated storage tiering. Whenever a read request is received for a block on a volume or LUN where VST is enabled, that block is automatically subject to promotion. (4KB blocks are very granular, compared to other implementations.) Note that promotion of a data block to the Virtual Storage Tier is not data migration, because the data block remains on hard disk media when a copy is made to the VST.
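The promotion-by-copy mechanism described above can be illustrated with a toy model. This is a sketch of the general idea (read-triggered promotion of 4KB blocks into a flash-like cache, with the disk copy left in place), not NetApp's actual implementation; the class and its eviction policy are invented for illustration:

```python
from collections import OrderedDict

class BlockPromotionCache:
    """Toy model of read-triggered block promotion: on every read, the
    4KB block is *copied* into a flash-like cache. The on-disk copy is
    never moved or deleted, so there is no migration overhead."""

    BLOCK_SIZE = 4096  # 4KB granularity, as described above

    def __init__(self, disk, capacity_blocks):
        self.disk = disk               # block_id -> bytes; data stays here
        self.cache = OrderedDict()     # promoted copies, in recency order
        self.capacity = capacity_blocks

    def read(self, block_id):
        if block_id in self.cache:               # hot: served from flash tier
            self.cache.move_to_end(block_id)
            return self.cache[block_id]
        data = self.disk[block_id]               # cold: served from disk...
        self.cache[block_id] = data              # ...and promoted (copied)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)       # evict only the cached copy
        return data

disk = {i: bytes(BlockPromotionCache.BLOCK_SIZE) for i in range(8)}
vst = BlockPromotionCache(disk, capacity_blocks=4)
vst.read(0)
vst.read(0)
assert 0 in vst.cache and 0 in disk  # promotion is a copy, not a migration
```

Note the contrast with migration-based tiering: eviction discards only the cached copy, so no data ever has to be moved back down a tier.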
VST leverages NetApp’s key storage efficiency technologies (deduplication, volume cloning, thin provisioning), intelligent caching, and simplified management. You simply choose the default media tier you want for a volume or LUN (SATA, FC, or SAS). Hot data from the volume or LUN is automatically promoted on demand (application driven) to Flash-based media.
In summary, the NetApp solution offers:
• Fewer tiers of storage (FC or SAS plus Flash or SSD or a combination)
• Intelligent placement of data (no data migration, no disk I/O consumed)
• Acceleration of the adoption of SATA HDDs (incorporate SATA media earlier in the data lifecycle)
• Application-driven promotion (the application and Data ONTAP® drive the promotion of hot data)
• No rules, templates, profiles, or complicated workflows
As you can see, achieving an efficient and effective storage infrastructure doesn’t have to be complicated, elusive, or out of reach. Automated storage tiering is a dying concept because it just doesn’t work. With the advent of SSDs and Flash technology, there is a new, better, and quite exciting way to virtually tier your data and storage. Goodbye AST, and yes FAST too!