
Storage Efficiency Insights


I was a psychology major in college.  I picked up Computer Applications only because I had to fill my schedule with electives and it seemed like a good fall back plan.  A buddy of mine said I would at least give myself an option to get a job in a hot field and I could always go back to grad school or law school.  Turns out I did go back to school (business) but I'm still fascinated by people and why we make the choices that we do. 


Technology is a choice - how much or how little you want to consume of it.  I've seen a number of articles talking about how human behavior is shaped by technology.  Hard to disagree with their specific points but, it's way too Pavlovian for me.  Although, walk into a conference and you'll see highly trained technologists react (almost subconsciously) to every bleating whistle, tweet, bell, and ringtone.  Might as well set up drool buckets.  In spite of this, I don't think iPhones or laptops or tablets shape our behavior.  They're clearly a response to how we would like to behave.  Seems like a no-brainer but most of the articles I read talk about the merits of the technology.  Little or no time is spent talking about whether people feel the technology offers them a personal benefit.


Example: As an SE I thought I had just given my best presentation on the glories of Snapshots - ever.  I had hit all the value props - fast, zero performance hit, instant restores, space savings.  It was kind of a "ready, fire, aim" type of presentation so we circled back around to what was really the issue.  Turns out, the customer had spent the last two weekends and the previous night wrestling with restores.  His wife was getting pretty ticked off that the job was impacting home life.  All the customer wanted was a solution that could get him home on time.  Everything I had said up to that point: no impact.  I never connected the dots between the technology and his motivation.  I could have spent the last 45 minutes talking about windmills as long as the punchline was "and it will make sure you get home to the family early." 


I'm not discounting analysis on technology trends, specifications, TCO, benchmarks, etc., but I think good ideas succeed when they address human motivation.  When I joined NetApp (Network Appliance) the catch phrase used to be "Fast, Simple, Reliable."  I still like that because it was very easy to make a connection with the customer.  Nobody wanted slow, complicated and unpredictable.  It was easy to plug into the sentiment: if you can make my work life easier and less complicated, and it does what you say it will do, then I'll have time to do what I really love to do - whatever that is.


So, with that in mind, here are a couple technologies I think are good ideas doomed to fail:

  • Hard Drives - great idea.  Fantastic idea.  They have served us well for decades.  Flash changes everything and the biggest casualty will be hard drives.  I don't think they will go completely away. There's always the consumer market!  But, hard drives are about as reliable as they are going to get.  Data integrity isn't keeping pace with increased density.  Random access times haven't changed.  You don't see too many storage startups using HDD platforms anymore.  The number of flash vendors has quintupled over the past few years.  Enterprise storage will soon be all-flash.  It's not because the technology is cool.  It's a cost thing.  Why pay more for less?  Unless we start thinking that hard drives are a cool vanity purchase, HDDs are on the way out.
  • Automated Storage Tiering -  First, flash kills AST as we know it today.  Why tier disks if all you have is one tier: flash?  Caching still makes sense.  AST does not.  Second, like all of its ILM predecessors, AST sounds really cool on paper but at the end of the day people will take a pass.  We can't be bothered.  It's an atomic ice cream scoop.  It's football season and I'd rather watch a game than figure out tiers of storage or read through a few hundred pages explaining how to carefully set up what is supposed to be the technology equivalent of self-leveling cement.  If it were just as easy as "just add water" and pour, great.  It's not.  And, the savings just aren't there.  Storage depreciates faster than you can tier.  Eventually people get tired of generating reports justifying a decision that should have been self-evident.  That's usually when you see someone belch forth a TCO report.  Anyone download a TCO report from any vendor on any technology that says their technology is a bitch to manage and costs more in the long-run?  Nope - they don't exist.  Tiering won't compete with the lower cost and "Fast, Simple, Reliable" message of flash.
  • Thin Provisioning - we're lying to ourselves. I know that's not in the marketing literature anywhere but, that's all this is.  It's technology fiction.  We've invented a technology to help us get over trust and honesty issues.  We're essentially lying to our users in order to drive utilization up.  At some point we get tired of lying to ourselves.  The truth is so much easier to keep track of.  We'll eventually get tired of this.  Someone at the top will just say, "Look.  If utilization has been at 80% for the past three years and nobody seems to be worse for the wear, can't we just give people what they actually need and report on what they actually use?"
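The arithmetic behind that thin-provisioning gamble is simple enough to sketch.  The numbers below are purely hypothetical, just to illustrate the oversubscription bet being described:

```python
# Minimal sketch of the thin-provisioning math (hypothetical numbers).
# Users are promised more capacity than physically exists; the bet is
# that actual usage stays comfortably below the physical pool.

physical_tb = 100       # what the array actually has
provisioned_tb = 250    # what users have been promised
used_tb = 80            # what users actually consume

oversubscription = provisioned_tb / physical_tb   # how big the "lie" is
utilization = used_tb / physical_tb               # how full the real pool is

print(f"Oversubscribed {oversubscription:.1f}x, pool {utilization:.0%} full")
# If used_tb ever approaches physical_tb, the promise comes due.
```

If utilization really does sit at 80% for three years, the sketch makes the point above: you could simply provision 100 TB honestly and report the same number.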


Just a few ideas that I think will go the way of Betamax.

VFCache – Man, am I conflicted over this announcement.  On the one hand, I applaud it.  Here you have a market leader addressing a trend (host-based flash cache) with a lot of potential for customers.  That’s great. That’s what you want to see out of your vendors.  On the other hand, if I net out the actual product (and EMC people, I can stand to be corrected but it’s all I could find as far as technical details go), I come up with this:


If you have an FC-only SAN (any SAN; no unique EMC array value here), non-virtualized, high random read, small working set application where cache coherency/data integrity isn’t a concern, then a proprietary VFCache card (limit one per server) is for you!


Wow - there’s lowering the bar for market entry and then there’s just laying it on the ground so you can step over it. Even with all of the app hype in EMC’s presentation, I was hard-pressed to come up with a good use case. 


I even got a good chuckle with the Thunder pre-announcement.  In a rare vendor triple lutz, EMC announced VFCache in paragraph one and pretty much gutted it with the Thunder announcement in paragraph two.  That had to be a new land speed record for obsolescence. If not obsolescence, it will be interesting to see how EMC stitches this all together in the coming year.  But, it’s pretty clear that there wasn’t a lot of “there” there today.


Now – all that said – I still like the announcement.  I’m not crazy about a low/no-value v1.0 product as a conversation starter but, there is something to be said for having that conversation.  With all of the big brains running around inside NetApp, I sometimes wish we wouldn’t play things as close to the vest as we do.  Almost a year ago to the day, NetApp previewed its project Mercury at FAST ’11.  Chris Mellor picked up on it in The Register.  Other than a few other mentions here and there, you didn’t see a lot of press on Mercury from NetApp; not a lot of chest thumping even as it turned into a hot topic for customers.  I will say if you want to hear the latest-greatest on Mercury, you can ask your local sales team for an NDA meeting.  We’ve been sharing the info under NDA and, as I’m sure EMC, Fusion-io and others can attest, it resonates very well.


Another interesting facet to the EMC announcement is the central role that caching is taking in its AST strategy.  Let’s face it, FastCache was meant to remedy the glacial data movement issue of FAST (and, quite frankly, as a reaction to NetApp’s Flash Cache).  However, once you’ve plugged in to a caching strategy, it’s easy to see the logical next steps: moving an intelligent cache closer to the point of attack.  We talked about the inevitability of a caching strategy in the blog Why VST Makes Sense and the next logical steps in The Next Step in Virtual Storage Tiering.  There's no question that intelligent host-based caching is a great next step and a logical extension of a VST strategy.  (Just wondering how long it will be before EMC adopts VST as a strategy?)


I actually think there is a balance that can be struck here.  I do think there’s value in promoting your ideas on how to best solve customer problems.  From that standpoint, I perfectly understand the EMC announcement.  But, I also think there’s value in delivering a solution that has practical value to a customer.  What’s practical about the VST strategy?  Well, the great thing about caching is it just works.  You don’t have to worry if caching works across protocols or if it supports advanced application features.  You wouldn’t even have to worry about which cache card, necessarily. Flash is hardware.  Hardware commoditizes and in the eyes of the customer this should be a good thing.  The key to a VST strategy - just in case EMC is looking for some messaging as it ventures down the caching path - is flexibility.  It's a consumer (vs. vendor) driven model.  It would be a brave new world for EMC but, as we have said before, one that is deeply embedded in the NetApp DNA.  For more detail, on how Mercury plays a role in the VST strategy, give your NetApp sales team a call.  Chances are, they'll bring it up for you.

So, I put a few miles on over the past couple of months covering cities like Minneapolis and Des Moines - not a big stretch if you're based in Chicago - but throw in a swing through Sydney, Brisbane and Melbourne and you can put an ass whooping on your circadian rhythms. (And as an Irish/Polish kid who did most of his growing up in Kentucky, I can tell you I pretty much have no rhythm.)  Anyway, I had a chance to see a ton of customers and partners.  It wasn't uncommon to meet with 20 customers in the span of only a few days.  Let me tell you, the top 3 topics of interest in this order were: 1) Flash - particularly flash on host.  More on that later in the week.  2) Big Data - mammoth opportunity for E-Series and upcoming ONTAP functionality and, wait for it... 3) Backup.  Really?!  Backup?!  Yes, backup.  Virtualization got bumped, primarily because I think all the kids are doing it now.  It's in all the Sharper Image catalogs and in-flight magazines.  You're basically in the dark ages if you haven't virtualized something and called it a Cloud.


The not-so-dirty little secret: after 40 or so years, we're still struggling with how to efficiently back up all of the data we accumulate.  It's not that we haven't been able to get it right after all these years.  We're just saving a ton of data and it's outstripping our backup processes.  I have good fun with the backup architects when I'm out on the road.  It's one of the most critical jobs in the data center and yet when you ask the question, "who wants to be the backup guy?" it's funny to look out into the audience and see how many grown professionals avoid making eye contact.  There are no hands rocketing into the air - oh, oh! Pick me! Pick me!  Heck no.  Think about it: every time your phone rings, the person on the other end is already pissed off.  Nobody ever calls to congratulate you on another day of successful backups.  No, they've lost something and they want it back NOW!  This is usually accompanied by some explanation that sounds a lot like a storage gremlin: "It was here yesterday.  I don't know what happened but I came back to my desk and it was just gone!"  Well, at least they gave you an approximate time and rough description of the data abduction.  The flip side of this is when Exchange goes down and you get the call from a VIP who not only knows when it went down but who was standing next to them at the time; they live on their Blackberry/iPhone; they're on the road and when can they expect to have their mail access restored?  (Would that be the right time to let the VIP know that the recovery SLO for Exchange is two hours and they'll have to wait like everyone else?)  Out of all the war stories you can tell, I bet some of the best are around backup and recovery.  With the volume, richness and "everything is Tier 1" aspects of data, the backup challenge continues to expand disproportionately.


Enter the age of snapshots and replication.  I think it was last year when there was this absolutely insane discussion in the blogosphere on whether a snapshot constituted a backup.  I didn't pay too close attention to it.  It mainly sounded like one group of vendors with crappy snapshots telling another group of vendors with useful snapshots that they couldn't count snapshots as a backup.  Huh?!  The customers I talked to thought the whole discussion was goofy.  In general: if you could restore from it, it was a backup.  The rest of the discussion would focus on a complete data protection plan: how often to take a snapshot? Should you vault or replicate the snapshots to a separate site?   When and where can I roll to tape?  How can you factor in a disaster recovery plan? What's the restore process look like? (File that under backups are worthless; restores are priceless).  90% of your time is spent revising and refining the existing backup/restore process. 


With that as context for backup, I'd say the vast majority of the feedback I've heard from customers is, "NetApp - love your snapshots.  Love your replication.  Please don't make me jump to another tool."  The backup team has enough headaches managing the ever-changing data landscape.  However slick you think your homegrown backup/replication tool is, at the end of the day it's yet another vendor tool, another set of instruction manuals, leather-bound best practices, and requirements for a separate server (or VM).  Best case: this is seen as a necessary evil.  I know that's not part of any marketing campaign anywhere: "Come check out our solution. It sucks less than what you're doing now."  But, I think we've made some significant progress on flipping this around and turning it into a true positive.  The strategy is to simply melt into the background and let you use the tools you already have on the floor.  You've seen the work we've done with Syncsort and CommVault (SnapProtect)?  Allow me to introduce Symantec's Replication Director in NetBackup 7.5!  This is a great next step in trying to keep the backup/recovery/DR process as simple as possible.  Under the covers, NetApp and Symantec worked together so that Replication Director can leverage our storage service APIs.  What this means to customers is you will be able to schedule NetApp Snapshots, create SnapMirror and SnapVault relationships, integrate a tape backup schedule, have NetBackup index and catalog all of these backup images and never leave NetBackup!  You want to create a Storage Lifecycle Policy (SLP) that includes NetApp Snapshots, SnapVault and SnapMirror?  Go ahead!  Figure out how many snapshots you want to retain, when you want to vault them, how to replicate for disaster recovery and assemble it all using Replication Director in NetBackup 7.5.  Our technology is still in there.  You just don't need to jump back and forth between screens to manually coordinate anything.
You can still take advantage of all of the ONTAP efficiencies, e.g. deduplication and compression.  (We've seen customer savings ranging from 20:1 to 50:1 to 70:1 on full backups.)  You can still take advantage of FlexClones for test and development work, reporting, disaster recovery testing, off-site tape backup operations, etc.
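The kind of policy being assembled here can be sketched in a few lines.  To be clear, the field names and numbers below are made up for illustration; this is not NetBackup SLP or ONTAP syntax, just the shape of the decisions a backup team strings together:

```python
# Illustrative data-protection policy of the kind described above:
# local snapshots, a vault copy, a DR mirror, then tape.  All names
# and values are hypothetical, not actual SLP/ONTAP configuration.

policy = {
    "name": "exchange-tier1",
    "snapshot_every_hours": 4,          # how often to take a snapshot
    "retain_local_snapshots": 24,       # keep ~4 days of local copies
    "vault": {"target": "vault-site", "retain_days": 90},
    "mirror": {"target": "dr-site", "purpose": "disaster recovery"},
    "tape": {"schedule": "weekly", "retain_years": 7},
}

def restore_points_per_day(p):
    """Local recovery points per day this policy yields."""
    return 24 // p["snapshot_every_hours"]

print(restore_points_per_day(policy))  # 6
```

The point of Replication Director, as described above, is that this whole chain gets defined and cataloged in one place instead of being coordinated by hand across screens.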


I know, "melting into the background" is not a headline grabber.  It's not as sexy as talking about virtualizing 5000 desktops and storing them all on the head of a pin.  I get that.  But, at the end of the day, everyone has to talk to the backup guy.  Everything has to have a backup plan and the easier we can make it on the backup team to pull together that plan, the better off we all are.  That's why I think backup is still part of nearly every customer conversation I have.  That's why I find myself getting genuinely excited talking about this solution.  I'd actually like to see the backup team go home at 5:00 and not have to come in on weekends.  A man can dream.

Price per GB.  It's basically a useless metric for comparison.  I get it - it sounds like a standard unit of measure but it's not.  It's insufficient and flawed.  Insufficient in that it focuses on one variable within one vector of a business (capex).  Flawed in that it assumes 1 EMC drive = 1 NetApp drive = 1 IBM drive = 1 HP drive and so on. 


Another way to get at this cost question is through the mountain of spreadsheets, papers, graphs on usable capacity.  Another useless metric.  Usually, it's one vendor doing the usable capacity calculation for their competition.  Really?  How do you think that's going to turn out?  Do you think EMC is going to march into your office and tell you that NetApp has the lowest price per GB in the industry?  Probably has the same chance that HP is going to slide a TCO report across the table that says "we're unbelievably expensive and a bear to manage."  Those TCO reports simply don't exist.  I defy you to go to any vendor's web site and find a crappy TCO report (to be specific, on them - not on a competitor).   I can see how some of this stuff can help support your decision but probably not make your decision.


So, what are you left with?  How do you compare things?  First, a prerequisite: it's fair to assume that different companies have different approaches to the same customer problem.  Second, let me do a couple things here: a) freak out the NetApp marketing department and b) take the $/GB argument off the table.  For the sake of argument, let's say that NetApp has the highest $/GB in the industry.  I'm sure our competitors are happy with the thought experiment at this point (and Chris Cummings just choked on his Stanford necktie) but, bear with me for a minute.


Here's how I look at it: we get side-tracked with a flawed question.  The question is how much usable capacity can each vendor squeeze out of the same quantity of drives?  This is usually supported with some silly graphic (again, provided courtesy of your favorite competitor) of a single drive and a breakdown of the associated "overheads."   That would be great if all you were going to do is buy a single drive from each of us.  But, that's not the case.  The premise of the question is each vendor can do exactly the same thing with the same set of disks.  We all dedupe the same, snap the same, clone, thin provision, RAID protect, perform - all the same and, that's simply not true. 


Many NetApp innovations don't depend on our having thought of something first.  It's that we made features simple and practical to use in production: snapshots, RAID-DP, dedupe, non-dupe (cloning).  For example, in virtualized environments performance actually improves when you use our non-dupe technology.  NetApp has done an excellent job of leveraging existing WAFL constructs to deliver these features to the market.  My favorite story: as a new hire, I asked Dave Hitz how he thought of snapshots.  Dave said he didn't really think of snapshots; they were already there.  (WAFL takes a consistency point at least every 10 seconds.  A snapshot is simply a consistency point that is retained under a unique name vs. letting it roll off at the next CP.)  If you understand NetApp snapshots, then you understand WAFL's ability to share 4K blocks.  If you understand WAFL block sharing, you understand how easy it is for us to implement dedupe and cloning.  Almost all of these efficiency features were "already there" and didn't exact a performance tax.  We just had to leverage the WAFL DNA to bring them to market.
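That "snapshot is just a retained consistency point" idea can be shown with a toy model.  This is a drastic simplification for illustration only (real WAFL is far more involved); the point is simply that if writes never overwrite shared blocks, a snapshot is nothing more than keeping an old block map around:

```python
# Toy model of snapshot-by-block-sharing (greatly simplified vs. WAFL).
# A "volume" maps file offsets to shared block IDs.  Writes allocate
# new blocks instead of overwriting, so retaining an old map under a
# name is all a snapshot has to be.

class Volume:
    def __init__(self):
        self.blocks = {}      # block_id -> data (the shared block pool)
        self.active = {}      # offset -> block_id (the live file map)
        self.snapshots = {}   # name -> frozen copy of an old map
        self._next = 0

    def write(self, offset, data):
        bid = self._next; self._next += 1   # always a fresh block
        self.blocks[bid] = data
        self.active[offset] = bid

    def snapshot(self, name):
        # "Already there": just retain the current map under a name.
        self.snapshots[name] = dict(self.active)

    def read(self, offset, snap=None):
        view = self.snapshots[snap] if snap else self.active
        return self.blocks[view[offset]]

v = Volume()
v.write(0, "alpha")
v.snapshot("nightly")
v.write(0, "beta")                       # live data diverges...
print(v.read(0), v.read(0, "nightly"))   # beta alpha  -- snapshot intact
```

Once blocks are shared like this, dedupe and cloning fall out of the same mechanism: they are just more maps pointing at the same blocks.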


Many competitors have the opposite problem: the DNA of their traditional storage systems didn't lend itself well to these advanced features.  Most of these features became after-market bolt-ons.  When you're dealing with a traditional storage approach, all of these feature bolt-ons come with a steep tax, which typically shows up in performance (revenue) and/or manageability (opex).  For traditional storage vendors, it's like trying to run a race with a weight vest on.  Even some of the relatively new stuff out there takes a heck of a lot of gear to produce comparable results.  Take a look at the latest NetApp and Isilon SPEC SFS benchmarks.  Dimitris Krekoukias did a great job of breaking that down for us.


The bottom line is it should show up on the bottom line.  The customer sets a goal, represented by their performance, protection, flexibility and availability requirements.  It's not that vendors can't meet those requirements.  It's what they would have to bid in order to properly meet those requirements.  How many extra disks or SSDs would have to be bid to compensate for performance hits due to snapshots?  Is everyone following their own best practices for tiering?  Do they have enough SSD included?  Would they go with mirroring or some version of RAID-6?  If RAID-6, how would they compensate for the RAID-6 performance tax?  What about dedupe and non-dupe capacity savings?  Would it even be recommended as a best practice?  Would it be implemented natively within the array or as a separate appliance?


You can continue to go down the list; separate the RFP technology from the practical technology.  In the final analysis, if you're chipping away at a handful of percentages on single drives and calculating a price per GB, then don't forget to add the traditional storage multiplier.  Typically, we're seeing anything from 2X - 3X: NetApp $/GB = 2($/GB) big 3-letter company.
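The multiplier argument above is just arithmetic, so here it is with numbers.  The prices and multipliers are hypothetical, chosen only to show how a "cheaper" quoted $/GB can still lose once you account for the extra gear a bid needs:

```python
# Hypothetical illustration of the "traditional storage multiplier":
# the quoted $/GB only matters after you factor in how much extra
# capacity a vendor must bid to meet the same real-world requirements.

required_tb = 100   # the capacity the customer actually needs

vendor_a = {"dollars_per_gb": 5.00, "capacity_multiplier": 1.0}  # higher sticker price
vendor_b = {"dollars_per_gb": 3.00, "capacity_multiplier": 2.5}  # needs extra spindles/SSD

def bid_cost(v, tb):
    """Total bid: quoted $/GB times the gear actually required."""
    return v["dollars_per_gb"] * tb * 1000 * v["capacity_multiplier"]

print(bid_cost(vendor_a, required_tb))  # 500000.0
print(bid_cost(vendor_b, required_tb))  # 750000.0 -- the "cheap" $/GB loses
```

Which is the whole point: compare bids against requirements, not single-drive percentages.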

A little perspective and you realize FUD is so stupid.


As some of you know, I help coach (American) football here in Illinois: 7th and 8th grade boys in the Catholic Grade School Conference (CGSC). It's a travel program; highly competitive.  We just wrapped up an exciting year going 7-3.  There was a lot of emotion after our last game.  I've coached some of the boys for the past four years.  After that last game, I think it hit the team how close we had become and what a great experience the season had been.  The 8th graders were realizing that this was the last time they would wear their colors and, for the entire team, this was the last time we would take the field together as a unit, as a family.  Each year is special. The kids always teach me more about myself than I feel I teach them about the game of football.


As a coaching staff, we don't teach a win-at-all-costs attitude.  Yes, it's an "earned time" league (i.e. no minimum play-time rules) but, we talk about how to compete with honor; how to support your football brothers; the importance of off-field community contributions; grades; and respect for your family and respect for the other team. Don’t get me wrong, when we cross the white lines, our kids will compete hard and play to the echo of the whistle.  Often times, you'll hear a coach - a lot of coaches - characterize this as "playing with heart."  In an odd twist of fate, reality met the metaphor this year. 


Early in the season, coaches took notice of an outstanding 7th grade athlete - incredible balance, quickness and hands - and just a phenomenal attitude. Right away we had him slotted in as a running back and defensive back.  There were some struggles early on with conditioning but who doesn't have that at the beginning of the year?  As the season began to ramp, this young man still had some struggles – said he didn’t feel quite right - but we figured that he would round into shape soon and he certainly looked good when he was at full speed! 


So, one evening I show up to practice and there he is tossing the ball around with his teammates but he's not dressed for practice.  Just like any of the boys I asked him how he felt.  "Fine!  Feel great, Coach!  Wish I could play but I have to go in for heart surgery this Thursday!"  Whoa!  I didn’t expect to hear that!  You coach 13 and 14 year old boys long enough and you think you've heard just about every ache, pain, dog-ate-my-homework reason in the book.  "Coach, sorry I can't play.  I have heart surgery." was a new one on me.  It rocks you back.  It rocked the whole team back.


The young man did go through heart surgery early that Thursday morning.  He came through with flying colors!  It was a cardiac catheterization procedure where the doctors went in through an artery in his leg and one through his arm.  They fixed the heart defect and he was back at practice (in street clothes) two days later; returned to practice two weeks later; and played in a final scrimmage game at the end of the year.  How that young man faced this situation, the attitude that he had throughout, the fact that he came back to play a game - a simple game - was inspirational to the team.


In my mind, there was no doubt that this young man was blessed to have an amazing team of doctors: the knowledge, training, experience. Hard to imagine how one comfortably shoulders that type of responsibility as a doctor when you enter that operating room.  He also had an amazing family supporting him and quality teammates to cheer him on. In a supporting role was the technology: the catheter, the cameras, monitors, plastics, wires, computers, and, yes, storage.  It struck me: congrats to all involved.  It matters.


Please take a moment to reflect this holiday season and visit: http://www.facebook.com/TechnologySavesLives

Sorry for the long blog absence but sometimes it’s good to step away and marinate a little on what’s going on in the industry; see what’s happening; do some reading and listening.  Personally, I never miss an opportunity to say nothing.  I’m not big on Twitter litter or flogging blogs, especially when there’s not much new going on.  Not a glittering jewel of social media, I know, but I think you’re in danger of becoming white noise if you feel you need to keep talking until you think of something to say.


The silver lining for me with the absence of big news: you see your fair share of speculation.  I think that’s a good thing.  Whether the writer is right or wrong doesn’t matter that much.  I just like the fact that they are thinking about you.  Most folks don’t know your roadmaps.  They’re just picking up fragments and trying to piece things together like technological dream interpreters.  However, with regard to most technology, I don’t want to disappoint, folks, but sometimes a cigar is just a cigar.


Over the past month or so, I’ve seen some speculation on what the next step will be in the Virtual Storage Tiering (VST) strategy from NetApp.  In my previous blog, Why VST Makes Sense, we took a look at how the ONTAP architecture lent itself very well to a real-time caching approach to data tiering vs. the intra-array ILM approach employed by traditional storage vendors.  The takeaway for the reader was VST effectively capitalized on the existing architectural strengths of ONTAP and was highly effective in improving performance while dropping CAPEX and OPEX for customers.  Customer adoption has been very high and, much like Unified Storage, you know it’s having an industry impact when your competitors adopt similar strategies.


This raises the question of what’s the next step for NetApp VST.  “They started with caching - will they adopt a traditional AST approach like their competitors?”


The short answer: no.  We think intra-array ILM will meet much the same fate as server-based ILM.  More importantly, we believe a caching approach – data tiering in real-time – is far more effective.  Our next step in VST will continue along this proven, successful caching path.


So, with that as the premise you can take a look around at the NetApp info-fragments and industry trends to assemble what I think is a pretty cohesive picture on where NetApp is headed here.


  • VST as it exists today works and works well across all storage provisioning protocols i.e. has effectively demonstrated value for customers in terms of improving performance and lowering costs in all deployment models.  A good use case (but not limited to): virtualized environments (SAN and NAS).


  • Flash is a game-changer on how we think about deploying storage.  NetApp sees the industry settling into basically two tiers: a capacity tier and a performance tier.  This gets back to an earlier point that customers do not want data that is “sort of” fast and “sort of” expensive.  Caching lends itself best to this two tier model and NetApp was way out in front on this.


  • SSDs – Several points to note on SSDs 


    • NetApp ships SSDs today but configuring SSDs as simply another pool of disk drives or block devices is a sub-optimal use of the technology.  (Our competitors’ change in strategy helps validate this thought.)
    • Using flash SSDs as a block device necessarily requires management overhead or a reactionary, bolt-on policy engine of some sort.  Historically, policy engines and administrative intervention have proved ineffective.  Data priorities change faster than policy engines or administrators can react, particularly in large scale environments.
    • Customers tend to see disk-like (i.e. low) utilization rates on these expensive devices – an ineffective use of an expensive asset.  Low customer adoption of flash SSDs as just another disk tier gives testament to this.
    • SSDs provide a marginal benefit for sequential writes but do an outstanding job on random reads & writes.  This is well documented in best practices across the industry.
    • Configuring SSDs as an extension of a caching approach has demonstrated value for customers.  Again, you could point to the high adoption rate by NetApp customers as well as by NetApp competitors as validation.


  • FlexCache - NetApp currently ships this data acceleration technology as either a separate hardware appliance or as a licensable ONTAP feature.  In sum, FlexCache automatically replicates and serves hot datasets anywhere in your infrastructure using local caching volumes.  Today, customers directly mount caching volumes (which can be made up of SSDs).  A caching volume “front-ends” origin volumes which can be made up of your low-cost capacity drives.  A practical demonstration of how a performance and a capacity tier can leverage caching as the data promotion/demotion engine.  FlexCache is a popular deployment model in high performance compute environments seen in such business verticals as Media & Entertainment, Oil & Gas, and Research & Development.


  • In the Fall, NetApp will ship ONTAP 8.1, which features a scale-out architecture based on a cluster namespace.  ONTAP 8.1 will support FlexCache, Flash Cache, capacity (SAS, SATA) and performance (SSD) tiers, all of the storage efficiency capabilities (e.g. primary dedupe, compression, cloning (non-dupe), thin provisioning, RAID-DP) as well as the disk virtualization capabilities of WAFL such as aggregates/FlexVols and vServers (a.k.a. vFilers).
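The FlexCache front-end/origin relationship described above is, at heart, a read-through cache: a fast caching volume absorbing reads and pulling hot data from a slower origin volume on a miss.  A minimal sketch of that pattern (illustrative only; class names are made up and this is not how FlexCache is actually implemented):

```python
# Minimal read-through cache sketch of the caching-volume/origin-volume
# pattern described above.  Hypothetical classes for illustration only.

class OriginVolume:
    """Slow, low-cost capacity tier holding the authoritative data."""
    def __init__(self, data):
        self.data = data
        self.reads = 0
    def read(self, key):
        self.reads += 1
        return self.data[key]

class CachingVolume:
    """Fast front-end; promotes hot data from the origin on first miss."""
    def __init__(self, origin):
        self.origin = origin
        self.cache = {}
    def read(self, key):
        if key not in self.cache:                 # miss: fetch and promote
            self.cache[key] = self.origin.read(key)
        return self.cache[key]                    # hit: served locally

origin = OriginVolume({"frame-001": b"render data"})
front = CachingVolume(origin)
front.read("frame-001")   # first read goes to the origin
front.read("frame-001")   # repeat reads are served from the cache
print(origin.reads)       # 1 -- the capacity tier was touched only once
```

That "touched only once" behavior is why caching works as the promotion/demotion engine between a performance tier and a capacity tier: hot data migrates automatically, in real time, with no policy engine to configure.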


Those are the basic pieces and all public knowledge.  In the past, we have probably focused on one or several of these technologies.  As you guys know (be you competitor, customer or prospect) the competitive comparisons made at this atomic level can go back and forth until it all sounds like white noise.  But if you stand back and look at this, you can see a bigger picture emerge:  NetApp is moving the goalposts again on this whole Unified Storage model and caching will be a central component; a key thread that helps bind data management, protection, efficiency and acceleration together.


Competitors will be left with pieces of their portfolio to throw against ONTAP: “our scale-out solution can do that; our mid-tier SAN can address this part; our high-end SAN addresses this other part; our NAS (depending on which NAS solution you pick) can address some of this stuff over here.”  Just when you think you have the Unified Storage answer, the question changes (or did it?)


So, if you’re asking yourself what the NetApp answer is to traditional AST approaches, I think you’re asking the wrong question.  Caching is central to the NetApp strategy.  It's not an answer to anything.  It is the strategy.


With that as the starting point and given the public information available, you can think about how caching fills out the NetApp picture: caching within aggregates/FlexVols, caching within a controller, caching outside of a controller, caching within a namespace, caching extensible outside a namespace.  Customers just see their set of LUNs or a single mount point.  The caching all happens behind the scenes and, as caching does, it just works!  It works in real-time as customer demands dictate, as it should.  The architecture adapts to the customer, not the other way around.  Your technology should never have veto authority over a customer’s business decision.  It should support it; enable it.  The customer should never have to worry about changing their mind and having to change their architecture.


So, if you look at the big picture and ask whether the next step for VST will be some sort of traditional AST equivalent, the answer is no.  Caching simply – technically and economically – works better than an AST approach.  If you want to speculate on what’s next for NetApp, start with caching and work out from there.


Why VST Makes Sense

Posted by mriley Apr 25, 2011

I just finished a couple of weeks in Australia and New Zealand.  We spoke with 66 different customers and presented at 2 different seminars.  All told, we had a chance to impact ~400 people.  (Ton of fun.  Thanks ANZ team - NetApp & partners!)


Based on my informal poll, the most popular topic during these sessions was... storage tiering!  I don't think it's a huge surprise given the constant vendor drumbeat around it.  What I liked most was the way people were asking about it.  Gone were the questions on how our tiering strategy compared to someone else.  The vast majority of the questions asked how we approached the price/performance/capacity issue - how we solved the problem!


Let me tell you how I explained the NetApp Virtual Storage Tier (VST) approach to partners and customers, why I think it makes sense and why the majority of people we talked to appreciated the NetApp VST strategy.


The first point I like to reinforce for folks is technology doesn't drive technology adoption, economics do.  We can create a lot of great technology, but companies will adopt it only if it makes good financial sense for them.  You don't see companies running out to buy the latest storage array so they can brag to other companies in their sector that they were the first to own the iDASD.  (I figure you put a lower-case i in front of things you can charge more for.)


The second thing almost everyone agrees with is that the buzz around storage tiering really heated up with the introduction of flash: super fast but super expensive.  Still, there had to be a good way to take advantage of it.  At the time of introduction you basically had some easily identifiable tiers:



You had SATA on the low-end and flash at the high-end with SAS and FC ready to fight it out for that Tier 2 space.  The bottom line: you could discern some distinct tiers of storage.  All you needed was a way to stitch all of them together - what was the best way to move data between the tiers so it was at the right place at the right time at the lowest possible cost?  If you look at 3 to 4 distinct tiers of storage, then it probably makes sense to explore a policy engine to move data between them so that you're not buying too much of one and not enough of the other.  This ILM line of thinking was familiar to a lot of folks, especially storage vendors with a raft of different storage platforms.


Unfortunately (or fortunately - depends on your perspective), this new brand of intra-array ILM looks like it will meet the same fate as other brands of ILM.  The $/GB is falling faster than people can create, deploy, examine and modify data movement policies.  With costs falling, hard drive prices and capacities are gravitating towards the lower left-hand quadrant.




I'm not sure what the price elasticity is for spinning media but it's quickly getting to the point where the cost of policy engines exceeds the cost savings from managing one of the fastest-depreciating assets we can put on the data center floor.  What we're seeing is the cost of storage settling into two distinct cost tiers: spinning media and flash.  Flash still costs about 18X - 20X more than spinning disk at this point.


I think this cost dynamic makes sense.  When I talk to customers they pretty much tell me that when they want their data, they want it as fast as humanly possible.  When they don't want their data, they want it stored as economically as possible.  Nobody tells me they want their data sort of fast and sort of inexpensive.  Customer demand (and use) has dictated this commoditization of disks.


Still, customer demand doesn't influence the physics of the drives themselves.  Relatively speaking, Flash delivers a ton of IOPS for the dollar.



You can still see some distinct tiers of storage here, particularly if you focus on random read/write workloads.  Again, it might make sense from a cost/IOP perspective to invest in an intra-array ILM strategy.  However, if you were to look at this from a sequential workload perspective, there's very little difference between Flash, SATA, SAS and FC.
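To make the cost/GB vs. cost/IOP trade-off concrete, here's a quick back-of-the-envelope sketch.  All prices and IOPS figures below are hypothetical placeholders (not vendor list prices, and not numbers from this post) - the point is only the shape of the result: flash wins decisively on $/IOP for random work while spinning disk wins on $/GB.

```python
# Hypothetical (illustrative) per-drive cost, capacity, and random IOPS.
drives = {
    # name: (cost_usd, capacity_gb, random_iops)
    "flash_ssd": (2000, 100, 20000),
    "fc_15k":    (600,  450, 180),
    "sas_15k":   (400,  450, 175),
    "sata_7k":   (250, 2000, 75),
}

def cost_per_gb(cost, capacity_gb, _iops):
    return cost / capacity_gb

def cost_per_iop(cost, _capacity_gb, iops):
    return cost / iops

for name, spec in drives.items():
    print(f"{name}: ${cost_per_gb(*spec):.2f}/GB, ${cost_per_iop(*spec):.3f}/IOP")
```

With numbers like these, flash is orders of magnitude cheaper per random IOP even while being far more expensive per GB - which is exactly why two tiers (capacity and performance) fall out of the economics.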



Now, I have to say that with the introduction of Flash NetApp did consider an automated storage tiering engine but like any consumer we had to take a look at whether that approach made sense to us.  It didn't but let me explain why.  One of the design principles of Data ONTAP (WAFL) was to sequentialize all writes that come into the system.  We used ONTAP technology to basically homogenize all drives (flash and spinning) for random write workloads and reduce latency in the process.


We have talked about the ONTAP effect on random workloads in a previous blog post, Flash Cache Doesn't Cache Writes - Why?  This sequentialization of all writes also explains why NetApp RAID-DP is a practical (i.e., it doesn't impact performance) implementation for random write workloads.

Between the ONTAP effect on random write workloads and the price commoditization of spinning disks, we're left with essentially two tiers of storage and one final technical requirement: to further accelerate reads.  In that context, what made sense to NetApp was an intelligent caching strategy.  We didn't need a policy engine to make data hop from tier to tier to tier.  Customers wanted two tiers (low-cost capacity and performance) and that's what we see in the market today.  With only two tiers, a caching strategy made much more sense than strapping on a policy engine.


Caching gave us a time-tested approach and several advantages as we see it:


  • Data promotion in real-time vs. a reactionary model
  • A high degree of granularity (4K block movement) and therefore, a high degree of efficiency
  • 100% utilization of an expensive asset vs. the low utilization rates typical of disk-based tiers
  • Easy to deploy and self-managing - caching just works and responds in real-time to data demands vs. pre-configured policies
  • No integration effort - since caching just works, it works for all NetApp features and across all protocols without caveats
  • Transparent block sharing - dedupe aware in its implementation, 1TB of cache can logically handle multiple TB of requests and lends itself very well to VDI and server virtualization.
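To illustrate what that last point - transparent, dedupe-aware block sharing - buys you, here's a toy sketch.  This is not NetApp's actual implementation; it's just a cache keyed by block content, so that 100 logical copies of the same block (think 100 VDI clones booting the same OS image) consume a single physical cache slot.

```python
import hashlib

class DedupeAwareCache:
    """Toy sketch: cache entries are keyed by a content hash, so N logical
    blocks with identical contents share one physical cache slot."""
    def __init__(self):
        self.blocks = {}   # content hash -> block data (physical cache)
        self.index = {}    # logical block address -> content hash

    def insert(self, lba, data):
        digest = hashlib.sha256(data).hexdigest()
        self.index[lba] = digest
        self.blocks.setdefault(digest, data)   # shared if already cached

    def read(self, lba):
        return self.blocks.get(self.index.get(lba))

    def physical_blocks(self):
        return len(self.blocks)

# 100 VDI clones all reading the same 4K OS image block:
cache = DedupeAwareCache()
os_block = b"\x00" * 4096
for vm in range(100):
    cache.insert(("vm", vm, 0), os_block)
print(cache.physical_blocks())   # 1 physical block serves 100 logical reads
```

This is the mechanism behind the "1TB of cache can logically handle multiple TB of requests" claim, and why highly duplicated workloads like VDI benefit so much.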


I personally think that this confluence of factors - a hugely effective and efficient caching strategy for one of the hottest application areas to hit the planet in a long time - is the reason that NetApp has been able to grab market share during recessionary times.  I'm not kicking dirt on other AST solutions but I do believe that NetApp effectively capitalized on existing architectural strengths in the NetApp product and did it at a time when the market was coming our way with virtualization.

A quick post to call your attention to some operational efficiency issues.  First, please head over to Vaughn Stewart's blog (The Virtual Storage Guy) for a fresh discussion on LUN misalignment and its impact on performance in virtualized environments.  I'd also highly recommend Keith Aasen's post on The 4 Most Common Misconfigurations with NetApp Deduplication. Both articles make the point that you can have some really great technology at your fingertips, but if deployed even "almost" right, it can have a geometric impact on an environment, especially as it scales.  In particular, if your LUNs (or files) are misaligned on block boundaries, you can see how it can make even the highest performing array rock like an old washing machine on 3 legs and impact the effectiveness of deduplication.  (Your mileage may vary.)  Please take a moment out of your day to read these articles and their associated comments.  Valuable information there.


Another area of operational efficiency has to do with troubleshooting.  I know vendors don't like to talk about it but it's a fact of life.  I don't think it's a question of whether or not you'll have an issue; it's how you respond that matters.  In order to respond properly, you have to really think hard about how to do mundane things more efficiently - like transferring core files from large memory systems.  What used to take hours can take days unless you streamline that process.


All I'm saying here is it's more than just flipping the switch on some features.  You have to take a look at the entire stack, decipher an efficient IT strategy and figure out how to support it effectively.  One more plug: take a look at our Efficient IT page to see some of this thinking put in practice.  I think we're on the right track here.

I'm touring around Australia and New Zealand.  I've had a chance to talk to over 50 customers and, having been at NetApp for a while, I can tell you these conversations have changed dramatically.  We have moved from "Prove it - I don't necessarily believe what you're saying" to "Tell me how to make the best use of our systems."  So - less evangelization; more practical application.  Vaughn (a NetApp veteran of 11 years) was down here this past week.  We would compare notes over an adult beverage or three (at the end of the day, of course) and we were both struck by how times and conversations have changed for us.


Look, we're not a finished product yet but I think we're aligning well with the thinking I'm seeing from our customers and I'm excited about it.

Efficiency has always been the driver.  I started to reflect on this, retracing the history of NetApp from startup to clearly one of the top storage companies in the industry today.  Next thing you know the blog is 23 pages long!  No thanks.  Net-net: the reason people have bought NetApp has always been lower cost driven by some efficiency innovation.  We never had the motto "If you want the Cadillac you have to pay for the Cadillac."  (When I was a customer, I actually had a vendor (guess which one) say that to me.)


Tom Georgens has made it a point to remind us that economics - not technology - drive technology adoption.  If you take a look back at all of the innovations or ideas NetApp has championed through the years (even prior to Tom joining NetApp), the primary driving factor has been economics.  I'd suggest that if you asked Peter Corbett the driving premise behind RAID-DP, after a sentence or two on drive reliability, Peter would sum up by talking about the cost implications of NOT doing RAID-DP.


Recently, many of the answers on how to streamline IT have ended up generating an enormous amount of data.  As it turns out, NetApp has focused on the right solutions and the market has broken big in our favor.  We're in the right place at the right time with primary storage dedupe and non-dupe (aka. cloning) technologies.  This resonated in a huge way with customers.  It has given NetApp a distinct advantage over the competition, particularly in virtual infrastructure deployments.


Now, we're still going to have a concerted effort around infrastructure efficiency.  In fact, I invite every NetApp sales team and NetApp partner reading this to make sure they use tools such as My Autosupport and the OnCommand software suite to help existing customers understand opportunities for wringing further costs from the infrastructure - server, network and storage.   Take the time to understand how "fit" your IT really is.  We have a new landing page for you to check out.





The next big thing - just in case you want to know what EMC will pitch a few years from now - will be operational efficiency.  It won't be a specific feature or software package.  It will be people.  Look, let's say you dedupe, non-dupe, thin provision and compress your way to 50% storage savings.  That's great, but half a petabyte is still a lot of storage and, logically, it's still supporting a petabyte worth of applications and user data.  Much of the infrastructure (servers, storage and network) will be virtualized - a completely new layer of indirection to plumb through.  For the IT professional, life keeps getting more interesting but it isn't getting any easier.


Regardless of how technology simplifies an environment, the business continues to drive the technology past existing limits.  What is needed - and what we have found extremely successful - is bringing in individuals who can step back from the technology and look at the business first.  Think of these folks as Technical Directors or field CTOs.  A military analogy might be a SEAL team or the SAS.  We call them Solution Architects.  Let's be up front with our treachery here: NetApp will probably be part of the solution they propose... but not necessarily.  That's why it translates well to partners.  The goal is to bring in someone with broad industry experience; someone who understands what has worked well elsewhere in that business vertical; assess the current customer environment and recommend a long-term solution.  That's also why you're seeing NetApp expand its portfolio into heterogeneous products (e.g. OnCommand).


Storage efficiency; unified storage - all great tools.  We're moving on to operational efficiency and investing in the teams to deliver it.  If you're a customer reading this, I'd ask my sales team to get one of these Solution Architects in front of you ASAP.


Dear NetApp Partner

Posted by mriley Mar 10, 2011

I got a number of e-mails referencing my last post (Dear EMC Customer).  A lot of positive comments but more than a few requests to go ahead and respond to the bluster over at Hollis' blog.  As I mentioned before, I usually don't spend a lot of time in front of customers talking about EMC.  If EMC wants to spend a hand-wringing hour in front of a customer obsessing on NetApp, fantastic.


But, the realist in me does get it.  The Hollis quotes will be plopped down and you'll be asked for a response.  Fair enough.  As Steve Duplessie tweeted recently, all is fair in love and selling disk boxes.


If you're used to the predictable "student body right" playbook of EMC then you probably feel pretty comfortable asking a pointed question or two; pointing to an obvious contradiction or hyperbole and watching the whole thing implode.  I remember when a customer called me late one night to let me know he was awarding NetApp the business.  I asked him what tipped it in our favor.  I was expecting him to name a feature, tool, critical partnership - something that we had brilliantly presented and positioned :-).  Nope.  He said that it was technically very close.  (I have to admit I was a little disappointed.)  He explained that after EMC handed him a FUD document, he asked a few pointed questions at which point the EMC rep paused and said, "You know, you're a real pain in the ass."  Whoa!  That was pretty much all the tip-in the customer needed.  He hung up the phone with EMC and called me.  I was glad he called.  I learned I had to sharpen my technical axe.  I also learned that just a few fact-based questions may cause the EMC team to set themselves on fire.


Now, I'm not saying that's indicative of the entire EMC sales force.  The point is one way or another, FUD can kill your deal and most customers see through marketing spin 5 seconds into a meeting.  It's just not worth it.  If NetApp can help you, great.  If we can't, we should move on and see if there's someone we can help.  It's a pretty big market out there.


That said, here are a few ideas if you see these topics pop up at a customer site:


EMC's Unified Storage Products Are Guaranteed More Efficient

The guarantee is based on out-of-box defaults, and let me tell you why I think EMC feels comfortable with a 25% number.  Our systems ship with snapshots enabled because there’s no performance hit to use them and people universally find them valuable.  EMC’s systems ship with snapshots disabled because there’s a demonstrable performance tax associated with them (note: none of their benchmarks run with snapshots enabled).  We set aside a 20% snap reserve for each volume and a 5% snap reserve at the aggregate level.  20% + 5% = 25%.  If you ask a customer if they value snapshots (and most just assume they’re included/enabled), then about 3 seconds after you take the systems out of the box, the EMC guarantee evaporates.  We haven’t even talked about our other storage efficiency advantages over EMC or the extra gear EMC would need to include to counter-balance the performance overhead of their snaps.  I think we should change our snapshot reserve to 17.93% and see if EMC offers a 17.93% guarantee.


EMC’s Unified Storage Products are faster

Only if you allow them to use four machines to your one.  All EMC did in their recent SPEC benchmarks was test 4 systems and then add up the results.  It was ridiculous and I think they've taken on some water because of it.  We can take any one of our benchmarks and do the same thing.  Take one of our SPEC benchmarks and simply multiply the results by 4 and, presto, we have a much faster “benchmark.”  Dimitris Krekoukias did a great job of dissecting this in detail here and here.


EMC's Unified Storage Products Integrate Better With VMware

This is simply a disingenuous (at best) statement that no one is close to EMC here.  I believe NetApp is widely regarded as having extremely strong integration with VMware.  EMC themselves understand the leadership role NetApp has taken here and that's what has them freaking out over in Hopkinton: we're playing in their backyard - and winning.  If I had to pick a single reason why EMC has us #1 on their hit parade, it would be the demonstrated value of NetApp in virtualized infrastructures.  Heh, if you can't beat 'em, shove 26 girls in a mini-cooper and go stencil their sidewalks, I guess.  (I really wish I could be a fly on the wall in EMC Marketing meetings when these pearls are brought up and everyone harrumphs.)


As far as VAAI, we have it and I think the products we have built on it have the edge on EMC.  I'll shoot you over to Vaughn Stewart's blog.  He's the Virtual Storage Guy for good reason! 


As far as 75 different integration points, you can look at it this way: when you have 75 different platforms and tools, you need 75 different integration points.  Putting the Engenio announcement aside for the moment, when you have one OS, you only need one integration point.  If you’re truly a unified storage platform then you need far fewer integration points.  Hollis is simply making the it's-not-a-bug-it's-a-feature argument.

As an aside, if you want a little more info on our Engenio plans I would direct you to Dave Hitz's recent blog on the topic.  There's also a thought on running ONTAP as a VM (our virtual storage appliance or VSA).  Interesting.


EMC's Unified Storage Products Are Way Simpler

I think EMC does have a slight manageability advantage as far as their GUI tools go.  They have done an excellent job.  We have some excellent tools as well but they are still somewhat fragmented.  Is EMC “way simpler”?  That pushes the bounds of believability.  If we’re honest with ourselves we would say that they have a slight advantage with their manageability tools and they are marketing it very well.  The counter-balance is they still have a multitude of disparate products out there so the fragmented statement could apply to them as well, but I think NetApp does have some work to do here.  No doubt.


EMC Has Great Customer Support

They do but so does NetApp.  In last year’s Storage Magazine Quality awards, NetApp finished 1st, EMC 2nd.  When it came to technical support, IBM finished 1st, EMC 2nd and NetApp 3rd.  The difference between NetApp and EMC was 5 hundredths of a point – 6.40 vs. 6.35.  Arguably, a rounding error.  You can find the report here.


EMC Does More Than A Single Storage Product

Correction: EMC has more than a single storage product and apparently, given the Engenio announcement, that’s not such a bad thing, is it?  You have to get up pretty early in the afternoon to get one by Chuck.


EMC Has Great Partners

I'd gladly put NetApp's strategy, relationship and partner enablement program up against EMC's.  Simply put, we're not in competition with our channel.  We're not looking to take things direct.  And, by the looks of things, the partner community appreciates it.


NetApp Awarded 2010 Everything Channel Partner Program Guide and Five-Star Partner Program Rating




NetApp’s Tom Georgens and Julie Parrish Named 2 of the Top 100 Most Influential Executives in the Industry by Everything Channel’s CRN




Julie Parrish and Todd Palmer of NetApp Recognized as Channel Chiefs by Everything Channel's CRN




I think that about does it.  The key thing for me is answer direct questions with direct answers.  I'm not smart enough to keep track of all the FUD and I think it's a giant waste of time for everyone involved.  A few pointed questions is usually all it takes and then you can move on to whether or not you can help the customer.  The quicker you can put the focus back on the customer, the better off you're going to be.


Dear EMC Customer

Posted by mriley Mar 9, 2011

Here's hoping that the first thing on your mind this morning is the first thing that is on EMC's every morning.


If you're wrestling with any business problems, we'd love the opportunity to see if we can help.

All the best this morning and all the others to follow.  Make it a great day.





I Wish I Had Said This...

Posted by mriley Feb 24, 2011

Today, I’m going to take the gloves off and  look at what I consider a particularly egregious example of  benchmarketing in the storage industry.

Not only does it not do what it says on the label, you could end up far worse than you started off.  You can arrive at your own conclusions as to whether or not I’m overreacting.

Generally speaking, EMC tries to draw the line at avoiding benchmarks that don’t represent realistic use cases. (But) I saw stuff there that no sane customer would ever put into  production.  So, if no one would actually buy a configuration like that,  why would you test it and publish the results?

The answer to this, sorry to say, is when vendors confuse value-added benchmarking with benchmarketing. In the former, you pick a customer use case and say "here's what you're likely to see in your environment". In the latter, enormous effort is spent to come up with a better number, and then bash all the competitors with it ruthlessly.

It doesn't matter if the number is relevant or not.  Just that Vendor A beat Vendor B.

The real question is whether the problem is solvable, or not.  I will  offer the opinion that the problem is getting worse, and not better.


But I didn't.  Still, it has a nice beat and you can dance to it.


Yeah, I saw the $6M benchmark from EMC and I was all prepared to roll up my sleeves and start some technical dissection, but then I just started looking at some of the internal messages floating around NetApp.  They really didn't focus on the results.  The e-mails were more...entertainment than anything else.  My favorite was from one of my NY buddies that equated this near record-breaking performance by EMC to the equivalent of jamming 26 gymnasts into 10 Minicoopers and then trying to get the whole lot through a toll booth.  Both efforts - benchmark and Minicooper stunt - would pretty much accomplish the same goal, though: an EMC headline.


I was actually in the air when the results were published.  When I landed I saw the headline in The Register. I figured it would deserve a comment but, quite frankly, I had a hard time getting fired up about it.  It's kind of like getting fired up about professional wrestling.  Being from Chicago, I can get fired up about the Bears, Bulls, Hawks or Cubs (sorry - not the Sox) but not about Wrestlemania MCMXVIII.


In this blog, we try to relate things back to Efficient IT strategies but, that would require some engineering analysis of the benchmark itself.  Just like I didn't believe the original announcement on the VNX would be about performance, I don't believe the SPEC number from EMC is about performance.


So, if no one would actually buy a configuration like that,  why would you test it and publish the results?


Simple - to grab a headline; get a sales call.  I get it.  It's certainly not about changing the economics of IT (at least not favorably) so your budget goes further.  That's O.K.  Maybe NetApp will shoot a filer across the Snake River canyon.  Who knows?  At the end of the day, it's EMC's dime so they can set all the inconsequential records they want.  I'd actually like to see a VNX get dropped off the roof on the Letterman show.  That would be cool, too.


by Mike Riley, Director of Strategy & Technology, Americas


In doing some tweeting, e-mailing - God forbid we pick up a phone - some colleagues of mine started talking about how we best position our Flash Cache technology especially in the face of all of the automated storage tiering offerings out there.  All too often we - and I mean customers and vendors alike - get caught up in the "Well, ask them why their product doesn't do this or doesn't do that?!" We go back and forth trading barbs and FUD and if something seems to stick, heck, we just ride it regardless of whether we know or even suspect that it's not true.  Really sad.  All of this BS has come roaring back to life with the introduction of solid state.


I don't think anyone is disputing the performance potential of solid state, but it also raised the question for customers of how to best deliver on price as well as performance.  Per usual, the storage industry has its age-old answer to this age-old question: storage tiering (as if we don't already know how this movie ends).  As you would hopefully come to expect, different vendors have different approaches (some vendors have several).  EMC, Compellent, 3Par and others have a handful of different Automated Storage Tiering (AST) solutions.  NetApp has marketed a Virtual Storage Tiering (VST) strategy.  Both have merit because both rely on the historical strengths of the companies that back them.  AST is an extension of the ILM messaging that companies such as EMC promoted in the past.  VST builds on NetApp's fellowship ring, WAFL.


Where this falls apart for a customer is when we ask Vendor A to compare its technology/solution to Vendor B's.  Wrong question.  (Sorry - the customer is not always right.  Heresy!)  The right question is, "Tell me how you solve my problem."  I suggest that it's O.K. to step back and answer that question for the customer vs. getting into some technology feud with another vendor.


To best understand how and why NetApp VST addresses the price/performance question for customers, it's good to know a little history on how NetApp arrays work.  Why?  Well, I figure if you know what questions NetApp has already answered with ONTAP, it lays out for you the next logical step NetApp would take with their technology.  One question (or accusation depending on who is doing the speaking) that comes up quite a bit is "Heh, Flash Cache doesn't cache writes!"  I think it's actually a great lead-in for VST.


John Fullbright is one of our top Professional Services Architects and does an amazing amount of personal research in his "spare" time.  As we were having this conversation around VST, John started to talk about some personal testing he was doing to characterize WAFL write acceleration.  I asked if John wouldn't mind writing up his findings.  John was gracious enough to share the results and I think it's a great empirical way to ground VST/AST discussions.  I'll turn the rest of this blog over to John.


Uniquely ONTAP

by John Fullbright, Public Cloud Architect, MSBU

NetApp’s Data ONTAP does something uniquely different from the majority of other storage vendors’ products; it’s optimized for writes.  Indeed, write optimization was one of the original design criteria for Data ONTAP back in 1992.  Dave Hitz himself explained this many years ago in TR-3001 (since updated).  In brief, Data ONTAP eliminates the “Disk Parity Bottleneck” through its use of WAFL to coalesce a group of temporally located write IOs; pre-emptively “defrag,” if you will, this group of I/Os based upon the best possible allocation unit or “tetris” available; calculate parity for the entire lot while in memory; and stripe the lot of them across all available drives during the next write event (aka consistency point, CP).
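The write path described above - buffer temporally related random writes, compute parity once for the whole group, then lay everything down sequentially at the consistency point - can be sketched in miniature.  This is a toy illustration of the idea only, not the actual WAFL/RAID-DP code: it uses single XOR parity and integers as stand-in "blocks" for simplicity.

```python
class StripeCoalescer:
    """Toy sketch of turning random writes into full-stripe writes:
    buffer incoming writes, then at a 'consistency point' compute
    parity once per full stripe and emit the stripe as one sequential
    write event."""
    def __init__(self, data_drives=4):
        self.data_drives = data_drives
        self.pending = []              # buffered (address, block) pairs

    def write(self, addr, block):
        self.pending.append((addr, block))   # no disk I/O yet

    def consistency_point(self):
        stripes = []
        for i in range(0, len(self.pending), self.data_drives):
            chunk = [blk for _, blk in self.pending[i:i + self.data_drives]]
            parity = 0
            for blk in chunk:
                parity ^= blk      # one parity calculation per full stripe
            stripes.append((chunk, parity))
        self.pending.clear()
        return stripes             # each stripe = one sequential write event

coalescer = StripeCoalescer()
for addr in [7, 3, 98, 42]:        # four scattered "random" writes
    coalescer.write(addr, addr)    # toy integer "blocks"
stripes = coalescer.consistency_point()
print(len(stripes))                # 4 random writes -> 1 full-stripe write
```

The key property is that parity is computed in memory for the whole stripe instead of via a read-modify-write per block, which is why the random-write penalty of parity RAID largely disappears.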


Several properties of WAFL make NetApp stand out from legacy arrays – “unified” or otherwise.  WAFL is “RAID-aware,” which allows engineers to introduce features like RAID-DP with zero performance impact.  Snapshots are simply a preserved, specially named CP and part of the WAFL DNA.  That’s why NetApp performance is the same with or without Snapshots, and why they're left on by default, vs. after-market bolt-on snapshots that we see turned off by default and rarely, if ever, featured in 3rd-party benchmarks.  WAFL does a great job of mitigating the largest source of latency in any array: disk access.  Essentially, Hitz and Lau conceived of a system that solved the megabytes-per-second-to-miles-per-hour equation more efficiently than anyone did before or has since.  For more details, I recommend reading John Martin’s blog where he describes WAFL as the ultimate write accelerator.  Kostadis Roussos, in his WAFL series on Extensible NetApp, provides great detail about how the process works as well.


Still, there are those who fail to get it.  In this world of hypercompetitive storage vendors, it’s common to hear things like “WAFL degrades over time” or “Flash Cache doesn’t cache writes”.  I like to think that these statements are made out of misunderstanding, not malice.  To help demonstrate what WAFL means for write performance, I decided to run some tests using 100% random write workloads on my own FAS3050.


The Test Platform:

  • FAS3050
  • ONTAP 7.3.4
  • (2) DS14MK2-AT drive shelves – dual looped
  • (28) 320GB Maxtor MaxLine II 5400RPM SATA drives                  
    • Storage for Checksums ~8%
    • Right-sized (for hot-swap) = ~2% across the industry
    • WAFL Formatted = 26.8 GB per drive (reserved for storage virtualization)
    • 11ms average seek time
    • 2MB buffer
    • Internal data rate 44MBps
    • Drive transfer rate of 133 MBps
  • Storage Efficiency/Virtualization Technologies Employed:                  
    • Volume-level Deduplication
    • Thin Provisioning
    • RAID-DP
  • RAID-DP RG size = 24
  • Tuning options:                  
    • optimize_write_once = off
    • read_reallocate = on
    • options raid.raid_dp.raidsize.override on (Updated: 2/14/2011.  See comments below.)


As you can see, a FAS3050 running two shelves of 5400 RPM MaxLine II drives each with a whopping 2MB buffer doesn’t exactly qualify as “state of the art” but it will do and actually help prove a point in the following test.

Like any customer, I wanted to make sure I could squeeze as much usable space out of my configuration as possible.  Starting with a total of 28 drives, I used:


  • 3 drives for the aggregate containing the root volume
  • 1 drive for a hot spare
  • 24 drives in a single RAID-DP raid group for my test aggregate.


Since this filer is not part of a MetroCluster and I’m not using Synchronous SnapMirror, I was able to remove the 5% aggregate reserve and gain that space back.  That leaves me with 5.18 TB (5.18 TB = 5304 GB = 22 data drives * 241.2 GB) usable in the test aggregate.  This filer also supports my home test environment, so it holds 26 Hyper-V VMs as well as a CIFS share where I keep my ISO images.  I thin provision and de-duplicate all of my volumes and LUNs, so the 4.1 TB I have presented to my Hyper-V servers actually consumes only 52 GB from the test aggregate.  I use iSCSI to connect my Hyper-V VMs to the storage.
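The capacity math above is easy to verify on the back of an envelope, using the post’s own figures (the 241.2 GB per-drive number is the approximate usable capacity after checksums, right-sizing, and WAFL formatting):

```python
# Back-of-envelope for the test aggregate, using the figures from the post.
raid_group_size = 24          # disks in the single RAID-DP raid group
parity_per_group = 2          # RAID-DP: one parity disk + one diagonal-parity disk
per_drive_usable_gb = 241.2   # approximate usable GB per drive after overheads

data_drives = raid_group_size - parity_per_group
usable_gb = data_drives * per_drive_usable_gb
usable_tb = usable_gb / 1024  # binary TB, as used in the post

print(data_drives, round(usable_tb, 2))  # 22 data drives, ~5.18 TB
```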


The Test:

For this test I created three volumes, each containing a 1.5 TB LUN connected via iSCSI to the VM running Iometer.  I created a 100% random, 100% 4K write workload.  In the Iometer Access Specification, I also ensured that the IOs were aligned on a 4K boundary.  Iometer creates a single test file on each LUN that it writes to, and I created a separate worker for each LUN, for a total of three workers.  The goal was to fill the filer to 90% or so and then see how well both write allocation and segment cleansing hold up under those conditions.  The test ran for 10 days to ensure the data set would be overwritten several times.



Access Specification used for test
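In plain terms, the access specification means every I/O is a 4 KiB write to a random, 4 KiB-aligned offset within the test file. A minimal sketch of that offset generator (the file size and sample count here are arbitrary illustration, not Iometer internals):

```python
import random

BLOCK = 4096                       # 4 KiB I/O size, aligned on a 4 KiB boundary
file_size = 1_500_000_000_000      # ~1.5 TB test file per LUN (illustrative)

def random_aligned_offset():
    # Pick a random block index, then scale by the block size:
    # the result is always a multiple of 4096, i.e. 4 KiB-aligned.
    return random.randrange(file_size // BLOCK) * BLOCK

offsets = [random_aligned_offset() for _ in range(1000)]
assert all(off % BLOCK == 0 for off in offsets)               # aligned
assert all(0 <= off <= file_size - BLOCK for off in offsets)  # in bounds
```

Alignment matters here: unaligned 4K writes would straddle two WAFL blocks, turning every I/O into a read-modify-write.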

The Results:

From the beginning of the test, I was hitting in excess of 10K IOPS with ~4 ms latency.



IOPS Achieved ~15 hours into the test

By the third day, with the entire test data set overwritten at least twice, throughput had increased to 12,063 IOPS and latency was slightly under 4 ms.



IOPS Achieved ~60 hours into the test

By the time the test was complete, we had written over 36 TB to the 4.5 TB of LUNs.  That 4.5 TB of LUNs, combined with the 52 GB of other data in my test aggregate, filled about 88% of the usable space.  Over the 10 days the test ran, the achieved IOPS increased roughly 18% while latency remained essentially flat at 4 ms.



OPS/Time 240 hour run
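Those totals hang together arithmetically: assuming an average of roughly 11K IOPS (my rough midpoint between the ~10K start and ~12K finish, not a measured figure) of 4 KiB writes sustained for 240 hours lands in the same ballpark as the 36+ TB written:

```python
avg_iops = 11_000          # assumed average between ~10K at the start and ~12K at the end
io_size = 4096             # bytes per 4 KiB write
seconds = 240 * 3600       # 240-hour (10-day) run

bytes_written = avg_iops * io_size * seconds
print(round(bytes_written / 1e12, 1), "TB")  # ~38.9 TB, consistent with "over 36 TB"
```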

So, what’s the point of a test based on 100% random writes?  That represents only a small fraction of real workloads, right?  The test is an example of how, by transforming random writes into temporally grouped sequential writes, Data ONTAP is truly optimized for write performance.  With writes already optimized, if you’re in NetApp’s shoes, the next logical question is how you would optimize for reads.  I think Kostadis said it well here:


So why do we need a lot of IOPS? Not to service the writes, because those are being written sequentially, but to service the read operations. The more read operations the more IOPS. In fact if you are doing truly sequential write operations, then you don’t need that many IOPS …


NetApp’s solution: reduce read latency by reducing the number of reads that go to disk, via de-dupe-aware Flash Cache.  Why doesn’t Flash Cache cache writes?  Because the write latency problem has already been solved.  It makes no sense to cache writes if doing so only puts a bump in the wire.  For write caching in Flash to make sense, it would either need to reduce latency at the same level of load or maintain the current latency at a higher level of load.  In a highly write-optimized environment, it does neither.  In fact, on traditional arrays it doesn’t appear to do much at all.
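The read-side logic can be sketched as a simple cache sitting in front of disk: reads that hit the cache never touch disk, while writes pass straight through since their latency is already handled at the write-allocation layer. This is a generic LRU toy for illustration, not the actual Flash Cache eviction policy:

```python
from collections import OrderedDict

class ToyReadCache:
    """Generic LRU read cache: caches reads, deliberately not writes."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()
        self.disk_reads = 0

    def read(self, block):
        if block in self.cache:
            self.cache.move_to_end(block)     # hit: no disk access needed
            return self.cache[block]
        self.disk_reads += 1                  # miss: go to disk
        data = f"data-{block}"
        self.cache[block] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict the least-recently-used block
        return data

    def write(self, block, data):
        # Writes bypass the cache entirely: write latency is already solved
        # upstream, so caching them would only put a bump in the wire.
        self.cache.pop(block, None)           # just invalidate any stale copy

cache = ToyReadCache(capacity=100)
for _ in range(3):
    for block in range(100):                  # hot working set fits in cache
        cache.read(block)
assert cache.disk_reads == 100                # only the first pass hit disk
```

The final assertion is the whole argument in miniature: a working set that fits in cache pays for disk access once, and every subsequent read is served from flash.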


This was a test run on two-generation-old hardware to examine the impact of 100% random write workloads on the write allocation and segment cleansing processes in Data ONTAP.  Although the ONTAP version is fairly recent, it’s certainly not the latest.  NetApp has been refining these processes for nearly 20 years.  Many trends are working in NetApp’s favor:


  1. With each new ONTAP release, the algorithms improve.
  2. With each new iteration of hardware, there are faster CPUs, more cores, more RAM, faster hard drives, and so on.  That’s more time and resources to make those improved algorithms work even better.
  3. As caching becomes ubiquitous at all levels of the stack, from the application to host to the network to the storage array, this tends to “wring the reads” out of the IO workload.  Workloads increasingly have a higher percentage of writes – a demonstrated strength for WAFL.
  4. NetApp writes data in a manner which preserves temporal locality.  This tends to make the job of the read caching algorithm a bit easier.


Going forward, it will be interesting to see what the result is when I start mixing reads into the workload and examining the impact of read cache.  That, however, will be a future blog.


Akorri & ITIL

Posted by mriley Feb 2, 2011

The good doctor Larry "Dr. Dedupe" Freeman wrote a great blog reviewing Akorri and what it means to you.  Great piece.


On a slightly different tack, what's in this Akorri acquisition for NetApp?  What do we get out of it?  As a for-profit organization (or organisation, if you're from the UK or ANZ), I think it's a fair question to ask.


A blog or two ago, I mentioned that 2011 will be all about vendor execution.  NetApp had some work to do on their management tools, and there has been a lot of progress there.  In fact, I would highly recommend taking a look at some of the YouTube demos featuring our OnCommand tools.  In particular, I like what John Hanna put together on Service Level Management with NetApp's Service Catalog, which features our Provisioning Manager tool.  Most of these tools have to do with monitoring, reporting, provisioning, protecting, and app integration.  What I like about John's video is the extension and application of one of our tools as part of a service offering for customers.  It gives you a glimpse into the self-service aspect of our execution strategy.


Another indicator of our strategy: It's no secret that NetApp has targeted telcos and service providers with a dedicated and focused team of NetApp sales, engineering, support and marketing.  Telcos and service providers are a major force in the Cloud market, so the vertical team made sense.  Upon founding this market vertical, the team immediately began to build out their "design, build, win" strategy based in large part on ITIL concepts and practices.  You also hear ITIL mentioned frequently by customers from all market verticals, so you really have to build out your solutions to help customers meet their ITIL-based processes.  Your strategy has to be bolted to a solid foundation, and ITIL is a practical place to start, especially as we see customers doing the same thing.


Having some well-grounded blueprints and structured service offerings for your tools, though, is only part of the answer.  You can't just plunk down some ESX servers, Nexus switches, and NetApp storage or a Flexpod and call it a day (or a Cloud).  The whole transformation process to a dynamic data center has to have some well thought-out processes and expected outcomes.  As you apply an ITIL process to your environment, as a vendor you have to be able to look outside of your little storage sphere.  You need to take a look at the entire stack and Akorri gives us that. If you want to build a dynamic data center, then you better be confident that you know what needs to move and what needs to stay put.  This feedback loop is critical in managing your dynamic data center.



I once heard Cloud described as the pluralization of virtualization and I really liked that description.  It's a very practical breakdown of what we're trying to implement and manage here.  Take a listen to Jason Cornell, Windows Systems Manager, from Autotrader describe how Akorri is helping him manage the entire stack and line him up for a successful ITIL implementation.



So, what's in it for NetApp?  Execution.  Simply put: we're arriving on site with focused teams, carrying detailed blueprints and ideas and the tools to execute on those plans.  We're not thrashing around with answers in search of questions here.  At the Stanford School of Business, the question is often asked: would you rather have a poor plan and great execution, or a great plan and mediocre execution?  It's surprising how many people pick "great plan."  You can always go back and adjust the plan, and Akorri gives NetApp and customers a way to adjust it quickly.  If I were a competitor, I'd be afraid of a focused rival with a fundamentally solid strategy and the ability to execute... be very afraid.


Does FAST Make a Difference?

Posted by mriley Jan 17, 2011

So, EMC's big announcement is today.  I'm not glossing over it.  I'll tune in, but honestly I think there will be enough blogs out there today talking about lipstick, pigs and Frankenstorage - and a bunch from EMC now proclaiming that not only is unified storage a good idea but they have perfected it on their first try!  I hope no one pulls a muscle slapping themselves on the back.  Still, only EMC can come up with an announcement on unified storage that includes two separate product families (VNXe and VNX), two distinct versions of Unisphere, several replication products, a handful of different snapshot technologies, and enough appliances and blades to make the people at GE envious.  The whole announcement is an oxymoron.  Somewhere Moshe Yanai has to be screaming into a pillow.  On the positive side, no one made any career wagers last year on whether or not unified storage was a good idea.


You think the desks over at Hopkinton come equipped with seat belts and airbags?  They would almost have to be given the sudden stops and reversals over there.  It reminds me of those Allstate commercials featuring Mayhem:


"Re-calculating!" The EMC roadmap battlecry!


Between the EMC leaks, the late-night blogs, and their technology landscape, I've pretty much said my piece on what I think the announcement is.  All today's announcement will need is the Slapchop guy.


Buried underneath all the hype around the Unified Bezel from EMC is the disappearance of FC and SATA drives, which leads me to today's topic.  If you only have two tiers of storage, (NL-)SAS and Flash, what do you need (what will now be known as) FAST VP for?  I'm not disparaging the customer price/performance goals.  I recognize the problem.  I'm looking at the rationale for the solutions out there.  Right now the clear leader in market adoption appears to be VST from NetApp, with over 3 PB of Flash Cache shipped and competitors like EMC following the lead with FAST Cache.


You see, to me, caching makes sense.  Yeah - I work for NetApp, home of the VST strategy - but if you start running out of tiers, why do you need a tiering engine like FAST?  At some point, doesn't AST just evolve into VST?  I mean, you're only dealing with two tiers!  It seems like you would pin the high-performance stuff to flash if you could and let customer/user demand dictate the ebb and flow of data between solid state and spinning media.  I don't want to rehash old arguments, but the move to exclusively SAS drives and flash brought the question back up in my mind.  I'd love to hear your thoughts.


Other than that, I didn't see anything of note in the Unified Bezel announcement - no new advancements and not a lot of sweat equity went into it at all.  Heck, if EMC can't put a lot of thought behind the "e" in VNXe, I'm not sure the whole attention to detail thing was really present during this latest "Just Like NetApp" attempt.


Seriously, you have to check this "Just Like NetApp" message delivered to NetApp partners and field technical team.  Great job by Matt Watts!


