Recently, I recorded a Brainshark session to demystify and clarify deduplication. This is a free 25-minute web-based presentation in which I walk through an example of how dedupe ratios really work, and illustrate what it takes to achieve the high ratios that are often touted for dedupe technology (e.g., 20:1). I also discuss how deduplication technology can be used with conventional compression techniques, and provide a brief overview of the specifics of our data deduplication technology for the NetApp VTL (Virtual Tape Library) product. If you have any questions or comments about the material, please reply to this post.
Here is the presentation link: http://learningcenter.netapp.com/LC?ObjectType=WBT&ObjectID=00202061
You will need a NOW account to access the presentation. If you do not have one, you can sign up for one here.
Director, Technology & Strategy
Data Protection & Retention Business Unit
Hi Kevin... I forwarded your message to the site administrators. Again, my apologies for the hiccup. The message from Chodi saying this issue was fixed was only posted a short while ago, so I'm wondering if there isn't some sort of time lag for whatever she did to propagate fully through the site. I'm not sure, though.
I enjoyed the presentation.
I was thinking that toward the end of the presentation it might be a powerful message to show how many tapes have been eliminated by the VTL solution. Of course, some of the backup operators watching might realize they could soon be out of a job, so I guess you might leave it out depending on the audience.
Does anyone know if we can SnapMirror to tape via the VTL and still preserve deduplication?
Hi Jason... You're absolutely right about the NetApp VTL not being able to run SnapMirror, but it can participate in the thing we call "SnapMirror to Tape" (SM2T for short). SM2T was originally designed as a way to write WAFL volumes (originally our traditional volumes, these days our FlexVols) out to physical tape such that they could be shipped SneakerNet-style to a remote destination for the purpose of seeding the baseline transfer of a SnapMirror relationship without having to do the baseline over the wire. It was basically to help out with situations where the data was large and the wire was small, but you needed to get an "incremental forever" SnapMirror relationship up and running nonetheless.
Since SM2T was first introduced, roughly 8 or 9 years ago now, it has also been taken up by some sites as a way to "bulk backup" volumes to tape in situations where restore granularity was considered less important than raw backup performance. (Granular restoration of individual files/directories is not possible from the SM2T format; it is really meant as a whole-volume operation.) The most recent development in the area of SM2T is that IBM's Tivoli Storage Manager (TSM) backup application is now able to perform backups/restores using the SM2T mechanism, albeit with the same limitations in the area of restore granularity. It is in this context that the VTL can play, simply by acting as the "tape" target of an SM2T.
Hi James.... The reasons VTLs can sometimes (but not always) cut down on physical tape usage usually don't have much to do with dedupe. I don't know of any VTL that can create physical tape directly and actually writes deduped data to that tape, including our own. In our case, physical tape is always created in a "native format", meaning the data written to the ptape is always identical to the original data that was written into the corresponding virtual tape by the backup application. This ensures the ptape can be restored directly if necessary, without the need for the VTL to be in the picture.
Your SnapMirror-to-Tape question is an interesting one. If a deduped FlexVol is SnapMirrored to a virtual tape within the VTL, it will indeed be stored in a deduped form within that virtual tape. However, it would not have been the VTL's dedupe implementation that did the "dedupe-ing"! :-) What's happening is that SnapMirror-to-tape sends the already-deduped structure of the deduped FlexVol out to tape. In fact, if you SnapMirror a deduped FlexVol to physical tape, it would be stored in its already-deduped form on physical tape too, so the VTL really can't take any credit for doing anything "clever" to make this work. It just works from the source of the data, to wit, the deduped FlexVol itself. A rough analogy: if you were to write a ZIP file onto a physical tape, the compressed ZIP file would be there on the tape, but it wasn't the tape drive that compressed it.
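To make the analogy concrete, here's a toy Python sketch (not NetApp code, just an illustration) of the point that the medium stores whatever already-transformed bytes it is handed, and takes no part in the transformation itself:

```python
import gzip

# "Compress at the source": the data is transformed (here, gzipped;
# in the SnapMirror case, deduped) before it reaches the medium.
original = b"the same block " * 1000
compressed = gzip.compress(original)

# "Write to tape": the medium just stores the bytes it is given.
tape = bytearray()
tape.extend(compressed)

# What sits on the medium is the transformed form, byte for byte...
assert bytes(tape) == compressed

# ...and the original is recovered by reversing the source-side
# transformation, not by anything the medium did.
assert gzip.decompress(bytes(tape)) == original
```

The "tape" here did nothing clever; the space savings were baked into the bytes before they arrived, which is exactly the situation with a deduped FlexVol sent via SM2T.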
The primary manner in which VTLs can cut down on physical tape usage lies in the context of the backups that can live on the VTL for the entire duration of their life cycle, from their initial creation to their expiration. Because such backups never darken the doorsteps of a real physical tape drive, no tapes for them are required.
That's what I was looking for. SnapMirror to tape has the benefit of offloading the deduplication work from the VTL. Then it will just need to do the hardware compression.
If customers do SnapMirror to a VTL tape, then use their backup software to clone it to physical tape, then I hope the dedupe savings could be passed on to the tape media?
Granularity of restore is an important factor for data outside the retention of Snapshots. I had not considered that, so that was very helpful info.
I have a customer project I'm working on where we are removing tape completely from their environment using NetApp Snapshots, SnapMirror, SnapVault, SME, SMSQL, SMVI, and OSSV. This is a trend I expect to see more of as data grows larger, restore times grow longer, and data classification becomes more important.
I gained a lot from this, so thanks again for posting on PSTN!
> SnapMirror to tape has the benefit of offloading the deduplication work from the VTL. Then it
> will just need to do the hardware compression.
> If customers do SnapMirror to a VTL tape, then use their backup software to clone it to physical
> tape, then I hope the dedupe savings could be passed on to the tape media?
It would, although what you describe here might be a little tricky to arrange. With the exception of the new(ish) IBM TSM integration I mentioned in a previous post, SM2T typically happens outside of the context of a backup app. TTBOMK, it is most often simply invoked from the filer command line using the "snapmirror store" command, so going back to a backup app to clone the vtape out to a ptape is something I doubt you would bother to do (and probably couldn't in the case of most backup apps). What you would more likely do is write a simple script that ejected the tape containing SM2T data out to the physical library. Our VTL offers an SSH-accessible CLI that has commands you can use to do this, for example the "vlib export <barcode>" command would do it.
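As a rough sketch of that scripted flow (command syntax reproduced from memory; the `<volume>`, `<tape_device>`, and `<barcode>` placeholders are stand-ins — check the relevant ONTAP and VTL documentation before relying on this):

```
# On the filer: write the FlexVol out via SnapMirror-to-Tape,
# targeting a virtual tape drive presented by the VTL
filer> snapmirror store <volume> <tape_device>

# On the VTL's SSH-accessible CLI: eject the vtape holding the
# SM2T data out to the attached physical library
vtl> vlib export <barcode>
```

From there the physical tape can be pulled from the library and shipped, with the SM2T data on it in its already-deduped FlexVol form.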
> I have a customer project I'm working on where we are removing tape completely from their
> environment using NetApp Snapshots, Snapmirror, and Snapvault, SME, SMSQL, SMVI, and OSSV.
Definitely sweet music to my ears, even though I've spent my last couple of years at NetApp in the VTL business unit! :-)
Hi Andrew... I will ask around to see what is officially available re: the integration that now exists between TSM and our FAS systems. I can tell you in a nutshell though that there are basically two components to it.
The first is some specific TSM support for a snapshot differencing API that was made available in relatively recent versions of ONTAP (alas, I don't have to hand all the TSM and ONTAP version numbers that support what I'm about to describe, but I can get them for you if you're interested in the details). As you are probably aware, TSM does its filesystem backups on what it calls a "progressive" basis, which essentially means it never backs up the same (unchanged) file twice, and works largely on an "incremental forever" basis for filesystems. Ergo, the first thing it typically does when it backs up a filesystem is walk the entire directory structure to figure out which files have changed since the last backup was done, so it can back up just the changed files. Unfortunately, this "filesystem walking" operation can take a significant amount of time on big filesystems that contain many thousands, or even millions, of files, which can present a problem. The problem is usually observed as a long, pregnant pause before TSM even gets going on writing out a backup.
Cutting a long story short, it so happens that the WAFL technology in our FAS systems can *really* help out with this problem. Using WAFL, we are able to very quickly and efficiently figure out which files/directories have changed between two snapshots, certainly much more quickly and efficiently than an unaided host-side backup application can figure this out by walking/scanning the filesystem from the top, and we can "hand out" lists of changed files/directories via an API to anyone who is interested. This is essentially the first piece of integration we now have with TSM. When TSM is backing up a FAS system, it can have the FAS system figure out what's changed since it last did a backup, and start on doing the actual backup much more quickly than it could if it had to figure this out for itself.
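To illustrate the idea (this is a toy model, not the actual ONTAP API): if each snapshot is viewed as a map of path to file metadata, the changed list falls out of a pure metadata comparison, with no file ever being opened and no top-down tree walk by the backup host.

```python
# Toy model of snapshot differencing: each snapshot is a map of
# path -> (inode, mtime). Comparing two snapshots touches only
# metadata, never file contents.

def snap_diff(old_snap, new_snap):
    """Return (changed, deleted): paths new/modified in new_snap,
    and paths present in old_snap but gone from new_snap."""
    changed = [p for p, meta in new_snap.items() if old_snap.get(p) != meta]
    deleted = [p for p in old_snap if p not in new_snap]
    return changed, deleted

yesterday = {"/a.txt": (101, 1000), "/b.txt": (102, 1000), "/c.txt": (103, 1000)}
today     = {"/a.txt": (101, 1000), "/b.txt": (102, 2000), "/d.txt": (104, 2000)}

changed, deleted = snap_diff(yesterday, today)
print(changed)  # ['/b.txt', '/d.txt']
print(deleted)  # ['/c.txt']
```

The real differencing happens inside WAFL, which hands the changed-file list to TSM through the API, so the backup can skip the expensive host-side walk entirely.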
The second piece of integration is the support for SnapMirror-to-Tape (SM2T) "backups" within TSM. It is now possible to have TSM use the SM2T machinery within a FAS system to generate and catalog full backups of FlexVols in the SM2T format. Obviously these backups can also be restored through TSM too, but only on a wholesale basis. There is no granular restore (i.e. it's the full FlexVol or nuthin'!).
Anyway, in a nutshell, that is it! I'll post a pointer to any other material on this subject I subsequently find out about.
I'm really interested in the information you have about the integration of TSM and NetApp. I'm basically trying to find out a few specific things:
1) For the incremental backups, how is the comparison between the two snapshots made? Will it open the files in any way, or is there no walking/scanning at all? (This relates to the fact that if you have FPolicy in place, opening a file triggers the policy, and this can have a performance impact that increases the scanning time.)
2) Which "source" and "target" snapshots are compared for the differential incremental? Is it only the last two snapshots, or can you have other snapshots in the middle?
For example: you do hourly snapshots from 8:00 to 17:00, and your backup starts every day at 18:00. You'd want to compare today's 18:00 snapshot with yesterday's 18:00 snapshot.... Is this possible?
3) Which versions of ONTAP support these features?
It'd be great if you could point me in the right direction to address these questions.
Thank you very much in advance.
Hi Sebastian.... I just saw your message and can answer two of your three questions off the top of my head, but will need to circle back to this thread on the third.
1) The snapshot differencing function happens internally to WAFL, so no files have to be opened, and no FPolicy events should be triggered by the process. FPolicy events may get triggered when TSM actually gets around to opening and reading the modified files/dirs for the purpose of backing them up, though.
2) The snapshots that are compared in the TSM context are ones that TSM actually creates itself for the purposes of doing these types of backups from a FAS volume. I believe the APIs will actually let you compare any two snapshots to retrieve the list of modified files/dirs in the later of the two, but TSM uses the APIs in a specific way, creating the snapshots it wants to compare itself. In other words, TSM doesn't work on arbitrary pairs of snaps. You could, though, if you were writing your own code.
3) This is the one I need to circle back around on! :-) Hopefully you can give me a day or two...