28 Replies Latest reply: Nov 1, 2013 2:42 PM by billshaffer

please explain to me the snap list on the source filer in snapmirror

netappmagic

vol1 is 100% full again. My question is not about solving the volume-full issue, but about understanding snapmirror at a more granular level. Please see the following output from the source filer "filer2". It seems to me that the snapshot created by snapmirror takes 1911GB, and the rest of the space is taken by the volume (file system) itself. Could anybody please explain the output of "snap list vol1" in detail?

- What exactly does the snapshot include: a complete copy of vol1, plus all snapshots since the first full copy? Why do I have to keep the full copy on the source filer after it has already been copied over to the DR site?

- Has the listed snapshot already been copied to drfiler1, or just the full set of snapshots?

- Is there any way to list in detail what is the full copy of the volume and what are the snapshots, and when each of those snapshots was taken?

 

thanks for your help!

 

filer2> df -rg vol1
Filesystem               total       used      avail   reserved  Mounted on
/vol/vol1/       8193GB     8148GB        0GB        0GB  /vol/vol1/
/vol/vol1/.snapshot        0GB     1911GB        0GB        0GB  /vol/vol1/.snapshot
filer2> snap list vol1
Volume vol1
working...

  %/used       %/total  date          name
----------  ----------  ------------  --------
23% (23%)   23% (23%)  Oct 09 06:00  drfiler(0151735037)_vol1.806 (snapmirror)

  • Re: please explain to me the snap list on the source filer in snapmirror
    aborzenkov

    A snapshot's unique data consists of blocks that were deleted or overwritten since the snapshot was taken. I do not think you can see in the snap list output whether the snapshot was fully transferred - you need to check snapmirror status. Indirectly, though: while a SnapMirror transfer is in progress you see two snapshots on the source, so with only one listed we may assume the transfer has finished (whether successfully or not, we do not know).

    • Re: please explain to me the snap list on the source filer in snapmirror
      netappmagic

      Okay. Where could you see "two snapshots" on the source? I only see one.

      The snapmirror was established a month ago. I can only see one line in the output of "snap list". So, does this 1911GB snapshot include all changes/overwrites since the first initialization?

      • Re: please explain to me the snap list on the source filer in snapmirror
        aborzenkov

        A snapshot by definition cannot include any change "since". Its content is frozen at the moment the snapshot is created.

        • Re: please explain to me the snap list on the source filer in snapmirror
          netappmagic

          Okay. Understood. Thanks for the clarifications. I am sorry, but I still could not fully answer the questions in my mind. This snapmirror has been scheduled to run once every half hour on the destination, and according to "snapmirror status" it seems to be working.

           

          So, does the following line from "snap list" contain every single snapshot (understood to be a frozen image copy) since the first initialization, and is that why it is so big, at 1911GB? If all snapshots have already been transferred to the destination (I guess this is how the destination keeps the original copy and all changes made on the source), why do we need to keep all these snapshots on the source?

           

          Thanks for your patience.

           

          filer2> snap list vol1

          Volume vol1

          working... 

            %/used       %/total  date          name
          ----------  ----------  ------------  --------
          23% (23%)   23% (23%)  Oct 09 06:00  drfiler(0151735037)_vol1.806 (snapmirror)

          • Re: please explain to me the snap list on the source filer in snapmirror
            aborzenkov

            SnapMirror does not keep all snapshots. It needs only one - the latest - snapshot as a baseline. During the next update it creates a new snapshot and transfers the difference between the baseline and the new snapshot. After that the old baseline is removed and the last transferred snapshot becomes the new baseline.

          • Re: please explain to me the snap list on the source filer in snapmirror
            billshaffer

            You are incorrect assuming the snapshot shown in "snap list" contains every single snapshot since initialization.  On initialization, a "base" snapshot will be taken at the source, and that point-in-time data is copied over to the destination volume.  This initialization data includes all existing snapshots, including the base that was just taken.  At this point you've got a "Snapmirrored" relationship.  Remember that, at the time of creation, a snapshot takes no real space, since it just contains pointers back to the original data.  Only as that original data changes are the snapshot blocks used.

             

            When you update the existing relationship, a "differential" snap is taken at the source.  These two source snaps plus the destination snaps (copied over during the last transfer) are used to compute the data that has changed since the last transfer.  These differences are copied over to the destination volume - including the snap just taken - and then the base snapshot at the source is removed, since the newer snapshot now has the point-in-time reference image, and it now becomes the base for the next update.  This is what aborzenkov is referring to when he says that you see two source snapshots when a transfer is in progress.
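
            The base/differential rotation described here can be sketched as a toy model in Python. This is only an illustration, not NetApp code: the `Mirror` class and its `update` method are made-up names, and a "snapshot" is reduced to a frozen set of block IDs.

```python
class Mirror:
    """Toy model of a SnapMirror source that keeps one baseline snapshot."""

    def __init__(self, live_blocks):
        self.live = set(live_blocks)
        # Initialization: take the base snapshot and copy everything over.
        self.snapshots = [frozenset(self.live)]
        self.destination = set(self.live)

    def update(self):
        base = self.snapshots[-1]
        new = frozenset(self.live)
        self.snapshots.append(new)       # two source snapshots while transferring
        in_flight = len(self.snapshots)
        added = new - base               # blocks written since the base
        removed = base - new             # blocks deleted/overwritten since the base
        self.destination |= added
        self.destination -= removed
        self.snapshots.pop(0)            # old base removed, new snapshot is the base
        return in_flight, added, removed


m = Mirror({1, 2, 3})
m.live.discard(2)                        # block 2 deleted on the live volume
m.live.add(4)                            # block 4 newly written
in_flight, added, removed = m.update()
print(in_flight, added, removed, len(m.snapshots))   # 2 {4} {2} 1
```

            Only the difference between the two snapshots is shipped, and after the update a single snapshot is left on the source again.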

             

            So your 1911G snapshot essentially contains the _changes_ to the volume between when the snapshot was taken and now -_not_ the changes since initialization.

             

            You say that the snapmirror has been scheduled for every 30 minutes - but this snapshot is from Oct 9.  The date of the snapshots will update with a successful transfer (and the size will also go down), so I think something is wrong with the schedule.  What does "snapmirror status" show as the lag time?  Will you post the "snap list" output of the destination volume too?

             

            Bill

            • Re: please explain to me the snap list on the source filer in snapmirror
              netappmagic

              Hi Bill,

               

              Thank you so much for such a detailed explanation, which cleared up quite a bit of confusion in my mind.

              As you indicated, there must be something wrong with the snapmirror, and I now feel the snapshot should not be so big (1911GB). The total volume size is about 8TB.

               

              I am sorry, but I did not state my situation accurately:

              a) The snapmirror for this volume is scheduled as follows, not every half hour as I said earlier:
              netapp2:vol1 drfiler1:vol1 - 0-59/59 * * *

              How should this schedule be read? Does the update start every hour based on this schedule? Could this schedule be the reason the volume gets full every month or so?

               

              b) The snap list output you see here is as of 10/09, when the volume got full, and I therefore broke off the snapmirror on that day.

               

              The following are the outputs you asked for; again, they are as of 10/09:

              drfiler1> snap list vol1

              Volume vol1

              working...

               

                %/used       %/total  date          name

              ----------  ----------  ------------  --------

                0% ( 0%)    0% ( 0%)  Oct 09 06:00  drfiler1(0151735037)_vol1.806

                0% ( 0%)    0% ( 0%)  Oct 09 05:59  drfiler1(0151735037)_vol1.805

               

              drfiler1> df -rg vol1

              Filesystem               total       used      avail   reserved  Mounted on

              /vol/vol1/       8193GB     8135GB       57GB        0GB  /vol/vol1/

              /vol/vol1/.snapshot        0GB        0GB        0GB        0GB  /vol/vol1/.snapshot

               

              drfiler1> snapmirror status vol1

              Snapmirror is on.

              Source                 Destination              State          Lag        Status

              netapp2:vol1  drfiler1:vol1  Broken-off     126:33:39  Idle

               

              Thanks again for your patience.

              • Re: please explain to me the snap list on the source filer in snapmirror
                billshaffer

                0-59/59 would, to me, indicate that it's going to update the 59th minute of every hour, though it's kind of a convoluted way to get there.  But the fact that your destination snapshots are a minute apart seems to say it's going every minute, which is a bit extreme.  You should be able to see in /etc/messages and/or /etc/log/snapmirror what it's trying to do and what it's saying.
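
                One way to reason about the minute field is to expand it by hand. The sketch below assumes the snapmirror.conf minute field follows cron-style range/step semantics (start-end/step); that is an assumption to check against the Data ONTAP documentation, and `expand_minute_field` is a made-up helper, not a NetApp tool.

```python
def expand_minute_field(field, lo=0, hi=59):
    """Expand a cron-style minute field into a sorted list of minutes."""
    minutes = set()
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            a, b = rng.split("-")
            start, end = int(a), int(b)
        else:
            start = end = int(rng)
        # Walk the range in increments of `step`, starting at `start`.
        minutes.update(range(start, end + 1, step))
    return sorted(minutes)


print(expand_minute_field("0-59/59"))   # [0, 59]
print(expand_minute_field("59"))        # [59]
```

                Under that reading, 0-59/59 fires at minute 0 and minute 59 of every hour, which would also produce snapshots one minute apart (such as 05:59 and 06:00) without running every minute.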

                 

                Try changing your schedule to 59 * * * (which will run at 1 minute before each hour), resync the relationship, and see what happens.  You will need to grow the source so it can write a new snapshot, but after the sync it should remove that 1911G one.

                 

                Bill

                • Re: please explain to me the snap list on the source filer in snapmirror
                  netappmagic

                  Got your point.

                  I could not grow the source, since the aggr where the volume is located is completely full.

                  Could I remove the 1911G snapshot first? It seems to me that this snapshot may already be corrupted. If I remove it, do I have to reinitialize, or could I resync?

                  • Re: please explain to me the snap list on the source filer in snapmirror
                    billshaffer

                    The snapshot itself can't be corrupted.  It can point to data that is corrupted somehow, but that is not the snapshot's fault.

                     

                    You can remove the existing snapshot, but then you'll have to reinitialize, which will do a full data transfer.  But if that's your only option, then you're kind of stuck.  Any chance of removing some of the "live" filesystem?  temp files or something?

                     

                    Bill

                    • Re: please explain to me the snap list on the source filer in snapmirror
                      netappmagic

                      Okay. One thing I am not so sure of: is it really possible for the snapshot to take so much space, 1911GB? How can the snapshot be so big? It seems unlikely that so much data changed between two snapshots.

                      • Re: please explain to me the snap list on the source filer in snapmirror
                        billshaffer

                        Snapshot space is determined solely by change rate.  2TB change in 8TB in 5 days _seems_ a bit high, but like I said before - barring a bug or something, a snapshot just can't be corrupt - the fact that it is that big pretty much tells you that that much data has changed.  A lot depends on the application.  I've seen databases chew up snapshot space pretty quickly.  If you're doing luns, and do a format on the host, that is all going to show as changed data, too.

                         

                        And remember - the size represents the data change on the live filesystem between now and when the snapshot was taken - the multiple snapshots are only used to compute what needs to be sent to the destination.

                         

                        Bill

                        • Re: please explain to me the snap list on the source filer in snapmirror
                          netappmagic

                          Hi Bill,

                          There is one issue left in my mind.

                          >So your 1911G snapshot essentially contains the _changes_ to the volume between when the snapshot was taken and now -_not_ the changes since initialization.

                           

                          We have the schedule updating the volume every minute on the destination (I know now that is too extreme), and as I understand it, each update also triggers a snapshot. So this 1911GB would be the result of a snapshot taken within a minute! That much data change in a minute seems impossible.

                           

                          >2TB change in 8TB in 5 days _seems_ a bit high

                          I don't know where you got "5 days" from.

                           

                          At this point, I should tell you what the volume is for. This volume is presented to a Windows server as a share, and is used for Acronis backups: 2 weeks retention, full backups on 2 Sundays, and incrementals on the other days. Does this tell you something?

                           

                          Thank you!

                          • Re: please explain to me the snap list on the source filer in snapmirror
                            aborzenkov

                            Your schedule is every hour, not every minute. And even then, the next scheduled SnapMirror transfer is skipped if the previous one did not complete.

                             

                            Backup applications can delete large amount of expired data in short time.

                            • Re: please explain to me the snap list on the source filer in snapmirror
                              aborzenkov

                              In your original post the snapshot is dated Oct 09. Your original post was made Oct 14. That’s 5 days, not one hour. You need to check whether your snapmirror is running.  billshaffer already told you that.
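
                              The lag reported by "snapmirror status" on the destination (126:33:39) can be checked against those dates. A quick sketch, assuming the lag is counted from the timestamp of the last transferred snapshot (Oct 09 06:00) and that the year is 2013, taken from the thread's reply dates; `parse_lag` is a made-up helper:

```python
from datetime import datetime, timedelta


def parse_lag(lag):
    """Convert an HH:MM:SS lag string (hours may exceed 24) to a timedelta."""
    hours, minutes, seconds = (int(x) for x in lag.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


snapshot_time = datetime(2013, 10, 9, 6, 0)      # "Oct 09 06:00", year assumed
lag = parse_lag("126:33:39")

print(lag.days, "days", lag.seconds // 3600, "hours")   # 5 days 6 hours
print(snapshot_time + lag)                               # 2013-10-14 12:33:39
```

                              Under those assumptions the lag points to Oct 14, so the status output was captured about five days after the last successful transfer.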

                              • Re: please explain to me the snap list on the source filer in snapmirror
                                netappmagic

                                No, it is not 5 days. I explained that already.

                                Though I posted the message on 10/14, I had already broken off the snapmirror on 10/9 because the volume was full on 10/9. So all the outputs reflect the situation up to 10/9.

                                • Re: please explain to me the snap list on the source filer in snapmirror
                                  netappmagic

                                  Hi aborzenkov,

                                   

                                  >Backup applications can delete large amount of expired data in short time.

                                  Does that mean that deleting a large amount of expired data could produce such a large snapshot (1911GB) within an hour? I thought that if the data is gone, we don't need to track it, and so the snapshot should not be that large. I am still trying to logically explain why we have such a large snapshot after an hour (thanks for pointing this out).

                                   

                                  Please let me know. Thank you!

                                  • Re: please explain to me the snap list on the source filer in snapmirror
                                    billshaffer

                                    netappmagic:

                                     

                                    You're saying that the output you've posted is from commands run Oct 9th, not yesterday?  If so, I've misunderstood; I thought those were current figures.

                                     

                                    The bottom line, though, is that whatever size the snapshot is (or was), is the amount of data that has been overwritten/changed.  Period.  Backup applications are known for high change rates.  If your snapshot is that big, it means yes, your backup application is changing that much data.  As far as tracking the data - once the changes have been replicated, the large snapshots will be deleted.  But that replication has to happen for the snapshots to cycle, so until then the changes are still "tracked" in the snapshot.

                                     

                                    Does that make sense?

                                     

                                    Bill

                                    • Re: please explain to me the snap list on the source filer in snapmirror
                                      netappmagic

                                      Hi Bill,

                                       

                                      Yes, it does make sense. I guess, I would have to accept this size of the snapshot.

                                      I have two more quick questions:

                                      a) I would have to remove the snapshot, let the backups keep going, and then reinitialize the snapmirror, since I don't have any space left in the aggr, right?

                                      b) If I add FC drives to this SATA aggr, would that be all right, or would there be performance issues?

                                       

                                      Thank you!

                                      • Re: please explain to me the snap list on the source filer in snapmirror
                                        billshaffer

                                        Without growing the aggr, yes, you would either have to remove the snapshot and reinitialize, or remove volume data, which should let you resync.  If you've got some backup data that is close to expiry, could you expire that out a bit early?

                                         

                                        Having different drive types/sizes in an aggr is generally frowned upon; you can introduce performance issues.  That being said, I've seen it done, and adding faster disk to slower disk is probably better than adding slower disk to faster disk....  I would avoid it, if you can, but it is certainly possible.

                                         

                                        Bill

                                        • Re: please explain to me the snap list on the source filer in snapmirror
                                          netappmagic

                                          Hi Bill,

                                           

                                          I need to turn to you again for your help.

                                          To continue our conversation about the size of the snapshot produced by snapmirror: the issue is that the snapshot gradually grows as more data gets removed from this 8TB volume. If I repeatedly run df -rg on the volume, the snapshot size keeps getting larger, right after I removed 1TB of data. Why? Even though I broke off the snapmirror on the DR site, the snapshot (on the source) is still growing, and "snapmirror status" on the volume shows "transferring". Why? I thought that once I broke off the snapmirror, transferring would stop.

                                          • Re: please explain to me the snap list on the source filer in snapmirror
                                            billshaffer

                                            The source snapshot is used by snapmirror, but it is still a snapshot - a list of pointers to unchanged blocks, and a bunch of blocks that have changed since the snapshot was taken.  The source snapshot will continue to track changes to the volume whether or not snapmirror is active.  In addition, when you delete a large amount of data, the process that "transfers" that data to the snapshots (block reclamation) is not instantaneous - so it's expected to see the snapshot continue to grow for a while after a delete.

                                             

                                            If the relationship is broken, transferring is stopped.  However, if the relationship is still defined in snapmirror.conf, it will continue to attempt to resync and error out.  It could be during this attempt-error cycle that you see the "transferring" state.  It's also possible that, if you did a break without doing a quiesce, you broke it in the middle of a transfer and its status is "stuck" on the source.  I've seen a couple different scenarios where "snapmirror status" said "transferring" when it clearly couldn't be.  Bottom line is if the destination "snapmirror status" says Broken-off, then no transferring is going on.

                                             

                                            Bill

                                            • Re: please explain to me the snap list on the source filer in snapmirror
                                              netappmagic

                                              >In addition, when you delete a large amount of data, the process that "transfers" that data to the snapshots (block reclamation) is not instantaneous -

                                              Can I understand this sentence as follows: a snapshot keeps track not only of data that was just added but also of data that was just removed, so the removed data gets "transferred" to the snapshot area, and that is why I see the snapshot space growing?

                                               

                                              I did quiesce the snapmirror before breaking it off. However, the relationship is still defined in the snapmirror.conf file. As you said, is that why the "transferring" would still be going on, but eventually error out?

                                               

                                              Thank you very much for staying with me for so long.

                                              • Re: please explain to me the snap list on the source filer in snapmirror
                                                billshaffer

                                                If the snapmirror is still scheduled, it will try to kick off on schedule, but will error out pretty quickly with something like "not in a snapmirrored relationship."  You should be able to piece the sequence together from logs, but I think for the full picture you need /etc/messages and /etc/log/snapmirror from both the source and the destination.  Which side are you seeing the "transferring" state on?  If you comment out the entry in snapmirror.conf, it will stop trying to run.

                                                 

                                                Snapshots don't really track added data.  When you take a snapshot of a volume, you're creating a point-in-time image of that volume.  This works by creating a bunch of pointers in the snapshot to all the allocated blocks in the volume - not really taking any space at this point, because it's all pointers.  As new data is added to the volume, new blocks are allocated.  The snapshot is unaware of this.  When data gets changed/deleted, the new data still gets allocated to new blocks (and the snapshot is still unaware of this new data), but the pointers in the snapshot that pointed to that changed data still exist - now the blocks get deallocated from the volume (since they are no longer valid in the live filesystem), and allocated to the snapshot.  The data doesn't move - it's still on the same physical block - but now it "belongs" to the snapshot, not the volume.
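
                                                The block ownership described above can be sketched as a toy model in Python. It is a simplification for illustration only, not how WAFL is implemented: the `Volume` class, its methods and the file names are all made up, and one integer stands for one block.

```python
class Volume:
    """Toy copy-on-write volume with at most one snapshot."""

    def __init__(self):
        self.live = {}          # file name -> set of block IDs
        self.snap = None        # frozen set of block IDs, or None
        self.next_block = 0

    def write(self, name, nblocks):
        """Write or overwrite a file; new data always lands on new blocks."""
        blocks = set(range(self.next_block, self.next_block + nblocks))
        self.next_block += nblocks
        self.live[name] = blocks

    def delete(self, name):
        del self.live[name]

    def live_blocks(self):
        return set().union(*self.live.values())

    def take_snapshot(self):
        # Only pointers are recorded, so the snapshot starts at zero size.
        self.snap = frozenset(self.live_blocks())

    def snapshot_used(self):
        """Blocks still referenced by the snapshot but gone from the live volume."""
        if self.snap is None:
            return 0
        return len(self.snap - self.live_blocks())


vol = Volume()
vol.write("full_backup", 100)
vol.take_snapshot()
print(vol.snapshot_used())   # 0: nothing has changed yet

vol.write("incremental", 20)
print(vol.snapshot_used())   # 0: new data is invisible to the snapshot

vol.delete("full_backup")
print(vol.snapshot_used())   # 100: deleted blocks now belong to the snapshot
```

                                                Deleting data that existed when the snapshot was taken frees nothing; the blocks simply change owner, which is why the snapshot grows after a delete.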

                                                 

                                                The way snapmirror can use the snapshots to know what new/changed data needs to be replicated is by taking that second snapshot, and comparing the two.

                                                 

                                                Does that make sense?

                                                 

                                                Bill

                                                • Re: please explain to me the snap list on the source filer in snapmirror
                                                  netappmagic

                                                  Hi Bill,

                                                   

                                                  I have read your message a few times. Honestly, I still don't understand why the snapshot space keeps getting bigger. The 2320GB figure below kept climbing for about half an hour, and reached as much as 3219G before I had to delete the snapshot with the snap delete command.

                                                  source> df -rg vol1

                                                  Filesystem               total       used      avail   reserved  Mounted on

                                                  /vol/vol1/       8193GB     8148GB        0GB        0GB  /vol/vol1/

                                                  /vol/vol1/.snapshot        0GB     2320GB        0GB        0GB  /vol/vol1/.snapshot

                                                   

                                                  So, it did not error out quickly, and I am not sure if it is the result of scheduled resync in /etc/snapmirror.conf, because the number immediately started to climb as soon as we deleted 1TB amount data.

                                                   

                                                  That "transferring" state is on the source side, when I run "snapmirror status" on the source volume.

                                                   

                                                  There are "vol1 is full" messages in the /etc/messages file, and also messages that the destination volume is full and the transfer could not be made. The only type of message in /etc/log/snapmirror says that the DR volume is full and the transfer could not be made. So, both volumes were full.

                                                   

                                                  Another basic question, please forgive me: is the 2320GB here really the total amount of data that all the snapshot pointers point to, not the amount of space that the pointers themselves occupy? Pointers would not occupy that much space. If that is right, does it mean that once the 1TB of data got removed, the snapshot pointers started pointing to these removed blocks, and therefore the amount of data the pointers point to keeps getting bigger?

                                                  • Re: please explain to me the snap list on the source filer in snapmirror
                                                    billshaffer

                                                    Pointers in snapshots take almost no space - you can see this by creating a snapshot of a large volume, and seeing with df and snap list that it has no real size.  In this case, 2320G is the total space of the CHANGED blocks pointed to by the snapshot pointers - data that has not changed in the live filesystem is still just pointed to, and still takes no space in the snapshot.  Yes - when you delete/change data in the live filesystem, the snapshots grow in size.  This is normal.

                                                     

                                                    I'm not sure what you mean by "So, it did not error out quickly, and I am not sure if it is the result of scheduled resync in /etc/snapmirror.conf, because the number immediately started to climb as soon as we deleted 1TB amount data."  You need to decouple (in your mind) the snapshot growth from the snapmirror - they are really unrelated.  Snapmirror will use snapshots, taking new ones and removing old ones, to determine what needs to be replicated, but the snapshots are really an independent entity.

                                                     

                                                    So, "it did not error out quickly" - if, in fact, you have broken your snapmirror, the scheduled sync will fail.  It will try again on the same schedule, and fail again.  If the volumes are full, it may error out on that before discovering that the relationship is broken.

                                                     

                                                    "the number immediately started to climb as soon as we deleted 1TB amount data" - as I said, this is expected snapshot behavior.  Your original snapshot was 1911G.  You deleted 1000G.  Thus, I would expect the snapshot to grow to 2911G, more if there has been more change to the live filesystem (which we can assume, given the change rate observed earlier).
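
                                                    As a worked version of that arithmetic, here is a small sketch. It assumes every deleted block existed when the snapshot was taken, so all of it ends up held by the snapshot; `expected_snapshot_gb` is a made-up helper, and the 3219G peak is the figure reported earlier in the thread.

```python
def expected_snapshot_gb(current_gb, deleted_gb, other_change_gb=0):
    """Snapshot size after a delete, if all deleted blocks were in the snapshot."""
    return current_gb + deleted_gb + other_change_gb


floor = expected_snapshot_gb(1911, 1000)
print(floor)                     # 2911

observed_peak = 3219             # reported before the snapshot was deleted
print(observed_peak - floor)     # 308, attributable to other changes on the volume
```

                                                    The roughly 300G beyond the expected floor is consistent with the backup application continuing to change data on the live filesystem.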

                                                     

                                                    Bill
