24 Replies. Latest reply: Sep 13, 2013 12:36 PM by DNICHOLSON_LVS1

Space reclamation on Linux?

BOARDADMIN Certified Novice

We're still seeing issues where we delete files on a LUN, the host knows they're deleted, but the space doesn't get returned to the NetApp system.  In Windows, we can use SnapDrive's reclaim feature to get that space back, but it looks like we still don't have that option in Linux SnapDrive 5 (at least it's not an obvious choice when you look at the available snapdrive commands).  The only solution we have right now is to create a new LUN, move everything over to it, then destroy the old one.

 

It looks like the last discussion of this was a few years back, and I wanted to find out if there have been any improvements lately, or if this feature request has made it onto a timeline...

  • Re: Space reclamation on Linux?
    madden NetApp Employee Cyclist

    I installed a new VM with CentOS 6 (binary-equivalent to RHEL6), mapped a NetApp LUN via iSCSI, and the native space reclamation support seems to just work.  Here are some outputs from my test.

     

    Storage side: the filer is running 8.0.2:
    salt*> version
    NetApp Release 8.0.2 7-Mode: Mon Jun 13 14:13:45 PDT 2011

     

    Host side: an RHEL6-level kernel with an ext4-formatted device mounted with the ‘discard’ flag set:
    [root@sdt-linux-infra2 salt-iscsi-lun0]# uname -a
    Linux sdt-linux-infra2 2.6.32-71.el6.i686 #1 SMP Fri Nov 12 04:17:17 GMT 2010 i686 i686 i386 GNU/Linux
    [root@sdt-linux-infra2 salt-iscsi-lun0]# mount
    /dev/sdd on /mnt/salt-iscsi-lun0 type ext4 (rw,_netdev,discard)
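    If it's useful to anyone: the 'discard' option can also be made persistent across reboots via /etc/fstab. A hypothetical entry matching the device and mountpoint above (adjust the names for your system):

    ```
    /dev/sdd  /mnt/salt-iscsi-lun0  ext4  _netdev,discard  0 0
    ```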

     

    From the host, the empty formatted filesystem shows 156MB used:
    [root@sdt-linux-infra2 mnt]# df -h
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/sdd               11G  156M   11G   2% /mnt/salt-iscsi-lun0

     

    From the storage side we have 321MB used:
    salt*> df -mx /vol/sdt_linux_infra2
    Filesystem               total       used      avail capacity  Mounted on
    /vol/sdt_linux_infra2/    16384MB      321MB    16062MB       2%  /vol/sdt_linux_infra2/


    From the host side, copy a big ISO image and see 4643MB now used:
    [root@sdt-linux-infra2 salt-iscsi-lun0]# cp /mnt/vmware-filer02-ISO/CentOS-6.0-i386-bin-DVD.iso .
    [root@sdt-linux-infra2 salt-iscsi-lun0]# df -m .
    Filesystem           1M-blocks      Used Available Use% Mounted on
    /dev/sdd                 11088      4643      5882  45% /mnt/salt-iscsi-lun0

     

    Check the storage side and we have 4769MB used, which is expected:
    salt*> df -mx /vol/sdt_linux_infra2
    Filesystem               total       used      avail capacity  Mounted on
    /vol/sdt_linux_infra2/    16384MB     4769MB    11614MB      29%  /vol/sdt_linux_infra2/

     

    From the host side, delete the file and see usage is back to 156MB:
    [root@sdt-linux-infra2 salt-iscsi-lun0]# rm CentOS-6.0-i386-bin-DVD.iso
    rm: remove regular file `CentOS-6.0-i386-bin-DVD.iso'? y
    [root@sdt-linux-infra2 salt-iscsi-lun0]# df -m .
    Filesystem           1M-blocks      Used Available Use% Mounted on
    /dev/sdd                 11088       156     10369   2% /mnt/salt-iscsi-lun0

     

    Check the storage side and we have 323MB used:
    salt*> df -mx /vol/sdt_linux_infra2
    Filesystem               total       used      avail capacity  Mounted on
    /vol/sdt_linux_infra2/    16384MB      323MB    16060MB       2%  /vol/sdt_linux_infra2/

     

    I did notice some slowdown when deleting large files: normally a file delete only updates metadata, but with 'discard' set it also results in I/Os to (and waiting on responses from) the storage device.  I read this presentation (http://people.redhat.com/lczerner/discard/files/Performance_evaluation_of_Linux_DIscard_support_Dev_Con2011_Brno.pdf) and after contacting the author he sent back this:

     

    right now, we have two ways of doing discard in ext4. The first one is the online (periodic) discard, which is off by default, for obvious reasons you have already noticed. The second approach is called batched discard and it is done via FITRIM ioctl. The file system itself has to have the support implemented for this ioctl. At this point ext4, ext3, xfs, btrfs and ocfs2 have the support for it.

     

    The batched discard he mentions is available in the fstrim tool.
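    In case anyone wants to try the batched approach, a hypothetical invocation against the mountpoint from my test (fstrim ships with util-linux, needs root, and the underlying device must support discard):

    ```shell
    # Trim all unused blocks on the mounted filesystem; -v reports how
    # much space was discarded.
    fstrim -v /mnt/salt-iscsi-lun0
    ```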

     

    Hope that helps.

    • Re: Space reclamation on Linux?
      BENJAMIN.BOHEC Novice

      This functionality is very interesting.

       

      Does it mean that SnapDrive is not needed if an administrator just wants to use "space reclamation" on a thin-provisioned LUN?

      Does it just need a "TRIM-ready" operating system/filesystem?

       

      Do you have any information about the period of the online discard?

      Do you have the command line to launch a batched discard?

      • Re: Space reclamation on Linux?
        madden NetApp Employee Cyclist

        SnapDrive for Windows implemented a space reclamation function that used NetApp API calls to reclaim space.  Veritas Storage Foundation did the same.  The approach we're talking about with Linux (and which was originally included in ESX 5.0 before it got pulled) uses SCSI UNMAP to reclaim space.  In those cases it's pure standards, so no SnapDrive is required, just a program that sends SCSI UNMAP commands corresponding to blocks that can be freed by the filesystem.

         

        Regarding a "TRIM" ready operating system, no, that's not it either.  TRIM is part of the ATA spec, UNMAP is part of the SCSI spec.  In the case of linux they implemented an abstraction layer that can use TRIM or UNMAP according to what the physical device supports.  NetApp presents SCSI devices so we support UNMAP.

         

        NetApp released TR-4046 titled "NetApp Thin-Provisioned LUNs on RHEL 6.2 Deployment Guide" that talks more about the discard feature in 'online' mode:

        http://media.netapp.com/documents/tr-4046.pdf

         

        For offline mode my understanding is the fstrim tool is in the util-linux-ng package.  This could be put in cron to reclaim space on a daily basis.
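        For example, a hypothetical root crontab entry (the mountpoint and log path are illustrative):

        ```
        # Run fstrim nightly at 03:30 and log how much space was discarded.
        30 3 * * * /sbin/fstrim -v /mnt/salt-iscsi-lun0 >> /var/log/fstrim.log 2>&1
        ```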

         

        I also noticed that the hyperlink I posted earlier goes to a different address when clicked than as printed; here it is again:

        http://people.redhat.com/lczerner/discard/files/Performance_evaluation_of_Linux_DIscard_support_Dev_Con2011_Brno.pdf

         

        Hope that helps.

        • Re: Space reclamation on Linux?
          zmizmizmi Novice

          We have VMs on NetApp via NFS, and it doesn't look like the DISCARD operation is supported at all:

           

          # cat /sys/devices/pci0000:00/0000:00:15.0/0000:03:00.0/host2/target2:0:0/2:0:0:0/block/sda/queue/discard_max_bytes

          0

           

          This is kernel 3.3.7 under ESXi 5.0. Is discard working with VMDKs that are used via NetApp-NFS mounts? Or does it need the VAAI feature in VMware?
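          (Side note, not specific to VMware: a nonzero discard_max_bytes is what signals that the kernel can issue UNMAP/TRIM to a device. A small sketch for checking it; the helper name is made up, and util-linux's lsblk --discard shows the same per-device limits.)

          ```shell
          # A device supports discard when its discard_max_bytes is nonzero.
          # The path is a parameter so any device (or a test file) can be
          # checked, e.g. supports_discard /sys/block/sda/queue/discard_max_bytes
          supports_discard() {
              val=$(cat "$1" 2>/dev/null || echo 0)
              if [ "$val" -gt 0 ] 2>/dev/null; then echo yes; else echo no; fi
          }
          ```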

          • Re: Space reclamation on Linux?
            radek.kubka Hall of Fame F1 racer

            Hi,

             

            You need VAAI - more details can be found in Luke Reed's blog posts & comments:

            https://communities.netapp.com/blogs/luke/2011/09/09/vaai-in-vsphere-50-part-1#comments

             

            One tiny problem though - VMware recommends against using SCSI UNMAP, which is disabled by default in ESXi 5.0 U1

             

            Regards,

            Radek

          • Re: Space reclamation on Linux?
            madden NetApp Employee Cyclist

            The SCSI UNMAP support I mentioned applies when the OS is writing directly to a NetApp-presented SCSI device.  So if you had an FCP or iSCSI device presented directly to Linux, the discard support would be available.  If the SCSI device is actually emulated by VMware, they would somehow have to pass it through to NetApp; today this isn't possible natively in VMware.  NetApp does offer reclamation as part of Virtual Storage Console (VSC) for Windows VMs over NFS with the VM in a powered-off state.  This software looks in the VMDK's NTFS filesystem for allocated-but-free blocks and then calls a NetApp API to UNMAP them.

             

            Hope that helps.

            • Re: Space reclamation on Linux?
              rprudent@metnet CertifiedPlus Sprinter

              Hi there,

              I would be mapping a LUN using dm (device-mapper) for non-local SCSI redundancy, and will use parted since the LUN would be more than 2 TB. The mounted LUN would be used for an Oracle DB. From my research it looks like fstrim with cron offers performance gains over mounting with discard.

              Questions

              • I keep seeing trimming recommended for SSDs and thin-provisioned LUNs. If the allocated LUN is guaranteed (space reserved), should I still use trimming to reclaim deleted blocks?
              • Because FITRIM is supported by ext3, does this mean batched-discard space reclamation is supported with RHEL 5.x, whereas online discard is supported by RHEL 6 ext4?
              • Mounted discard devices are not supported on SnapMirrored volumes; is the same true for fstrim devices?
              • Is there any doc on installing the correct packages for fstrim and using it, or a script to use with cron?
              • Re: Space reclamation on Linux?
                rishib NetApp Employee Novice

                Starting with RHEL 6.1, ext4 has implemented batched discard (similar to fstrim), which has significant performance improvements and should be comparable to fstrim in performance. And given that ext4 discard is automatic, unlike fstrim, that should be the recommended mode of operation.

                Questions

                I keep seeing trimming recommended for ssd and thin provisioned lun.  If allocated lun is guaranteed (space reclamation) should I still use trimming to reclaim deleted blocks

                ** I didn't get the question. Is it a TP LUN or not, i.e. no space reservation (-o noreserve with -e space_alloc)? You should have these options set to use space reclamation.
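                (For reference, a hypothetical 7-Mode example of setting those options; exact syntax varies by ONTAP release, so check the lun man page on your system:)

                ```
                salt> lun create -s 10g -t linux -o noreserve -e space_alloc /vol/myvol/mylun
                salt> lun set space_alloc /vol/myvol/mylun enable
                ```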

                 

                Because FITRIM is supported by ext3, does this mean batched-discard space reclamation is supported with RHEL 5.x, whereas online discard is supported by RHEL 6 ext4?

                ** Not supported in RHEL5. The major changes to the host storage stack, including device-mapper, to support space reclaim were made only in RHEL6 (preferably 6.1 and above).

                 

                Is there any doc on installing the correct packages for fstrim and using it, or a script to use with cron?

                **ext4 discard gives you comparable performance.

                • Re: Space reclamation on Linux?
                  rprudent@metnet CertifiedPlus Sprinter

                  Hey,

                  I have to use both RHEL 5.x and 6.x hosts.

                  fstrim is the userspace interface for the FITRIM ioctl, and this approach is batched discard. RHEL supports it on ext3 (from kernel v2.6.37) and ext4 (v2.6.36).

                  The discard mount option, supported on RHEL 6.2 and higher with on-the-fly reclaiming, is periodic (online) discard. RHEL supports it on ext4.

                  There are known potential performance impacts when deleting huge files during normal or peak I/O using periodic discard.  This is why I am wondering whether to use fstrim with cron at off-peak times instead.

                  I see the latest RHEL 5.8 kernel (also termed Update 8, 2012-02-20) is 2.6.18-308, which doesn't look like it meets fstrim's requirements.

                  Question

                  • Are you saying there is no support for fstrim with RHEL 5.8?
                  • The norm is to use trimming with a TP LUN or SSD drive. My vol and LUN would NOT be TP, but I would still want to trim to reclaim space on deletes. I guess this is okay?
                  • "Not supported in RHEL5. The major changes to the host storage stack including device-mapper to support space reclaim was done only in RHEL6". Does that mean I can use dm with space reclamation only on 6.1 and above?
                  • I have to use 5.x and 6.x platforms; for 6.x I can use either batched discard or periodic discard, but for 5.x I am not sure what's available. Any advice for space reclamation on 5.8 when using dm?

                   

                  Advice Appreciated, thanks

                • Re: Space reclamation on Linux?
                  rprudent@metnet CertifiedPlus Sprinter

                  Hey,

                  Would space reclamation be required if the LUN is thick provisioned (space reserved)?  Asking about the Linux approach above, and for other common platforms in general.

                  • Re: Space reclamation on Linux?
                    radek.kubka Hall of Fame F1 racer

                    It will not bring any benefits - other than better reporting.

                     

                    In ONTAP 8, if you run lun show -v it displays the LUN's occupied space; after the space reclamation process this will be a more 'accurate' value.

                    • Re: Space reclamation on Linux?
                      madden NetApp Employee Cyclist

                      From a storage efficiency perspective I would agree with Radek, but from a perf perspective space reclamation also increases aggr freespace.  And more aggr freespace means more flexibility during write allocation.  So even in a thick config, running space reclamation might provide a benefit...

                      • Re: Space reclamation on Linux?
                        radek.kubka Hall of Fame F1 racer

                        A space-reserved LUN always 'takes' the same amount of space from the volume (and from the aggregate, if the volume is thin provisioned).

                        • Re: Space reclamation on Linux?
                          madden NetApp Employee Cyclist

                          Actually, after thinking more about this from a capacity perspective if using snapshots and the block is unmapped it no longer needs an overwrite reservation.  So reclaiming on a space reserved LUN would reduce the 'used' space in the volume, and that space could potentially be used for more snapshots.

                           

                          From a performance perspective the more freespace (meaning unallocated blocks, basically the 'used' column from the aggr show_space cmd) in the aggr the better WAFL can place data during write allocation; longer chain lengths, more full stripes, less cpreads, etc.  So freeing up blocks on disk that are no longer needed could improve write allocation decisions and reduces disk IO.

                           

                          Neither are huge wins but they are reasons it might be interesting to do space reclamation on space reserved LUNs.

                          • Re: Space reclamation on Linux?
                            rprudent@metnet CertifiedPlus Sprinter

                            The reason why I asked is because I have some volumes and LUNs respectively mapped to RHEL for Oracle DB mounts. Vol options: nosnap=on, FR=0, snap reserve 0, snap sched 0, no plans for snapshots. Both vol and LUNs are thick provisioned. The volumes are at 90% capacity (10% free), a little vol buffer for perf. There is much fragmentation in the mounts from archiving, deletions, etc.

                            After 1 month of functioning, some of the volumes' used capacity has increased by 8 to 16MB, noticeable in the MB usage of smaller vols. One vol even went up 1%, so it is at 91% capacity now. Overall that made the used capacity on the aggr go up 1% as well.

                             

                            I am aware of 8.x lun capacity usage functionality. See occupied size below for the LUN in the vol that increased usable capacity by 1%.

                            lun show -v

                                    /vol/vol1/ln1.ln            41.0g (44025511936)   (r/w, online, mapped)

                                            Serial#: -dtiK]BbhAUI

                                            Share: none

                                            Space Reservation: enabled

                                            Multiprotocol Type: linux

                                            Maps: XXX=80

                                            Occupied Size:    3.3g (3559153664)  

                                            Creation Time: Tue Sep 18 14:08:42 BOT 2012

                                            Cluster Shared Volume Information: 0x0

                             

                            The mount for this LUN has 35.9GB free and 2.1GB used from the OS (38GB in all), which may add up if you take partitioning and formatting overheads into consideration, but that still leaves a 3GB difference.

                            I am wondering if space reclamation used with thick provisioning would stop the vol capacity from slowly growing. Why would the volumes' capacity increase when everything is thick provisioned with no snaps and the mentioned vol and LUN options?

                            • Re: Space reclamation on Linux?
                              madden NetApp Employee Cyclist

                              WAFL always writes to new locations on disk and the old locations (if not locked by snapshot, shared by dedupe or clone, etc) are freed in a lazy manner.  So you might see small increases followed by decreases when the old locations are freed if your IO pattern consists of overwrites.  These increases and decreases might be measured in 100s of MBs, and are seen over 10s of seconds, but the average capacity usage should be flat over time.  If you have thin LUNs, and/or use snapshots, and the system is accepting initial writes (i.e. not overwrites) you might see it growing over time because the consumed blocks are still initial allocations.

                               

                              I hope the above helps to explain the behavior you might see.

                               

                              If you have more questions, I'd suggest including output of "df -r", "df -S", "aggr show_space", "lun show -v" for the LUN/vol in question.

                              • Re: Space reclamation on Linux?
                                rprudent@metnet CertifiedPlus Sprinter

                                This makes sense, but in this situation shouldn't all current writes be initial, as there are no snap copy creations on the vol containing the LUN and no FR allocations for overwrites? So I am not sure why available space seems to be decreasing.

                                Initial usage

                                /vol/vol1/            45GB   41GB 3928MB  91%  /vol/vol1/
                                /vol/vol1/.snapshot    0TB    0TB    0TB ---%  /vol/vol1/.snapshot

                                 

                                After a month

                                Filesystem               total       used      avail       capacity  Mounted on

                                /vol/vol1/                45GB       41GB     3912MB      92%  /vol/vol1/

                                /vol/vol1/.snapshot        0TB        0TB        0TB     ---%  /vol/vol1/.snapshot

                                Filesystem           total   used  avail        reserved            Mounted on
                                /vol/vol1/            45GB   41GB 3912MB    0MB  /vol/vol1/
                                /vol/vol1/.snapshot    0TB    0TB    0TB           0TB  /vol/vol1/.snapshot





                                As you can see, the available space has dropped by 16MB and the capacity is up by 1%. Another vol with the same vol/LUN sizes shows a similar small MB decrease in avail.

                                I set FR to 0 about 1 week ago (it was the default 100), but again no snap configuration is on the vol.

                                The lun on this vol1

                                /vol/vol1/ln1.ln            41.0g (44025511936)   (r/w, online, mapped)

                                                Serial#: -dtiK]BbhAUI

                                                Share: none

                                                Space Reservation: enabled

                                                Multiprotocol Type: linux

                                                Maps: XXX=80

                                                Occupied Size:    3.3g (3559153664)  

                                                Creation Time: Tue Sep 18 14:08:42 BOT 2012

                                                Cluster Shared Volume Information: 0x0

                            • Re: Space reclamation on Linux?
                              radek.kubka Hall of Fame F1 racer

                              Do you mean the vol that increased its used capacity by 1%?

                               

                              I wouldn't be too worried about this, as in reality it may be the difference between 90.49% & 90.51% utilisation due to some extra metadata. Looking at df vol_name output would give more accurate figures.

                               

                              With regards to occupied size vs. used capacity from the OS side - space reclamation should bring these two values closer to each other, but I am not convinced about practical benefits of it (FR is set to 0 in your case)

                          • Re: Space reclamation on Linux?
                            radek.kubka Hall of Fame F1 racer

                            yes, correct - if fractional reserve is higher than 0, then space is reserved only for used blocks, so it can make a difference.

                            • Re: Space reclamation on Linux?
                              rprudent@metnet CertifiedPlus Sprinter

                              So if space reclamation is not used for the space-reserved LUN with the mentioned configs (no snaps, FR 0, etc.), then it should only be a matter of inaccurate occupied-size reporting for the LUN on the storage side, and you can continue defragging, deleting, and reusing the blocks on the host end comfortably without chewing into the remaining vol avail space of the AFS?

                               

                              If fractional reserve is more than 0 and all snap settings are disabled as mentioned, would it still allocate space in the vol to reflect used space in the LUN?

                               

                              Yes, I meant the vol increased its used capacity by 1%. But because it was initially at 91%, it seems near enough to growing to 92% already, even though the change is in MB. If this continues (which I hope not), I may have to do away with the aggr snap settings (there is no SyncMirror or MetroCluster either) to increase all vols' avail and decrease capacity to a safer 80%, while still keeping the current 5% free in the aggr.

                              • Re: Space reclamation on Linux?
                                madden NetApp Employee Cyclist

                                Some definitions (also see pdf page 256 onward from the Storage Management Guide for Data ONTAP 8.1.1 which is rewritten from years past and explains this topic pretty well):

                                o) Any kind of reservation or guarantee is for capacity but not specific on disk locations.  The specific locations are determined during write allocation when the write actually hits the system

                                o) Fractional reserve is the % to reserve for overwrites when using snapshots.  If you aren't using snapshots then fractional reserve is not used.

                                o) LUN reservation (on or off) is used to reserve the size of the LUN up front in the volume.

                                o) Volume guarantee=volume is used to reserve the size of the volume up front from the aggregate. 

                                o) The occupied capacity of the LUN is the actual fill rate of the LUN from the host perspective.

                                o) The used column entry for the volume from "aggr show_space" is the actual occupied capacity of the volume in the aggregate.

                                 

                                I believe your config is:

                                o) LUN with LUN reservations enabled

                                o) Volume with guarantee=volume

                                o) No snapshots on the volume

                                 

                                In this case you will see the volume used capacity (df output) will equal to roughly the size of the LUN.  This is because the LUN reservation is reserving space in the volume for the LUN size.  As your LUN is written to you are consuming your reservation.  If you check the "aggr show_space" used column for the volume you will see the value will equal roughly the size of the occupied space of the LUN.  If you check the "aggr show_space" allocated column for the volume you will see the value is roughly equal to the volume size.  So the LUN is reserved in the volume, and the volume is guaranteed in the aggregate. 

                                 

                                Writes will never fail to this LUN.  Even if the volume is 100% full and df shows the volume has  0 kb free the LUN will still accept writes because it is reserved in the volume, and the volume is guaranteed in the aggregate.  Please just ignore the volume df output that is 16MB more than a month ago, as I mentioned earlier small fluctuations are normal.

                                 

                                In your config the only benefit of using space reclamation would be to free up unneeded blocks in the aggregate making write allocation easier; so a potential performance boost.  That boost may be nil however if you already have lots of attractive (meaning plentiful and contiguous) freespace in your aggregate and the reclaim is freeing little.

                                • Re: Space reclamation on Linux?
                                  rprudent@metnet CertifiedPlus Sprinter

                                  madden, it seems more logical now.

                                  Using space reclaim in this instance would give better reporting from an efficiency perspective at the file layer, and would also reflect simultaneously back to the aggr layer, which in turn may provide more reported free space for perf. This amount may be minimal or unjustified if there are already approved guarantees at each layer and the aggr has the recommended free buffer.

                                  Maybe, with the layered config in question, if using many files with heavy fragmentation and ROC and a very slim aggr buffer, it may be justified from a perf perspective; otherwise it seems better suited for traditional use without space reservations.

  • Re: Space reclamation on Linux?
    vladimirzhigulin Cyclist

    If you're running XFS, RHEL 6.4 (and CentOS at some point) now has support for "Online Discard".

    https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html-single/6.4_Release_Notes/index.html#storage
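    (For anyone else trying it, this should just be the same 'discard' mount flag discussed earlier in the thread; device and mountpoint here are made up:)

    ```
    # Mount an XFS filesystem with online discard enabled
    mount -t xfs -o discard /dev/sdd /mnt/xfs-lun
    ```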

    I'm going to try it; keen to see if there are any performance penalties...

     

    Vladimir

  • Re: Space reclamation on Linux?
    DNICHOLSON_LVS1 Novice

    Does anyone know if fstrim would work on a thin-provisioned LUN (to a RHEL 6.2 NetBackup Server) being presented from a fas2050 7.3.6p5 7-Mode?

     

    If I'm reading the TR correctly, discard is only compatible with 8.1 or above... and the 2050 can't be upgraded.

     

    Our customer's NBU server is reporting different used space than the NetApp, and their fear is that if the aggr/vol/LUN finally reports full, backups won't run.

     

    Thanks!
