7 Replies. Latest reply: Aug 19, 2013 12:35 PM by RPOHLWILM

Over-Commitment - Thin Provisioning (Best Practice)

DAVISADMINS

Hello all.

 

We are still rolling out our NetApp SAN. My question relates to best practices when over-committing in a virtual environment.

 

We currently have 2 aggregates with matching volumes and LUNs that match the aggregate GB for GB (a 1TB aggregate with a 1TB volume containing a 1TB LUN). We are seeing only about 25% utilization of this space, even though the virtual host servers using the LUNs think they have nearly filled them. We have a stable environment (no expected storage growth) and I would like to overcommit.

 

Question: Is it better to add LUNs and overcommit the volume, or to add volumes and overcommit the aggregate?

 

Thanks for your feedback and any thoughts you may have.

  • Re: Over-Commitment - Thin Provisioning (Best Practice)
    chute

    NetApp Storage Best Practices for VMware vSphere says the following:

     

    When you enable NetApp thin-provisioned LUNs, NetApp recommends deploying these LUNs in FlexVol® volumes that are also thin provisioned, with a capacity that is 2x the size of the LUN. By deploying a LUN in this manner, the FlexVol volume acts merely as a quota. The storage consumed by the LUN is reported in the FlexVol volume and its containing aggregate.

     

    This information is on page 16 of TR-3749, found here:

    http://media.netapp.com/documents/tr-3749.pdf
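
    To make that 2x guideline concrete, here is a minimal sketch (Python, with a hypothetical 1TB LUN; it is just the arithmetic from the quote above, not anything official):

      # Rough sizing per the TR-3749 guideline quoted above:
      # a thin LUN inside a thin FlexVol volume sized at 2x the LUN.
      # Sizes are in GB; the 1TB LUN is only an example.
      def tr3749_sizing(lun_gb):
          """Suggested FlexVol size for a thin-provisioned LUN (2x guideline)."""
          return {
              "lun_gb": lun_gb,          # LUN space reservation disabled (thin)
              "flexvol_gb": 2 * lun_gb,  # volume guarantee 'none'; acts as a quota only
          }

      print(tr3749_sizing(1024))  # {'lun_gb': 1024, 'flexvol_gb': 2048}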

    • Re: Over-Commitment - Thin Provisioning (Best Practice)
      HC_ITDEPT

      Peter, I'm trying to do the same thing you are, and I've read TR-3749 too, but I'm confused as to why this is the recommendation.  When my engineer was here installing my FAS, he told me that I should put thin LUNs into thin volumes to get max savings, and make the LUNs 2x the size of current data.  So, if I have 500GB worth of VMs, I would make a thin volume with a thin LUN @ 1TB.  What this TR article is saying is that the thin volume should then be 2TB for 500GB worth of data.  Why?

       

      I should mention that I'm going to be really tight on space in the short term until I can get more disks, so everything I've planned has emphasized saving space without shooting myself in the foot.

       

      The formula I'm thinking of employing (roughed out in the sketch below) is:

      thin LUN = current data x2 rounded to nearest 100GB.

      thin volume w/snapshots = thin LUN x 120%

      thin volume w/o snapshots = thin LUN

       

      Doesn't that make sense?
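
      For comparison, here is a minimal sketch of that formula next to the TR-3749 2x guideline, using the 500GB example (Python; the rounding and percentages are taken straight from the lines above, nothing official):

        import math

        def my_sizing(current_data_gb, with_snapshots):
            """Formula above: thin LUN = current data x 2, rounded to the nearest
            100GB; thin volume = LUN (or LUN x 120% if snapshots are kept)."""
            lun_gb = round(current_data_gb * 2 / 100) * 100
            vol_gb = lun_gb * 1.2 if with_snapshots else lun_gb
            return {"thin_lun_gb": lun_gb, "thin_volume_gb": vol_gb}

        # 500GB of VMs, no snapshots: LUN = 1000GB, volume = 1000GB
        print(my_sizing(500, with_snapshots=False))
        # TR-3749 would instead size the volume at 2x the LUN, i.e. 2000GB.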

       

      Using http://media.netapp.com/documents/tr-3483.pdf as a reference (see page 21), I was thinking of having it look like this (restated in the sketch after the list):

       

      Guarantee = none

      LUN reservation = disabled (thin prov.)

      fractional reserve = 0%

      Snap reserve = 0%

      autodelete = volume/oldest first

      autosize = off

      try_first (if applicable) = delete snap
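
      The settings above, restated as a quick sketch so the intent is in one place (Python; the key names are informal shorthand for the options listed, not literal ONTAP option strings):

        # Informal restatement of the proposed thin-provisioning settings above.
        # Key names are shorthand, not literal ONTAP option names.
        proposed_volume_settings = {
            "volume_guarantee": "none",           # volume itself is thin provisioned
            "lun_space_reservation": "disabled",  # LUN is thin provisioned
            "fractional_reserve_pct": 0,
            "snap_reserve_pct": 0,
            "snap_autodelete": "on, volume trigger, oldest first",
            "autosize": "off",
            "try_first": "delete snapshots",      # if applicable
        }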

       

      I'd appreciate it if one of the pros could weigh in on this.  I'm not a storage admin, so my thinking could be way off,  but if so I have no way of knowing.  Please help!

  • Re: Over-Commitment - Thin Provisioning (Best Practice)
    DAVISADMINS

    Although this subject is outside the scope of technical support, I did manage to find someone who was willing to chat with me about it.

     

    This is what I learned. I would welcome thoughts and contradictions on this.

     

    I was told that it is best to have 1 LUN per volume (a 1-to-1 match).

     

    We also discussed the behavior of a LUN in an overcommitted environment.

    The NetApp filer sets a "high water mark" in an overcommitted environment.

    That is to say, it tracks the peak amount of space the LUN has actually consumed.

    Once that high water mark is hit, I was told, the LUN's actual usage on the filer will not drop below it, regardless of what happens inside the LUN.

     

    Example (thin-provisioning):

    A 1TB aggregate is created with a 1TB volume containing a 1TB LUN.

    After the server using the LUN thinks it has used nearly all of the 1TB LUN, actual usage may be far less, say 25% of that (250GB). Even if storage is later freed up on the server and the server using the LUN thinks it is only using 150GB of the LUN, the "high water mark" in the SAN has already been set and will not change. Actual usage will remain at 25% (250GB).
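
    To put rough numbers on that (as it was described to me), here is a minimal sketch (Python, purely illustrative; the sizes are the ones from the example above):

      # Illustrative model of the "high water mark" behavior described above.
      # Sizes in GB; the mark simply never goes down on its own.
      class ThinLun:
          def __init__(self, size_gb):
              self.size_gb = size_gb      # what the host sees (1TB here)
              self.used_gb = 0            # actual consumption on the filer
              self.high_water_gb = 0      # peak consumption; never decreases by itself

          def host_writes(self, gb):
              self.used_gb = min(self.size_gb, self.used_gb + gb)
              self.high_water_gb = max(self.high_water_gb, self.used_gb)

          def host_deletes(self, gb):
              # Deleting files only updates the host file system's free list;
              # the filer still counts those blocks as used.
              pass

      lun = ThinLun(1024)
      lun.host_writes(250)                   # filer-side consumption grows to 250GB
      lun.host_deletes(100)                  # host now reports only 150GB used
      print(lun.used_gb, lun.high_water_gb)  # 250 250 -- nothing shrinks on its own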

     

    Could someone chime in on this? Is this true?

     

    Implications:

    When overcommitting, care must be taken to anticipate growth.

    If an additional volume and LUN are created on the aggregate to take advantage of thin provisioning, and for whatever reason space is quickly used up, there is really no way to decrease the amount of space used by the LUN (remember, the "high water mark" will not go down) short of deleting and recreating a LUN.

     

    Question I still have:

    It's my understanding that de-duplication WOULD actually reduce the amount of space used in the volume. Is this true?

    • Re: Over-Commitment - Thin Provisioning (Best Practice)
      chute

      "Could someone pipe in on this? This true?"

       

      Yes and no. You are on the right track, but space can be reclaimed: you can decrease the amount of space used in the LUN by using the space reclamation built into SnapDrive.

      Technical report TR-3483 details, step by step, the items mentioned in your example and implications. It starts on page 4, under "LUN Space Usage", and goes on to cover space reclamation on page 7.

      http://media.netapp.com/documents/tr-3483.pdf

       

       

      "Question I still have: It's my understanding that de-duplication WOULD actually reduce the amount of space used in the volume. This true?"

       

      No, that statement is not true.

      Props to Bob Charlier for the explanation.

      The short answer is that deduplication will still think that data is written to those blocks, just as if the data had never been deleted. Because of that, deduplication won't give the best results and won't free up any additional space.


      The reason is that NTFS controls the list of files that are on the system and the list of where the free data blocks are. The NetApp doesn't have any insight into this. Space Reclaimer syncs the Windows/NTFS "free block list" with the NetApp free block list, and that is how you end up with the unused space back on the storage controller.

      This is an example:

      A file is written: it takes x data blocks and gets one entry in the NTFS "master file table" (MFT).   ===   The NetApp gets a request to write x blocks of data anywhere it can, plus a few extra blocks to update the NTFS file system.

      The file is deleted from Windows.

      For speed, the only thing that changes on the NTFS/Windows side is the master file table entry. The entry in the MFT gets marked as deleted, and the list of blocks it was using gets added to the list NTFS keeps of free space on the file system.   ===   On the NetApp, a couple of blocks are updated to note the master file table changes.


      The only thing that knows these data blocks are free is NTFS. The old data is still present in the same block locations; it hasn't been zeroed out or released, and it won't be until it is overwritten by other data.

      If you run deduplication at this point, it is going to treat those blocks as used.

      If you run Space Reclaimer, the list of blocks that NTFS thinks are free is sent to the NetApp, and the corresponding blocks are released back into the volume's free space pool.
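
      To make that sequence concrete, here is a minimal sketch (Python, purely illustrative: a toy block map stands in for NTFS and the controller; this is not actual SnapDrive or ONTAP behavior, just the bookkeeping described above):

        # Toy model of why deleting files in NTFS frees nothing on the controller
        # until the free-block lists are synced by Space Reclaimer.
        TOTAL_BLOCKS = 100
        ntfs_used = set(range(40))          # blocks the NTFS MFT says are in use
        controller_used = set(range(40))    # blocks the controller thinks hold data

        # 1. Delete a 10-block file: only the MFT / NTFS free list changes.
        ntfs_used -= set(range(10))
        # controller_used is untouched -- the old data still sits in those blocks.

        # 2. Deduplication at this point still sees 40 "used" blocks.
        print("dedup sees", len(controller_used), "used blocks")         # 40

        # 3. Space Reclaimer sends NTFS's free list to the controller,
        #    which releases the matching blocks back to the volume.
        ntfs_free = set(range(TOTAL_BLOCKS)) - ntfs_used
        controller_used -= ntfs_free
        print("after reclamation:", len(controller_used), "used blocks")  # 30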

  • Re: Over-Commitment - Thin Provisioning (Best Practice)
    thomas.glodde

    You need to take into consideration whether you use snapshots or not. Usually we do not provision more LUN space than aggregate space, so even if the LUNs are 100% written they still fit into the aggregate. Snapshots can then be set to autodelete so they are purged if space runs out.
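
    As a quick sketch of that sizing rule (Python; the point is just that the sum of the LUN sizes stays within the aggregate, so fully written LUNs always fit, with snapshots as the flexible part):

      def luns_fit(aggregate_gb, lun_sizes_gb):
          """True if the LUNs would still fit even when written 100%."""
          return sum(lun_sizes_gb) <= aggregate_gb

      print(luns_fit(1024, [300, 300, 300]))  # True  -- 900GB of LUNs in a 1TB aggregate
      print(luns_fit(1024, [600, 600]))       # False -- overcommitted at the LUN level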

  • Re: Over-Commitment - Thin Provisioning (Best Practice)
    RPOHLWILM

    I thought deduplication was on the volume, not on the LUN... Am I reading something wrong?

    I had professional services out and I was told:

      Aggregate = all disks of one type and size

      Volume = The largest amount you can present, thin provisioned, dedup on, auto grow on

      LUN = same size as the volume

     

    When a past co-worker set this up originally, he said the best way to take advantage of dedupe was to create a large volume and over-provision the LUNs.

    Since VMware will not know the volume is deduplicated, you need to over-provision it for maximum utilization. The theory is that you can have more system drives on the same volume, with a larger dedupe saving.

    Example:

      Volume = 5 TB, auto grow on, dedup on

      LUNs = 4 at 2 TB (rough estimate)

    This way, all the data on all the LUNs is deduplicated.
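
    Rough math for that example, as a sketch (Python; the dedupe savings percentage is just a made-up number for illustration):

      # Over-commitment math for the example above (5TB volume, 4 x 2TB LUNs).
      volume_tb = 5
      provisioned_tb = 4 * 2                         # 8TB presented to VMware
      overcommit_ratio = provisioned_tb / volume_tb  # 1.6x

      # The bet is that dedupe keeps real consumption under 5TB, e.g. at a
      # hypothetical 40% savings across similar system drives:
      estimated_used_tb = provisioned_tb * (1 - 0.40)   # 4.8TB
      print(overcommit_ratio, estimated_used_tb)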
