10 Replies. Latest reply: Jan 28, 2013 3:09 AM by richard_mackerras

Files Causing Misaligned IO's (nfsstat -d) and mbrscan don't agree

fletch2007

Hi, we run a daily report using the mbrscan utility to check all VMDK files for alignment.

Recently (> 7.3.5?) NetApp added an nfsstat -d switch that reports "Files Causing Misaligned IO's".

I am finding that mbrscan reports the VMDKs as aligned (aligned:Yes), but nfsstat -d shows the misaligned I/O counter for the same file increasing.

I zeroed the stats with nfsstat -z to be sure, and the counters still increase by several thousand between nfsstat -d runs (about 1 minute apart), as in the mcomm case below:

[root@backup-02 mcomm]# /opt/netapp/santools/mbrscan *flat*vmdk

--------------------

mcomm_1-flat.vmdk p1 (EBR )    lba:64    offset:32768    aligned:Yes

mcomm_1-flat.vmdk e1 (NTFS)    lba:128    offset:65536    aligned:Yes

--------------------

mcomm-flat.vmdk p1 (NTFS)    lba:64    offset:32768    aligned:Yes
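For reference, the mbrscan numbers above follow from simple arithmetic. A minimal Python sketch, assuming 512-byte sectors and that "aligned" means the partition starts on a 4 KB WAFL block boundary (my reading of the output, not mbrscan's documented rule): the byte offset is lba * 512, and it must be a multiple of 4096.

```python
SECTOR_BYTES = 512       # assumed logical sector size
WAFL_BLOCK_BYTES = 4096  # WAFL stores data in 4 KB blocks


def partition_offset(lba: int) -> int:
    """Byte offset of a partition that starts at the given LBA."""
    return lba * SECTOR_BYTES


def is_aligned(lba: int) -> bool:
    """True when the partition start falls on a 4 KB block boundary."""
    return partition_offset(lba) % WAFL_BLOCK_BYTES == 0


# LBAs 64 and 128 come from the mbrscan output; 63 is the old default
# partition start used by pre-Vista Windows, shown for contrast.
for lba in (64, 128, 63):
    verdict = "Yes" if is_aligned(lba) else "No"
    print(f"lba:{lba} offset:{partition_offset(lba)} aligned:{verdict}")
```

This only checks where each partition starts; it cannot see individual I/Os issued inside an aligned partition, which is one way mbrscan and nfsstat -d can disagree.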

nfsstat -d output:

Files Causing Misaligned IO's

[Counter=3404], Filename=vm65/mcomm/mcomm-flat.vmdk
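Because the records have a fixed shape, two captures taken a minute apart can be diffed to confirm which counters are moving. A minimal Python sketch, assuming the record format shown in this thread ([Counter=N], Filename=... or [Counter=N], FSID=..., Fileid=...); parse_misaligned and counter_delta are hypothetical helper names, and capturing the nfsstat -d text from the filer is left out. Repeated records for the same file are summed, since a later reply in this thread explains that each CPU keeps its own counter.

```python
import re
from typing import Dict

# One record, e.g. "[Counter=3404], Filename=vm65/mcomm/mcomm-flat.vmdk"
# or "[Counter=4093], FSID=95966634, Fileid=21607809". The filename stops
# at "[" so that two records fused onto one line are still split.
RECORD = re.compile(r"\[Counter=(\d+)\],\s*(Filename=[^\[\r\n]+|FSID=\d+,\s*Fileid=\d+)")


def parse_misaligned(text: str) -> Dict[str, int]:
    """Map each file key to its counter, summing repeated records for the same key."""
    counters: Dict[str, int] = {}
    for count, key in RECORD.findall(text):
        key = key.strip()
        counters[key] = counters.get(key, 0) + int(count)
    return counters


def counter_delta(before: str, after: str) -> Dict[str, int]:
    """Per-file change between two nfsstat -d captures (unchanged files omitted)."""
    old, new = parse_misaligned(before), parse_misaligned(after)
    return {key: new[key] - old.get(key, 0) for key in new if new[key] != old.get(key, 0)}


before = "[Counter=3404], Filename=vm65/mcomm/mcomm-flat.vmdk"
after = "[Counter=6500], Filename=vm65/mcomm/mcomm-flat.vmdk"  # hypothetical second capture
print(counter_delta(before, after))
```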

Which tool is correct?

FWIW - the Partial Write over limit (pwol) counter is not increasing:

http://www.vmadmin.info/2010/07/quantifying-vmdk-misalignment.html

Also, nfsstat -d lists a record without a filename. How do I determine what this is?

Files Causing Misaligned IO's

[Counter=4093], FSID=95966634, Fileid=21607809

Thanks

  • Files Causing Misaligned IO's (nfsstat -d) and mbrscan don't agree
    chrism@mochadata.com

    Bump.

    We're seeing exactly the same results on our end. nfsstat -d flags Windows 2008 R2 boxes as generating misaligned I/O, whereas mbrscan says everything is aligned. Since I know the 2008 R2 boxes are OK and mbrscan gives the results I would expect, I'm not sure how to read or trust the nfsstat output.

  • Files Causing Misaligned IO's (nfsstat -d) and mbrscan don't agree
    fletch2007

    Well, the NetApp engineer, not so helpfully, just emailed a link, completely ignoring the nfsstat -d output:

    Please refer to the following knowledge base article link which shows how to identify and fix misaligned Windows Virtual Machine disks in your environment:

    https://kb.netapp.com/support/index?page=content&id=1011402

    Please let me know if you need any further assistance in this regard.

    I just came across a "soon to come" teaser from http://www.vmdamentals.com/

    "The devil is in the details: How aligned VMs may still be misaligned"

    Sounds like our issue...

    • Re: Files Causing Misaligned IO's (nfsstat -d) and mbrscan don't agree

      You can use vol read_fsid to find out which FlexVol has that FSID:

      netapp01*> vol read_fsid customer01_oralogtemp

      Volume 'customer01_oralogtemp' has an FSID of 0x17383d29.
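nfsstat -d prints the FSID as a plain number, while vol read_fsid prints it in hex, so the two have to be converted before they can be compared. A minimal Python sketch, assuming the nfsstat value is decimal (it contains no hex digits, but I have not confirmed this in the documentation); find_volume and the vol_fsids table are hypothetical, and you would fill the table from vol read_fsid runs against your own FlexVols.

```python
from typing import Dict, Optional


def fsid_to_hex(fsid: int) -> str:
    """Format a numeric FSID the way vol read_fsid prints it."""
    return f"0x{fsid:08x}"


def hex_to_fsid(text: str) -> int:
    """Parse a vol read_fsid value such as '0x17383d29.' (trailing period allowed)."""
    return int(text.strip().rstrip("."), 16)


def find_volume(fsid: int, vol_fsids: Dict[str, str]) -> Optional[str]:
    """Return the volume whose vol read_fsid value equals the nfsstat FSID, if any."""
    for volume, hex_fsid in vol_fsids.items():
        if hex_to_fsid(hex_fsid) == fsid:
            return volume
    return None


# The FSID from the nfsstat -d record without a filename, in vol read_fsid form.
print(fsid_to_hex(95966634))
```

Under the decimal assumption, the record with FSID=95966634 belongs to whichever volume reports 0x05b855aa.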

      Also, you can have aligned VMs and still have applications that generate non-aligned writes.

      The example below is for an Oracle database on NFS. The file layout is aligned (it's NFS, so there is no partition offset), but writes to the redo logs can be any size from 512 bytes to ....., so a write can cross a block boundary.

      Files Causing Misaligned IO's

      [Counter=0], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogB/log_g12m2.dbf

      [Counter=44876], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/origlogA/log_g11m1.dbf

      [Counter=36977], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogA/log_g11m2.dbf

      [Counter=85835], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogB/log_g14m2.dbf

      [Counter=100507], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/origlogB/log_g14m1.dbf

      [Counter=45690], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/origlogB/log_g12m1.dbf

      [Counter=102382], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/origlogA/log_g13m1.dbf

      [Counter=89142], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogA/log_g13m2.dbf

      [Counter=38241], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogB/log_g12m2.dbf

      [Counter=3373], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/mirrlogA/log_g11m2.dbf

      [Counter=3467], Filename=customer01_sapbin/zone_test5_P01_oracle/P01/origlogA/log_g11m1.dbf
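The redo-log case is easy to model: what matters is whether each write starts and ends on a 4 KB block boundary, not where the file starts. A minimal Python sketch, assuming WAFL's 4 KB block size and treating any write that does not both start and end on a block boundary as misaligned; the exact heuristic behind the nfsstat -d counter isn't described in this thread, so this is an illustration, not its implementation.

```python
BLOCK_BYTES = 4096  # WAFL block size


def touched_blocks(offset: int, length: int) -> range:
    """Indices of the 4 KB blocks touched by a write of `length` bytes at `offset`."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // BLOCK_BYTES
    last = (offset + length - 1) // BLOCK_BYTES
    return range(first, last + 1)


def is_misaligned(offset: int, length: int) -> bool:
    """True if the write starts or ends inside a block, so a block is only partly rewritten."""
    return offset % BLOCK_BYTES != 0 or (offset + length) % BLOCK_BYTES != 0


# A full 4 KB block write, a 512-byte write inside block 0,
# and a 1 KB write that crosses from block 0 into block 1.
for offset, length in [(0, 4096), (512, 512), (3584, 1024)]:
    blocks = list(touched_blocks(offset, length))
    state = "misaligned" if is_misaligned(offset, length) else "aligned"
    print(offset, length, blocks, state)
```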

      • Re: Files Causing Misaligned IO's (nfsstat -d) and mbrscan don't agree
        fletch2007

        Yes, so the guest OS (GOS) is aligned, but some operations may not be (like Oracle logging).

        Other questions:

        1) How do we gauge the relative significance of this unaligned I/O?

        2) Why are multiple counters listed for the same file?

        3) What do the values of the counters mean?

        4) When should any action be taken on this data?

        http://vmadmin.info

        • Files Causing Misaligned IO's (nfsstat -d) and mbrscan don't agree

          1) Run nfsstat -z, then check nfsstat -d over a 24-hour period.

          2) No idea, possibly a bug.

          3) Files Causing Misaligned I/O's: List of filenames that are causing the most misaligned I/O's over NFS along with their corresponding heuristic counters. The higher the counter value, the higher the misaligned I/O requests for the corresponding file.

          4) Are there any performance issues? If not, don't try to fix it.
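For question 1, a 24-hour total is easier to compare across files once it is turned into a rate. A minimal Python sketch; rank_by_rate, the file names, and the totals below are hypothetical, and because answer 3 describes these as heuristic counters, the rates are best used to rank files against each other rather than as exact I/O counts.

```python
from typing import Dict, List, Tuple

SECONDS_PER_DAY = 24 * 60 * 60


def rank_by_rate(totals: Dict[str, int], elapsed_seconds: float) -> List[Tuple[str, float]]:
    """Convert per-file counter totals into increments per second, highest first."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    rates = [(name, count / elapsed_seconds) for name, count in totals.items()]
    return sorted(rates, key=lambda item: item[1], reverse=True)


# Hypothetical totals read from nfsstat -d 24 hours after nfsstat -z.
example = {"hypothetical_a.dbf": 8640, "hypothetical_b-flat.vmdk": 86400}
for name, rate in rank_by_rate(example, SECONDS_PER_DAY):
    print(f"{name}: {rate:.2f} per second")
```

A high rank only tells you where to look; whether it is worth fixing still depends on seeing an actual performance problem.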

        • Files Causing Misaligned IO's (nfsstat -d) and mbrscan don't agree
          larsson

          To answer number 2 (why are multiple counters listed for the same file?): there is a counter for each CPU.
