34 Replies. Latest reply: Aug 28, 2013 2:01 AM by werner.komar

Active (I/O) paths misconfigured on VMware with ALUA

pedro.rocha Cyclist

Hello all,

 

I am seeing strange behavior in a customer's virtual environment. We have a vSphere 5.0 cluster that accesses its datastores via FCP. VSC 4.1 is installed and the recommended settings are applied to the cluster. The igroups on the NetApp side have ALUA enabled.

 

On the vSphere side, the path selection policy is round-robin and the SATP is ALUA-aware (VMW_SATP_ALUA). There are 8 paths from each ESXi server to the NetApp HA pair (4 to each controller, running DOT 8.0.2P3 7-mode).

 

With this configuration we are seeing all paths as "Active (I/O)", which to me indicates that all paths are active and I/O is passing through all of them. My understanding is that with this configuration we should have 4 paths as "Active (I/O)" (optimized paths) and 4 paths as "Active" (non-optimized paths).
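A minimal sketch of the expected breakdown described above. This is only an illustrative Python model, not an ESXi or NetApp API: the function name `expected_path_states`, the path names, and the controller labels "A" and "B" are all hypothetical, and it assumes the LUN is owned by controller A.

```python
# Illustrative model only: names and labels are hypothetical, not a real API.
def expected_path_states(paths, owning_controller):
    """Map each (path_name, controller) pair to the state vSphere should show."""
    states = {}
    for name, controller in paths:
        if controller == owning_controller:
            # Path lands on the controller that owns the LUN: optimized.
            states[name] = "Active (I/O)"
        else:
            # Path lands on the partner controller: non-optimized.
            states[name] = "Active"
    return states

# 8 paths per host, 4 to each controller, as in the setup described above.
paths = [(f"path{i}", "A" if i < 4 else "B") for i in range(8)]
states = expected_path_states(paths, owning_controller="A")
optimized = sum(1 for s in states.values() if s == "Active (I/O)")
print(optimized, len(states) - optimized)  # prints: 4 4
```

Seeing 8 paths in "Active (I/O)" instead of this 4/4 split is the symptom this thread is about.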

 

Does anyone know what could possibly be happening here? What's my mistake?

 

Kind regards,

 

Pedro Rocha.

    • Re: Active (I/O) paths misconfigured on VMware with ALUA
      pedro.rocha Cyclist

      Hi Doug,

       

      Have done that before, no success. Case opened, let's see what is happening. Will post here...

       

      Regards,

      Pedro Rocha

      • Re: Active (I/O) paths misconfigured on VMware with ALUA
        Sergio_Santos Novice

        Hi Pedro,

         

        I'm having the exact same problem and also have a ticket open but it's been a few weeks and so far no luck. Were you able to get a resolution? My setup is almost identical except my DOT is 8.0.2P4 and ESXi servers are 5.1 (though I had this problem with vSphere 5.0 as well). What kind of SAN switches are you using if I may ask? Thanks!

         

        Sergio

        • Re: Active (I/O) paths misconfigured on VMware with ALUA
          pedro.rocha Cyclist

          Hi Sergio,

           

          I have not received an answer yet. The case is still open. We are using Brocade 5100 switches for the SAN environment.

           

          I'll post here if I have any luck.

           

          Regards,

          Pedro Rocha.

          • Re: Active (I/O) paths misconfigured on VMware with ALUA
            werner.komar Novice

            Hello Pedro,

             

            We have the same configuration and the same problem. We use a FAS3240 running ONTAP 8.1.2 7-Mode with FC and Brocade 200E 4 Gb switches. The servers are HP CX7000 blades.

            On the vSphere side, path selection policy is round-robin and SATP is ALUA.

             

            All 4 paths are "Active (I/O)". When we do a "cf takeover", we lose communication between host and datastore. After a "cf giveback", all 4 paths come back online.

             

            We have had a case open since 29.4.2013 ... we are waiting.

             

            Regards

            Werner Komar

            • Re: Active (I/O) paths misconfigured on VMware with ALUA
              pedro.rocha Cyclist

              Werner,

               

              We are now dealing with VMware. NetApp told us to contact them since the issue does not appear to be related to NetApp.

               

              I'll tell you when we have something.

               

              Regards,

              Pedro

               


              • Re: Active (I/O) paths misconfigured on VMware with ALUA
                gatorman1369 Novice

                Has this been sorted out? I am running into this issue with 5.1 hosts and ALUA-enabled iGroups. I can give more details if needed, but I am basically just reaching out before we submit a ticket on this.

                 

                Message was edited by: Allan Howard

                Here's an nmp list -d for one of my datastores:

                ~ # esxcli storage nmp device list -d naa.60a98000572d43504f5a636d57537144
                naa.60a98000572d43504f5a636d57537144
                   Device Display Name: NETAPP Fibre Channel Disk (naa.60a98000572d43504f5a636d57537144)
                   Storage Array Type: VMW_SATP_ALUA
                   Storage Array Type Device Config: {implicit_support=on;explicit_support=off; explicit_allow=on;alua_followover=on;{TPG_id=2,TPG_state=AO}{TPG_id=2,TPG_state=AO}{TPG_id=2,TPG_state=AO}}
                   Path Selection Policy: VMW_PSP_RR
                   Path Selection Policy Device Config: {policy=rr,iops=1000,bytes=10485760,useANO=0;lastPathIndex=3: NumIOsPending=0,numBytesPending=0}
                   Path Selection Policy Device Custom Config:
                   Working Paths: vmhba2:C0:T1:L61, vmhba2:C0:T0:L61, vmhba3:C0:T0:L61, vmhba3:C0:T1:L61
                   Is Local SAS Device: false
                   Is Boot USB Device: false

                • Re: Active (I/O) paths misconfigured on VMware with ALUA
                  pedro.rocha Cyclist

                  Hello,

                   

                  Nothing yet. I would recommend checking this with VMware as well. That is the direction I got from NetApp.

                   

                  Regards,

                  Pedro.

                • Re: Active (I/O) paths misconfigured on VMware with ALUA
                  Sergio_Santos Novice

                  No luck on my end either. On my 5th NetApp tech and they referred me to VMware support to resolve the issue. I'm waiting to hear back from them, but they've been awfully quiet.

                   

                  I'm about to change the PSP to using FIXED with a preferred path as I've been working on this for months and I need to move on with this project. Maybe someone with more time can follow up on this and get this resolved once and for all since it doesn't sound like just me with this problem. I'm gonna dump my setup below and some of the steps I've taken:

                   

                  SETUP:

                  Brocade VA-40FC 8Gbps FC switches (independent dual-fabric)

                  DELL R910/R900 servers with two single port Qlogic QLE2460 HBAs (each HBA to one fabric)

                  NetApp V3240 controllers with DataONTAP 8.0.2P4, single_image cf mode (1st FC port to fabric 1, 2nd to fabric 2); no V features used (back-end storage array connections)

                  Single initiator, single target zoning (Each ESXi hba port to one NetApp FC port per controller)

                   

                  TROUBLESHOOTING (none of the steps below had any effect on pathing):

                  * Was happening with ESXi 5.0 with same DELL servers and NetApp controllers, and McData DS4700M FC SAN switches

                  * Built new ESXi 5.1 server on DELL R720

                  * Tried old firmware HBA QLE 2460 4.00.012 later upgraded to latest 5.09.0000

                  * Tried old firmware dual-port HBA QLE 2462 4.00.030 later upgraded to latest 5.09.0000

                  * Change FC fill-word option on Brocade switch ports from 1 to 3 for esxi ports and netapp ports

                  * Injected latest Qlogic drivers into ESXi OS (originally using v911.k1.1-26vmw but now 934.5.20.0-1vmw)

                  * Disable port features (NPIV, trunking, QoS) on Brocade for ESXi port

                  * Upgraded to ESXi 5.1 U1

                  • Re: Active (I/O) paths misconfigured on VMware with ALUA
                    gatorman1369 Novice

                    I've submitted a ticket to VMware.

                     

                    I hate to be the guy who throws a wrench into this, but I've got a similar site as the one that isn't working, my HQ actually, with HP DL 380 G7's, and ESXi builds that were upgraded from build 9xxxxx to 1065491. ONTAP is 8.1.2 7-mode on 3250's. ALUA works just fine here.

                     

                    Last week I went to my DR site to do a revamp, adding more hosts and its own vCenter, as my hosts there were managed by my vCenter at HQ. ALUA is not working there.

                     

                    Here are a few of my details on the site that isn't working.

                     

                    CISCO MDS 9124 (Using MDS 9148 at HQ)

                    NetApp 3140 HA pair   ONTAP 8.1.2 P1 7-mode (yep, difference is the P1 compared to HQ)

                    HP DL380 G5

                    ESXi 5.1u1 Build 1065491   (Fresh install, not upgraded like HQ)

                    Emulex LPe12002 HBA   Firmware: 2.00A4   Drivers:8.2.3.1-127vmw (Same as HQ)

                    I've configured iGroups for each of my hosts with ALUA enabled, yes I've rebooted them.

                     

                    I've even run the NetApp VSC "Set Recommended Values" for the timeouts and such. That also did not help. Mind you, I haven't run this back at HQ, and it doesn't seem to be an issue there.

                     

                    I've done similar troubleshooting steps as I am sure a lot of you have already done, like blowing away the iGroups and recreating them via the System Manager GUI instead of the command line I used originally. I've tried creating the iGroups without enabling ALUA first, saving the configuration, and then enabling ALUA both in System Manager and on the command line. I've also tried creating a new LUN to see what multipathing the host picks. I cannot enable ALUA on the iGroup without all paths showing as Active (I/O). Going back to fixed path may be what I have to go with too.

                     

                    I'll hopefully speak to VMware tomorrow.

                    • Re: Active (I/O) paths misconfigured on VMware with ALUA
                      pedro.rocha Cyclist

                      Hi all,

                       

                      We are now speaking with VMware support, who directed us back to NetApp support.

                       

                      I am going to try to make them speak to each other, since this is really annoying and several people are getting the same results.

                       

                      Regards,

                      Pedro.

                       

                       

                      On Wed, May 29, 2013 at 4:58 PM, gatorman1369 <

                      • Re: Active (I/O) paths misconfigured on VMware with ALUA
                        werner.komar Novice

                        Hi Pedro,

                        This is our ALUA story: NetApp 3240, FCP, ALUA misconfigured.
                        We have had a case open for over 6 months and NetApp Support does not know what is going on. We always have 4 active FCP paths in ESXi, all "Active (I/O)" (2 "Active (I/O)" and 2 "Active" would be normal). We checked everything: the ESXi config, WWPNs, zoning. Nothing helped. After dealing with NetApp Support for a long time, we did our own investigation and found that there is a problem with the "local.single_image.key" and "partner.single_image.key". You can see these in the LUN config:
                        Code:
                        priv set diag
                        lun config
                        output: (the output has more information, but these are the important lines)
                        local.single_image.key = "157459xxxxx"
                        partner.single_image.key = "15746xxxxx"
                        local.serialnum = "2000004xxxxx"
                        partner.serialnum = "5000002xxxxx"

                         

                        These numbers should mirror each other across the HA pair: each controller's "partner.single_image.key" should equal the other controller's "local.single_image.key". In our case the "partner.single_image.key" did not match the "local.single_image.key" of the other HA controller.
                        We asked NetApp Support what was going on with these numbers and they told us that it "could be" a problem. Then we asked for an action plan to change the "partner.single_image.key" and correct the problem.
                        We also asked whether anything else would change if we changed the partner.single_image.key, and NetApp Support said no.
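The cross-check described above can be sketched roughly as follows. This is a hedged Python illustration, not a NetApp tool: the function name `keys_consistent` and the dictionary layout are assumptions, and the key values are made-up placeholders that you would replace with the values copied by hand from "lun config" on each controller.

```python
# Hypothetical helper: compares the single_image keys of two HA partners.
def keys_consistent(node_a, node_b):
    """True if each node's partner key equals the other node's local key."""
    return (
        node_a["partner.single_image.key"] == node_b["local.single_image.key"]
        and node_b["partner.single_image.key"] == node_a["local.single_image.key"]
    )

# Placeholder values, not real keys.
controller_1 = {"local.single_image.key": "KEY-1", "partner.single_image.key": "KEY-2"}
controller_2 = {"local.single_image.key": "KEY-2", "partner.single_image.key": "KEY-1"}
# A pair where controller 1 still holds a stale partner key.
stale_1 = {"local.single_image.key": "KEY-1", "partner.single_image.key": "KEY-OLD"}

print(keys_consistent(controller_1, controller_2))
print(keys_consistent(stale_1, controller_2))
```

The second case corresponds to the mismatch reported in this post, where one controller's partner key no longer matched the other controller's local key.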

                        Code:
                        priv set diag
                        fcp stop
                        lun config set partner.single_image.key xxxxxxxxxx (we entered the HA partner's local.single_image.key)
                        fcp start
                        priv set
                        reboot (do automatic takeover)

                        And after the reboot, the WWPNs changed automatically on the controller where we had changed the partner.single_image.key.
                        We considered manually changing the WWPNs back to the original settings, but we did not know whether that would open up another problem. We decided to change the zoning on both Brocade switches instead, and: HERE WE GO.
                        We rebooted our 12 ESXi servers and ALUA did its job. After this, we brought 200 VMs back online.
                        The problem started with a FAS3240 motherboard replacement.

                        Regards
                        Werner

  • Re: Active (I/O) paths misconfigured on VMware with ALUA
    PS-SUPPORT Novice

    We had a similar issue with two FAS 3240 in a stretched MetroCluster using FC.

    The solution for our environment was to disable ALUA on the igroup on the NetApp.

    The ESX servers then use the default active/active SATP (VMW_SATP_DEFAULT_AA). The path policy is set to Round Robin (VMware).

     

    Maybe this helps.

     

    Regards,

     

    ps-support

    • Re: Active (I/O) paths misconfigured on VMware with ALUA
      PS-SUPPORT Novice

      Additional info:

       

      You have to reboot the ESX servers after disabling ALUA on the NetApp; the ESX servers should then use VMW_SATP_DEFAULT_AA instead of VMW_SATP_ALUA.

       

       

      Regards,

       

      ps-support

      • Re: Active (I/O) paths misconfigured on VMware with ALUA
        pedro.rocha Cyclist

        Hi,

         

        Disabling ALUA does not seem to be a solution. AFAIK it is recommended to use ALUA with the round robin policy for this environment. Or not?

         

        We have opened a case with VMware since this appears to be a VMware issue. I'll post here what we find out.

         

        Regards,

        Pedro Rocha.

        • Re: Active (I/O) paths misconfigured on VMware with ALUA
          sinergy_storage Novice

          Hi everybody. Same issue here. The customer has a MetroCluster 3240 on two fabrics (ESXi 5.1 hosts, no Update 1) with 2 QLogic HBAs: 8 paths, all in Active (I/O).

          The same VMware cluster is also zoned to an IBM SVC. The SVC shows all paths as Active (I/O).

           

          We expect traditional ALUA behavior from NetApp, that is, Active (I/O) on half of the paths and the others only "Active".

           

          Is there any response from VMware/NetApp?

           

          What about changing the advanced parameters

          Disk.UseDeviceReset from 1 to 0 or

          Disk.UseLunReset ????

          • Re: Active (I/O) paths misconfigured on VMware with ALUA
            sinergy_storage Novice

            What about the information regarding path optimization in this recent VMware KB?

             

            http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1008113

             

            It seems to be something tied to the individual device.

            Waiting for further response.

            • Re: Active (I/O) paths misconfigured on VMware with ALUA
              gatorman1369 Novice

              I've run the "esxcli storage core device list" command in an environment where ALUA works as well as in my non-working environment. The results from both are the same.

               

              Particularly from this KB you mention....

                 Queue Full Sample Size: 0

                 Queue Full Threshold: 0

               

               

              Example:

               

              ALUA working

              naa.60a98000572d43504b5a66444d755758

                 Display Name: NETAPP Fibre Channel Disk (naa.60a98000572d43504b5a66444d755758)

                 Has Settable Display Name: true

                 Size: 512078

                 Device Type: Direct-Access

                 Multipath Plugin: NMP

                 Devfs Path: /vmfs/devices/disks/naa.60a98000572d43504b5a66444d755758

                 Vendor: NETAPP

                 Model: LUN

                 Revision: 811a

                 SCSI Level: 4

                 Is Pseudo: false

                 Status: on

                 Is RDM Capable: true

                 Is Local: false

                 Is Removable: false

                 Is SSD: false

                 Is Offline: false

                 Is Perennially Reserved: false

                 Queue Full Sample Size: 0

                 Queue Full Threshold: 0

                 Thin Provisioning Status: yes

                 Attached Filters: VAAI_FILTER

                 VAAI Status: supported

                 Other UIDs: vml.020026000060a98000572d43504b5a66444d7557584c554e202020

                 Is Local SAS Device: false

                 Is Boot USB Device: false

               

               

               

              ALUA not working

              naa.60a98000572d43504f5a6b337a643563

                 Display Name: NETAPP Fibre Channel Disk (naa.60a98000572d43504f5a6b337a643563)

                 Has Settable Display Name: true

                 Size: 266254

                 Device Type: Direct-Access

                 Multipath Plugin: NMP

                 Devfs Path: /vmfs/devices/disks/naa.60a98000572d43504f5a6b337a643563

                 Vendor: NETAPP

                 Model: LUN

                 Revision: 811a

                 SCSI Level: 4

                 Is Pseudo: false

                 Status: on

                 Is RDM Capable: true

                 Is Local: false

                 Is Removable: false

                 Is SSD: false

                 Is Offline: false

                 Is Perennially Reserved: false

                 Queue Full Sample Size: 0

                 Queue Full Threshold: 0

                 Thin Provisioning Status: yes

                 Attached Filters: VAAI_FILTER

                 VAAI Status: supported

                 Other UIDs: vml.020064000060a98000572d43504f5a6b337a6435634c554e202020

                 Is Local SAS Device: false

                 Is Boot USB Device: false

               

               

              Appreciate the time and the KB link.

              • Re: Active (I/O) paths misconfigured on VMware with ALUA
                gatorman1369 Novice

                While not fixed yet, I've run this command on my filers where ALUA is not working. I think this is our issue here, and I'll update as this moves along.

                 

                MyFiler> lun config_check -v

                Checking for down fcp interfaces

                ======================================================

                The following FCP HBAs appear to be down

                           0c  LINK NOT CONNECTED

                           0a  LINK NOT CONNECTED

                 

                Checking initiators with mixed/incompatible settings

                ======================================================

                        No Problems Found

                 

                Checking igroup ALUA settings

                ======================================================

                (null)  No Problems Found

                 

                Checking for nodename conflicts

                ======================================================

                        No Problems Found

                 

                Checking for initiator group and lun map conflicts

                ======================================================

                        No Problems Found

                 

                Checking for igroup ALUA conflicts

                ======================================================

                        No Problems Found

                 

                Checking for duplicate WWPNs

                ======================================================

                The following WWPNs are duplicate:

                        500a0981892accc6

                        500a0982892accc6

                        500a0983892accc6

                        500a0984892accc6

  • Re: Active (I/O) paths misconfigured on VMware with ALUA
    sinergy_storage Novice

    We are now waiting for a time window on the customer's production infrastructure to try selective zoning.

    In other words, the problem seems to be tied to the number of paths to the devices being greater than 4.

    In another cluster with only 4 paths to the datastores, ALUA works perfectly.

    We'll try to remove some paths from the fabric so that a maximum of 4 paths is available, then see the results.

     

    If somebody can test this situation before we do, please post feedback.

     

     

    Bye

     

    Sinergy

    • Re: Active (I/O) paths misconfigured on VMware with ALUA
      Sergio_Santos Novice

      Hi Sinergy,

       

      We've got 4 paths total here and it's not working (they all show Active (I/O)). The four paths break down like this:

       

      Fabric 1

      Path 1: ESX vmhba1 to V3240 Controller 1 Port 0d

      Path 2: ESX vmhba1 to V3240 Controller 2 Port 0d

       

      Fabric 2

      Path 3: ESX vmhba2 to V3240 Controller 1 Port 4a

      Path 4: ESX vmhba2 to V3240 Controller 2 Port 4a

       

      I would try to change your pathing scheme anyway, because there does not seem to be a clear pattern to when this happens and when it doesn't. No word from the VMware tech on my end. I don't know if I mentioned it, but all my ESX setups have been fresh, with no upgrades from 4.0 to 5.0 to 5.1. I installed a new vCenter server and reinstalled ESXi each time.

       

      Also, I was trying to pick the brain of the Netapp tech to find out if there's a way to debug the ALUA negotiation between the ESX and the controller. Either a log or in real time. Alas, it doesn't sound like there is any way to do that.

  • Re: Active (I/O) paths misconfigured on VMware with ALUA
    pedro.rocha Cyclist

    Hello all,

     

    I am reopening my case with NetApp. VMware is telling me that the TPG parameter is being erroneously passed to the vSphere cluster for some Datastores (and that this is sent from the storage to the vSphere). Here's an example of bad configuration:

     

    ~ # esxcli storage nmp device list | grep -A 5 naa.60a9800064656d735a346b436a54426b

    naa.60a9800064656d735a346b436a54426b

       Device Display Name: NETAPP Fibre Channel Disk (naa.60a9800064656d735a346b436a54426b)

       Storage Array Type: VMW_SATP_ALUA

       Storage Array Type Device Config: {implicit_support=on;explicit_support=off; explicit_allow=on;alua_followover=on;{TPG_id=2,TPG_state=AO}{TPG_id=2,TPG_state=AO}}

       Path Selection Policy: VMW_PSP_RR

       Path Selection Policy Device Config: {policy=rr,iops=1000,bytes=10485760,useANO=0;lastPathIndex=9: NumIOsPending=0,numBytesPending=0}

       Path Selection Policy Device Custom Config:
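A rough way to spot this pattern in the "Storage Array Type Device Config" line is sketched below in Python. It is a heuristic under stated assumptions, not a VMware tool: the function names `parse_tpgs` and `looks_misreported` are hypothetical, the "bad" string is copied from the output above, and the "healthy" string is an invented example of two distinct target port groups (one AO, one ANO) rather than output from a real host.

```python
import re

def parse_tpgs(device_config):
    """Return a list of (TPG_id, TPG_state) tuples found in the config string."""
    return re.findall(r"\{TPG_id=(\d+),TPG_state=(\w+)\}", device_config)

def looks_misreported(device_config):
    """Heuristic: several TPG entries that all share one id and are all AO."""
    tpgs = parse_tpgs(device_config)
    ids = {tpg_id for tpg_id, _ in tpgs}
    states = {state for _, state in tpgs}
    return len(tpgs) > 1 and len(ids) == 1 and states == {"AO"}

# Copied from the output posted above.
bad = ("{implicit_support=on;explicit_support=off; explicit_allow=on;"
       "alua_followover=on;{TPG_id=2,TPG_state=AO}{TPG_id=2,TPG_state=AO}}")
# Invented example of what two distinct port groups might look like.
healthy = ("{implicit_support=on;explicit_support=off; explicit_allow=on;"
           "alua_followover=on;{TPG_id=0,TPG_state=AO}{TPG_id=1,TPG_state=ANO}}")

print(parse_tpgs(bad))
print(looks_misreported(bad), looks_misreported(healthy))
```

The idea is only that both controllers reporting the same target port group id in the AO state leaves the host with no way to tell optimized from non-optimized paths.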

     

    Best wishes,

    Pedro Rocha.

    • Re: Active (I/O) paths misconfigured on VMware with ALUA
      gatorman1369 Novice

      Pedro,

       

      This was exactly what I was seeing: TPG_id=x on all of my SATDC (Storage Array Type Device Config) entries.

       

      Please run a "lun config_check -v" from your filers. When I did yesterday, I noticed duplicate WWPNs from a filer head swap that was performed a few weeks ago. Also run "fcp show adapter" on both filers and compare the FC NODENAMEs and FC PORTNAMEs. Guess what, mine were identical on both filers.

       

      https://kb.netapp.com/support/index?page=content&id=1013497&actp=search&viewlocale=en_US&searchid=null#Storage_controller_head_upgrade_best_practice_procedure_for_SAN

       

      Resolving duplicate fibre channel WWPNs between nodes in an HA-pair

      "WWPNs can be duplicated across ports in an HA-pair under certain circumstances. As a side effect, to duplicate WWPNs, the ALUA configuration will also be duplicated causing all MPIO path states to be either optimized or non-optimized. This issue typically occurs when existing HA-pair systems configured with FCP are split and rejoined with the new nodes. The following procedure must be followed to resolve the duplicate configuration and restore the ALUA configuration. This procedure will cause WWPNs on one of the two nodes to change. The change to WWPNs will require fibre channel zoning configuration updates after the new WWPNs have been generated."
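The comparison of FC port names between the two filers can be sketched in Python as below. This is a minimal illustration that assumes the "fcp show adapter" text has been saved by hand from each filer; the function names `portnames` and `duplicate_wwpns` are hypothetical, and the sample text is a two-slot excerpt of the output posted in this thread.

```python
import re

def portnames(fcp_show_adapter_output):
    """Parse 'fcp show adapter' text into a {slot: wwpn} dictionary."""
    result = {}
    slot = None
    for line in fcp_show_adapter_output.splitlines():
        line = line.strip()
        m = re.match(r"Slot:\s+(\S+)", line)
        if m:
            slot = m.group(1)
            continue
        m = re.match(r"FC Portname:\s+\S+\s+\((\w+)\)", line)
        if m and slot is not None:
            result[slot] = m.group(1)
    return result

def duplicate_wwpns(output_a, output_b):
    """Return the sorted WWPNs that appear on both filers."""
    a = portnames(output_a)
    b = portnames(output_b)
    return sorted(set(a.values()) & set(b.values()))

filer1_output = """
Slot:                    0c
FC Portname:             50:0a:09:83:89:2a:cc:c6 (500a0983892accc6)
Slot:                    0d
FC Portname:             50:0a:09:81:89:2a:cc:c6 (500a0981892accc6)
"""
filer2_output = """
Slot:                    0c
FC Portname:             50:0a:09:81:89:2a:cc:c6 (500a0981892accc6)
Slot:                    0d
FC Portname:             50:0a:09:82:89:2a:cc:c6 (500a0982892accc6)
"""

print(duplicate_wwpns(filer1_output, filer2_output))
```

Any WWPN in the result is presented by both nodes of the HA pair, which is the condition the KB text above describes.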

       

      Hope this helps you.

       

       

      Output follows from my filers to show you an example.

       

      MyFiler> lun config_check -v

      Checking for down fcp interfaces

      ======================================================

      The following FCP HBAs appear to be down

                 0c  LINK NOT CONNECTED

                 0a  LINK NOT CONNECTED

       

      Checking initiators with mixed/incompatible settings

      ======================================================

              No Problems Found

       

      Checking igroup ALUA settings

      ======================================================

      (null)  No Problems Found

       

      Checking for nodename conflicts

      ======================================================

              No Problems Found

       

      Checking for initiator group and lun map conflicts

      ======================================================

              No Problems Found

       

      Checking for igroup ALUA conflicts

      ======================================================

              No Problems Found

       

      Checking for duplicate WWPNs

      ======================================================

      The following WWPNs are duplicate:

              500a0981892accc6

              500a0982892accc6

              500a0983892accc6

              500a0984892accc6

       

      MyFiler> Fri May 31 10:09:20 EDT [MyFiler:scsitarget.conflicting.wwpns:error]: Local node and partner node have conflicting WWPNs and ALUA states which will degrade host MPIO performance.

       

       

       

       

       

      MyFiler1

      MyFiler1> fcp show adapter

      Slot:                    0c

      Description:             Fibre Channel Target Adapter 0c (QLogic 2432 (2462) rev. 2)

      Adapter Type:            Local

      Status:                  LINK NOT CONNECTED

      FC Nodename:             50:0a:09:80:89:2a:cc:c6 (500a0980892accc6)

      FC Portname:             50:0a:09:83:89:2a:cc:c6 (500a0983892accc6)

      Standby:                 No

       

      Slot:                    0d

      Description:             Fibre Channel Target Adapter 0d (QLogic 2432 (2462) rev. 2)

      Adapter Type:            Local

      Status:                  ONLINE

      FC Nodename:             50:0a:09:80:89:2a:cc:c6 (500a0980892accc6)

      FC Portname:             50:0a:09:81:89:2a:cc:c6 (500a0981892accc6)

      Standby:                 No

       

      Slot:                    0a

      Description:             Fibre Channel Target Adapter 0a (QLogic 2432 (2462) rev. 2)

      Adapter Type:            Local

      Status:                  LINK NOT CONNECTED

      FC Nodename:             50:0a:09:80:89:2a:cc:c6 (500a0980892accc6)

      FC Portname:             50:0a:09:84:89:2a:cc:c6 (500a0984892accc6)

      Standby:                 No

       

      Slot:                    0b

      Description:             Fibre Channel Target Adapter 0b (QLogic 2432 (2462) rev. 2)

      Adapter Type:            Local

      Status:                  ONLINE

      FC Nodename:             50:0a:09:80:89:2a:cc:c6 (500a0980892accc6)

      FC Portname:             50:0a:09:82:89:2a:cc:c6 (500a0982892accc6)

      Standby:                 No

       

       

      MyFiler2

      MyFiler2> fcp show adapter

      Slot:                    0c

      Description:             Fibre Channel Target Adapter 0c (QLogic 2432 (2462) rev. 2)

      Adapter Type:            Local

      Status:                  LINK NOT CONNECTED

      FC Nodename:             50:0a:09:80:89:2a:cc:c6 (500a0980892accc6)

      FC Portname:             50:0a:09:81:89:2a:cc:c6 (500a0981892accc6)

      Standby:                 No

       

      Slot:                    0d

      Description:             Fibre Channel Target Adapter 0d (QLogic 2432 (2462) rev. 2)

      Adapter Type:            Local

      Status:                  ONLINE

      FC Nodename:             50:0a:09:80:89:2a:cc:c6 (500a0980892accc6)

      FC Portname:             50:0a:09:82:89:2a:cc:c6 (500a0982892accc6)

      Standby:                 No

       

      Slot:                    0a

      Description:             Fibre Channel Target Adapter 0a (QLogic 2432 (2462) rev. 2)

      Adapter Type:            Local

      Status:                  LINK NOT CONNECTED

      FC Nodename:             50:0a:09:80:89:2a:cc:c6 (500a0980892accc6)

      FC Portname:             50:0a:09:83:89:2a:cc:c6 (500a0983892accc6)

      Standby:                 No

       

      Slot:                    0b

      Description:             Fibre Channel Target Adapter 0b (QLogic 2432 (2462) rev. 2)

      Adapter Type:            Local

      Status:                  ONLINE

      FC Nodename:             50:0a:09:80:89:2a:cc:c6 (500a0980892accc6)

      FC Portname:             50:0a:09:84:89:2a:cc:c6 (500a0984892accc6)

      Standby:                 No

      • Re: Active (I/O) paths misconfigured on VMware with ALUA
        pedro.rocha Cyclist

        Hi!

         

        But did correcting that solve your issue? Is everything fine now?

         

        Regards,

        Pedro Rocha.

         

         

        On Fri, May 31, 2013 at 11:20 AM, gatorman1369 <

        • Re: Active (I/O) paths misconfigured on VMware with ALUA
          gatorman1369 Novice

          I have not performed the operations just yet. I'd like all connected hosts to be down first, and I need to schedule that.

           

          I've identified a major issue that points in the direction of not just ALUA, but all multipathing not functioning correctly. Since I've determined this to be an issue I've gone back to RHEL and Exchange servers connected to these filers, they are also affected by this but it wasn't apparent at first glance. The description of duplicate WWPN's and what ALUA is doing from the netapp kb certainly fits my issue. Let us know if the lun config_check -v comes up with anything.

          • Re: Active (I/O) paths misconfigured on VMware with ALUA
            pedro.rocha Cyclist

            Ok, I see.

             

            I will not be able to run the command this week, since this is located at a customer site.

             

            Regarding duplicate WWPNs, I was able to check that via MyAutosupport and it is not the case.

             

            Regards,

            Pedro.

             

             


          • Re: Active (I/O) paths misconfigured on VMware with ALUA
            Sergio_Santos Novice

            Sweet lord, this is the best solution I've heard in months! I don't have the exact same setup, but I might end up trying this anyway when I can schedule an outage window. What differs in my situation is that the four FC ports I'm using do not have duplicate WWPNs, so the zoning "looks" correct. I also hypothesize that's why my "lun config_check -v" doesn't report the duplicate WWPNs message. But (and this is a BIG but), when I compare the complete "fcp show adapters" list on both filers, several WWPNs appear on both, in each case on at least one port whose link is not connected. This may be enough to confuse the NetApps and send the incorrect ALUA info. That makes total sense to me. Here's my fcp show adapters dump (duplicates: 500a098296f7b314, 500a098396f7b314, 500a098496f7b314 and 500a098596f7b314 each appear on both filers):

             

            filer1> fcp show adapter

            Slot:                    4a

            Description:             Fibre Channel Target Adapter 4a (Dual-channel, QLogic 2432 (2464) rev. 3)

            Adapter Type:            Local

            Status:                  ONLINE

            FC Nodename:             50:0a:09:80:86:f7:b3:14 (500a098086f7b314)

            FC Portname:             50:0a:09:83:96:f7:b3:14 (500a098396f7b314)

            Standby:                 No

             

             

            Slot:                    4b

            Description:             Fibre Channel Target Adapter 4b (Dual-channel, QLogic 2432 (2464) rev. 3)

            Adapter Type:            Local

            Status:                  LINK NOT CONNECTED

            FC Nodename:             50:0a:09:80:86:f7:b3:14 (500a098086f7b314)

            FC Portname:             50:0a:09:84:96:f7:b3:14 (500a098496f7b314)

            Standby:                 No

             

             

            Slot:                    4c

            Description:             Fibre Channel Target Adapter 4c (Dual-channel, QLogic 2432 (2464) rev. 3)

            Adapter Type:            Local

            Status:                  LINK NOT CONNECTED

            FC Nodename:             50:0a:09:80:86:f7:b3:14 (500a098086f7b314)

            FC Portname:             50:0a:09:85:96:f7:b3:14 (500a098596f7b314)

            Standby:                 No

             

             

            Slot:                    4d

            Description:             Fibre Channel Target Adapter 4d (Dual-channel, QLogic 2432 (2464) rev. 3)

            Adapter Type:            Local

            Status:                  LINK NOT CONNECTED

            FC Nodename:             50:0a:09:80:86:f7:b3:14 (500a098086f7b314)

            FC Portname:             50:0a:09:86:96:f7:b3:14 (500a098696f7b314)

            Standby:                 No

             

             

            Slot:                    0d

            Description:             Fibre Channel Target Adapter 0d (Dual-channel, QLogic 2432 (2462) rev. 2)

            Adapter Type:            Local

            Status:                  ONLINE

            FC Nodename:             50:0a:09:80:86:f7:b3:14 (500a098086f7b314)

            FC Portname:             50:0a:09:82:96:f7:b3:14 (500a098296f7b314)

            Standby:                 No

             

             

            filer2> fcp show adapter

            Slot:                    4a

            Description:             Fibre Channel Target Adapter 4a (Dual-channel, QLogic 2432 (2464) rev. 3)

            Adapter Type:            Local

            Status:                  ONLINE

            FC Nodename:             50:0a:09:80:86:f7:b3:14 (500a098086f7b314)

            FC Portname:             50:0a:09:81:96:f7:b3:14 (500a098196f7b314)

            Standby:                 No

             

             

            Slot:                    4b

            Description:             Fibre Channel Target Adapter 4b (Dual-channel, QLogic 2432 (2464) rev. 3)

            Adapter Type:            Local

            Status:                  LINK NOT CONNECTED

            FC Nodename:             50:0a:09:80:86:f7:b3:14 (500a098086f7b314)

            FC Portname:             50:0a:09:82:96:f7:b3:14 (500a098296f7b314)

            Standby:                 No

             

             

            Slot:                    4c

            Description:             Fibre Channel Target Adapter 4c (Dual-channel, QLogic 2432 (2464) rev. 3)

            Adapter Type:            Local

            Status:                  LINK NOT CONNECTED

            FC Nodename:             50:0a:09:80:86:f7:b3:14 (500a098086f7b314)

            FC Portname:             50:0a:09:83:96:f7:b3:14 (500a098396f7b314)

            Standby:                 No

             

             

            Slot:                    4d

            Description:             Fibre Channel Target Adapter 4d (Dual-channel, QLogic 2432 (2464) rev. 3)

            Adapter Type:            Local

            Status:                  LINK NOT CONNECTED

            FC Nodename:             50:0a:09:80:86:f7:b3:14 (500a098086f7b314)

            FC Portname:             50:0a:09:84:96:f7:b3:14 (500a098496f7b314)

            Standby:                 No

             

             

            Slot:                    0d

            Description:             Fibre Channel Target Adapter 0d (Dual-channel, QLogic 2432 (2462) rev. 2)

            Adapter Type:            Local

            Status:                  ONLINE

            FC Nodename:             50:0a:09:80:86:f7:b3:14 (500a098086f7b314)

            FC Portname:             50:0a:09:85:96:f7:b3:14 (500a098596f7b314)

            Standby:                 No
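
            The comparison described above (looking for WWPNs that show up on both filers) can be done by script instead of by eye. The following is only a minimal sketch: the function names are made up for illustration, the sample text keeps just the Slot and FC Portname lines from the dump above, and it assumes the output format shown in this post.

```python
import re

def parse_portnames(fcp_show_adapter_output):
    """Map slot -> WWPN (the hex form in parentheses) from 'fcp show adapter' text."""
    ports = {}
    slot = None
    for line in fcp_show_adapter_output.splitlines():
        line = line.strip()
        m = re.match(r"Slot:\s+(\S+)", line)
        if m:
            slot = m.group(1)
            continue
        m = re.match(r"FC Portname:\s+\S+\s+\((\w+)\)", line)
        if m and slot is not None:
            ports[slot] = m.group(1).lower()
    return ports

def shared_wwpns(ports_a, ports_b):
    """Return {wwpn: (slot_on_a, slot_on_b)} for WWPNs present on both nodes."""
    slot_by_wwpn_b = {wwpn: slot for slot, wwpn in ports_b.items()}
    return {
        wwpn: (slot, slot_by_wwpn_b[wwpn])
        for slot, wwpn in ports_a.items()
        if wwpn in slot_by_wwpn_b
    }

# Slot / FC Portname lines copied from the two dumps in this post.
FILER1 = """
Slot:                    4a
FC Portname:             50:0a:09:83:96:f7:b3:14 (500a098396f7b314)
Slot:                    4b
FC Portname:             50:0a:09:84:96:f7:b3:14 (500a098496f7b314)
Slot:                    4c
FC Portname:             50:0a:09:85:96:f7:b3:14 (500a098596f7b314)
Slot:                    4d
FC Portname:             50:0a:09:86:96:f7:b3:14 (500a098696f7b314)
Slot:                    0d
FC Portname:             50:0a:09:82:96:f7:b3:14 (500a098296f7b314)
"""

FILER2 = """
Slot:                    4a
FC Portname:             50:0a:09:81:96:f7:b3:14 (500a098196f7b314)
Slot:                    4b
FC Portname:             50:0a:09:82:96:f7:b3:14 (500a098296f7b314)
Slot:                    4c
FC Portname:             50:0a:09:83:96:f7:b3:14 (500a098396f7b314)
Slot:                    4d
FC Portname:             50:0a:09:84:96:f7:b3:14 (500a098496f7b314)
Slot:                    0d
FC Portname:             50:0a:09:85:96:f7:b3:14 (500a098596f7b314)
"""

duplicates = shared_wwpns(parse_portnames(FILER1), parse_portnames(FILER2))
for wwpn, (slot1, slot2) in sorted(duplicates.items()):
    print(f"{wwpn}: filer1 slot {slot1}, filer2 slot {slot2}")
```

            On the dumps above this reports four shared WWPNs; only filer1's 4d and filer2's 4a are unique to their node.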

            • Re: Active (I/O) paths misconfigured on VMware with ALUA
              Sergio_Santos Novice

              Hang tight fellas, I'm working on an outage window to do the duplicate WWPN fix. I might be able to test it later this week.

              • Re: Active (I/O) paths misconfigured on VMware with ALUA
                gatorman1369 Novice

                Haven't forgotten about this or you guys.

                 

                We are going to perform the maintenance today or tomorrow.

                 

                 

                From our technical support engineer's last correspondence:

                 

                //Start

                 

                I have finally found the root cause of why this happened, basically it is being caused by any of these 2 bugs

                 

                268320 - WWPN's are identical on both heads of a single_image cluster

                http://support.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=268320

                 

                538094 - Check for duplicate WWPNs during bootup

                http://support.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=538094

                 

                The steps to fix this issue are in the kb article you sent me before

                https://kb.netapp.com/support/index?page=content&id=1013497

                 

                Below you will find the steps to change the WWPN, page 147

                https://library.netapp.com/ecm/ecm_get_file/ECMM1277787

                 

                Steps

                1.    Take the adapter offline by entering the following command: fcp config adapter down

                      Example: fcp config 4a down

                2.    Display the existing WWPNs by entering the following command: fcp portname show [-v]

                      If you do not use the -v option, all currently used WWPNs and their associated adapters are displayed. If you use the -v option, all other valid WWPNs that are not being used are also shown.

                3.    Set the new WWPN for a single adapter or swap WWPNs between two adapters. If you are swapping between two adapters, make sure you take both adapters offline first.

                      Use this command . . .                 To do this . . .
                      fcp portname set adapter wwpn          Set the new WWPN on a single adapter.
                      fcp portname swap adapter1 adapter2    Swap WWPNs between two adapters.

                      Example: fcp portname set 4a 10:00:00:00:c9:30:80:2

                      Example: fcp portname swap 3a 4a

                4.    Bring the adapter back online by entering the following command: fcp config adapter up

                      Example: fcp config 4a up

                 

                 

                This is not a disruptive process. Even though you have to bring the FCP port down, a misconfigured path is causing both controllers to use the same WWPN, so we have a duplicate path.

                 

                If you go ahead and make the changes on the partner controller, there won't be any service impact.

                 

                // End

                 

                 

                It may be non-disruptive but I've been down that road a few times...Luckily for me, this is at a DR site where I can get away with powering off all of my workloads after hours. YMMV
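
                The ordering in the quoted steps (adapters down, inspect, set or swap, adapters up) can be written out as a small helper that only builds the command strings for review before anyone types them on a filer. This is a rough sketch, not a NetApp tool: the function name and the "<new-wwpn>" placeholder are made up for illustration, and the command syntax is taken only from the steps quoted above.

```python
def wwpn_change_commands(adapter, new_wwpn=None, swap_with=None):
    """Return the ordered 7-Mode commands for changing one adapter's WWPN
    (new_wwpn) or swapping WWPNs between two adapters (swap_with)."""
    if (new_wwpn is None) == (swap_with is None):
        raise ValueError("give exactly one of new_wwpn or swap_with")

    # For a swap, both adapters must be offline before the change.
    adapters = [adapter] if swap_with is None else [adapter, swap_with]

    commands = [f"fcp config {a} down" for a in adapters]
    commands.append("fcp portname show -v")  # list used and unused WWPNs
    if swap_with is None:
        commands.append(f"fcp portname set {adapter} {new_wwpn}")
    else:
        commands.append(f"fcp portname swap {adapter} {swap_with}")
    commands += [f"fcp config {a} up" for a in adapters]
    return commands

# "<new-wwpn>" is a placeholder; pick a real value from 'fcp portname show -v'.
for cmd in wwpn_change_commands("4a", new_wwpn="<new-wwpn>"):
    print(cmd)
```

                Building the list first makes it easy to check that both adapters are taken offline before a swap, which is the one ordering constraint the steps call out.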

                • Re: Active (I/O) paths misconfigured on VMware with ALUA
                  gatorman1369 Novice

                  Issue resolved

                   

                  On one of our filers we did the following...

                  Issued "fcp portname show [-v]" to get a list of all available WWPNs. Remember that ours had the same ones on both filers, so we needed to select new ones from the list.

                  Then we issued "fcp portname set adapter wwpn" to change the offending duplicate WWPNs.

                   

                  We did cf takeovers thinking that was all there was to it. But then we noticed, when issuing "lun config" on both filers, that the partner.single_image.key values did not identify each other as partners.

                   

                  From the KB: 1013497

                  Resolving duplicate fibre channel WWPNs between nodes in an HA-pair

                  https://kb.netapp.com/support/index?page=content&id=1013497&actp=search&viewlocale=en_US&searchid=null#Storage_controller_head_upgrade_best_practice_procedure_for_SAN

                   

                  WWPNs can be duplicated across ports in an HA-pair under certain circumstances. As a side effect of duplicate WWPNs, the ALUA configuration will also be duplicated, causing all MPIO path states to be either optimized or non-optimized. This issue typically occurs when existing HA-pair systems configured with FCP are split and rejoined with the new nodes. The following procedure must be followed to resolve the duplicate configuration and restore the ALUA configuration. This procedure will cause WWPNs on one of the two nodes to change. The change to WWPNs will require fibre channel zoning configuration updates after the new WWPNs have been generated.

                   

                          1. Stop the FCP service on both the nodes.

                          2. Run the following commands:

                             > priv set diag

                             *> lun config set local.single_image.key ""

                             *> lun config set partner.single_image.key ""

                          3. Repeat the process on the other node.

                          4. Perform a takeover and giveback of each node, or reboot each node.

                   

                   

                  After we did this we also had to re-zone the fabric because the filer got new WWPNs.

                   

                  Thinking back on this, I am not sure we even needed to run the "fcp portname set adapter wwpn" commands on all four FCP adapters, because when we ran the "lun config set..." commands the WWPNs changed again.

                   

                  Also, there is no way I would have done this during the day as we definitely had all paths down a few times on ESXi hosts as well as some RHEL hosts. Luckily we planned for a few admins to help out with shutdowns and were able to knock this out successfully.

                   

                  Happy this is resolved for us - I may share some of our before and after configs when I get back in the office in the AM.

                  • Re: Active (I/O) paths misconfigured on VMware with ALUA
                    gatorman1369 Novice

                    Here is a "before" config where you can see that the partner.single_image.key values I mentioned above were not correctly set to each other's local.single_image.key. So not only were our WWPNs the same across filers, but we also had to make sure these keys were correctly configured. ALUA will not work until this partner configuration is correct. I also suspect this may well be the issue for those of you, like pedro.rocha, who aren't seeing the duplicate WWPN messages when you run "lun config_check -v". His WWPNs could be OK while the partner configuration is not. I'd also take a look at your ispfct.single_image.nodename and related values.

                     

                    MyFiler1> priv set diag

                    Warning: These diagnostic commands are for use by NetApp

                             personnel only.

                    MyFiler1*> lun config

                    local.single_image.key = "151702736"

                    partner.single_image.key = "151702726"

                    iscsi.nodename = "iqn.1992-08.com.netapp:sn.151702736"

                    local.serialnum = "70000802"

                    fc-port-0b = "9"

                    fc-port-0d = "9"

                    ispfct.local.nodename = "d0ccfa8980090a50"

                    fcp.fabric = "dual"

                    ispfct.portname.0d = "0"

                    ispfct.portname.0b = "1"

                    copy_offload.state = "true"

                    vaw.state = "true"

                    write_same.state = "true"

                    ispfct.portname.storage = "0d:0,0b:1,0c:2,0a:3,"

                    iscsi.disabled_interface = "e0M:vif1-340:"

                    ispfct.nodename = "c6cc2a8980090a50"

                    ispfct.single_image.nodename = "c6cc2a8980090a50"

                    partner.serialnum = "70000801"

                    filer.serialnum = "70000802"

                    ispfct.mode = "single_image"

                    ispfct.partner.nodename = "c6cc2a8980090a50"

                     

                    MyFiler2> priv set diag

                    Warning: These diagnostic commands are for use by NetApp

                             personnel only.

                    MyFiler2*> lun config

                    local.single_image.key = "151702726"

                    partner.single_image.key = "151702484"

                    copy_offload.state = "true"

                    vaw.state = "true"

                    write_same.state = "true"

                    iscsi.nodename = "iqn.1992-08.com.netapp:sn.151702726"

                    local.serialnum = "70000801"

                    local.single_image.key = "151702726"

                    partner.single_image.key = "151702484"

                    ispfct.local.nodename = "c6cc2a8980090a50"

                    fc-port-0a = "9"

                    fc-port-0b = "9"

                    fc-port-0c = "9"

                    fc-port-0d = "9"

                    ispfct.nodename = "c6cc2a8980090a50"

                    ispfct.single_image.nodename = "c6cc2a8980090a50"

                    partner.serialnum = "70000802"

                    filer.serialnum = "70000801"

                    fcp.service = "on"

                    iscsi.disabled_interface = "e0M:vif1-340:"

                    ispfct.portname.storage = "0c:4,0d:5,0a:6,0b:7,"

                    ispfct.config.0a = "up"

                    ispfct.config.0b = "up"

                    ispfct.config.0c = "up"

                    ispfct.config.0d = "up"

                    ispfct.mode = "single_image"

                    ispfct.partner.nodename = "c6cc2a8980090a50"

                    iscsi.service = "on"
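
                    The key check described above (each node's partner.single_image.key should equal the other node's local.single_image.key) can be sketched as a small script. This is only an illustration: the function names are invented, the sample text keeps just the two key lines from each "lun config" dump above, and it assumes the key = "value" format shown in this post.

```python
import re

def parse_lun_config(text):
    """Parse 'name = "value"' lines into a dict (a repeated name keeps its last value)."""
    cfg = {}
    for line in text.splitlines():
        m = re.match(r'\s*(\S+)\s*=\s*"(.*)"\s*$', line)
        if m:
            cfg[m.group(1)] = m.group(2)
    return cfg

def points_at(cfg, partner_cfg):
    """True if cfg's partner key is set and equals the partner's local key."""
    partner_key = cfg.get("partner.single_image.key")
    return partner_key is not None and partner_key == partner_cfg.get("local.single_image.key")

def partner_keys_match(cfg_a, cfg_b):
    """Return (a_points_at_b, b_points_at_a)."""
    return points_at(cfg_a, cfg_b), points_at(cfg_b, cfg_a)

# Key lines copied from the "before" dumps in this post.
FILER1 = '''
local.single_image.key = "151702736"
partner.single_image.key = "151702726"
'''

FILER2 = '''
local.single_image.key = "151702726"
partner.single_image.key = "151702484"
'''

a_ok, b_ok = partner_keys_match(parse_lun_config(FILER1), parse_lun_config(FILER2))
print("MyFiler1 partner key matches MyFiler2 local key:", a_ok)
print("MyFiler2 partner key matches MyFiler1 local key:", b_ok)
```

                    On the "before" values, MyFiler1 points at MyFiler2 correctly, but MyFiler2's partner key (151702484) does not match MyFiler1's local key (151702736), which is the mismatch described above.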

                    • Re: Active (I/O) paths misconfigured on VMware with ALUA
                      Sergio_Santos Novice

                      Sorry for the delay guys (it was more difficult to get an outage window than I thought). I tried the steps in Gatorman's link, under "Resolving duplicate fibre channel WWPNs between nodes in an HA-pair": https://kb.netapp.com/support/index?page=content&id=1013497

                       

                      And I'm glad to report everything is now working as expected! I did a full reboot of the nodes--not a takeover/giveback--and ran the "priv set diag" and then "lun config" to verify the local.single_image.key and partner.single_image.key were updated properly. My test ESX server is now back on the ALUA module and only 2 of the 4 paths have Active (I/O). Stopping the FCP service properly kills the paths on one node and fails over to the partner.

                       

                      I'm not sure how duplicate WWPNs were missed by multiple NetApp techs, but I can't say I'm not a little disappointed. The Configuration Checker software also doesn't report this, nor is it caught in the NetApp autosupport cloud. Unless you find that specific KB article, you're in the Twilight Zone!

                       

                      EDIT -- I forgot to mention the fix will only change the WWPNs on one node (the other's will remain the same). For the changed node's WWPNs you will need to go back and re-zone (or change aliases in my case). If you're quick to look over the 'fcp show adapters' output you might miss the changed WWPNs and think the fix didn't work.

  • Re: Active (I/O) paths misconfigured on VMware with ALUA
    ritchi641 Novice

    While you are talking about ALUA and initiator groups (igroups): we have LUNs on only one controller; the other one is CIFS only.

    When we go from non-ALUA to ALUA, do we have to enable ALUA on each igroup on both controllers in case of a failover?
