15 Replies Latest reply: May 23, 2013 5:17 AM by mike_burris

HA Configuration: Interconnect Status is DOWN on both Filers

ASRARGUNA

Hello All,

 

I am in some trouble here and would appreciate some help in resolving the issue.

 

I have two filers, Filer1 and Filer2, configured as an HA pair. Both are FAS3160s running Data ONTAP 8.1.1 7-Mode. When I log in to OnCommand System Manager and go to "HA Configuration", it shows the interconnect status under both filers as Down (red X). Under Active/Active State, it shows Failover under Filer1 and Takeover under Filer2. Please see the attached screenshot.

 

How can I get the HA config back on my filers?

 

Thanks - AG

  • Re: HA Configuration: Interconnect Status is DOWN on both Filers
    aleksandar.stefanov

    Are the two physical filers up and running? Please check by logging in to the console.

    • Re: HA Configuration: Interconnect Status is DOWN on both Filers
      ASRARGUNA

      Thank you, Aleksandar, for your reply.

       

      Yes, both filers are up and I can ping both from my PC. I can also connect to both filers through OnCommand System Manager and work on volumes, etc. Only the HA status shows as in the earlier attachment.

       

      I also connected to Filer-2 over SSH, and this is what I see there:

       

      Filer-2(takeover)> Thu May 16 10:00:00 AST [Filer-2:kern.uptime.filer:info]:  10:00am up 3 days,  1:36 5 NFS ops, 219901  946 CIFS ops, 0 HTTP ops, 0 FCP ops, 0 iSCSI ops

      Thu May 16 10:00:00 AST [Filer-2:monitor.shelf.fault:CRITICAL]: Fault reported on disk storage shelf attached to channel 0a . Please check fans, power supplies, disks, and temperature sensors.

      Thu May 16 10:00:10 AST [Filer-2:cf.ic.hourlyRVnag:error]: Cluster Interconnect sessions with partner have been DOWN for 4414 minute(s)

      Thu May 16 10:00:10 AST [Filer-2:cf.ic.hourlyNicDownTime:info]: Interconnect adapter link #0 has been down for 4414 minutes                            

      Thu May 16 10:00:10 AST [Filer-2:cf.ic.hourlyNicDownTime:info]: Interconnect adapter link #1 has been down for 4414 minutes
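
The "down for N minutes" counters in these hourly messages can be turned into an approximate outage start time. The following is a minimal Python sketch, not something posted in the thread: the function name `down_since` is made up for illustration, and the year 2013 is an assumption taken from the thread date because the log lines do not include one.

```python
import re
from datetime import datetime, timedelta

# Sample lines copied from the console output above (terminal padding removed).
LOG = """\
Thu May 16 10:00:10 AST [Filer-2:cf.ic.hourlyRVnag:error]: Cluster Interconnect sessions with partner have been DOWN for 4414 minute(s)
Thu May 16 10:00:10 AST [Filer-2:cf.ic.hourlyNicDownTime:info]: Interconnect adapter link #0 has been down for 4414 minutes
Thu May 16 10:00:10 AST [Filer-2:cf.ic.hourlyNicDownTime:info]: Interconnect adapter link #1 has been down for 4414 minutes
"""

# Weekday, timestamp (captured), zone, [source:event:severity], then "down for N minute".
PATTERN = re.compile(
    r"^\w{3} (\w{3} +\d+ \d\d:\d\d:\d\d) \w+ \[[^\]]+\]: .*?"
    r"down for (\d+) minute",
    re.IGNORECASE,
)


def down_since(line, year=2013):
    """Return (outage_start, downtime) for one log line, or None if it does not match."""
    m = PATTERN.match(line)
    if m is None:
        return None
    # Assumption: the log omits the year, so it is supplied by the caller.
    stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
    downtime = timedelta(minutes=int(m.group(2)))
    return stamp - downtime, downtime


results = [r for r in map(down_since, LOG.splitlines()) if r]
for start, downtime in results:
    print(f"down since {start:%b %d %H:%M:%S} local time ({downtime})")
```

4414 minutes works out to 3 days, 1 hour and 34 minutes, which is close to the "up 3 days, 1:36" uptime in the same output, so the interconnect appears to have been down more or less since Filer-2 last booted.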

       

      Thanks - AG

      • Re: HA Configuration: Interconnect Status is DOWN on both Filers
        aleksandar.stefanov

        Yes, but Filer-2 is in takeover mode. How are the two filers connected to each other? Check the cables.

        • Re: HA Configuration: Interconnect Status is DOWN on both Filers
          ASRARGUNA

          Unfortunately the filers are at a remote location so I can't check cables. Is there a way to check things remotely? Sorry.

          • Re: HA Configuration: Interconnect Status is DOWN on both Filers
            aleksandar.stefanov

            If the cables are disconnected you will just see ports offline without knowing what is going on.

            Run ifconfig -a on both filers to see whether the interconnect ports are down.

            • Re: HA Configuration: Interconnect Status is DOWN on both Filers
              ASRARGUNA

              I am not able to SSH to Filer-1; it says "Shell not supported on takeover partner".

               

              On Filer-2, this is the output of ifconfig -a:

               

              Filer-2(takeover)> ifconfig -a

              e0a: flags=0x2fec867<UP,BROADCAST,RUNNING,MULTICAST,MULTIHOST,PARTNER_UP,TCPCKSUM> mtu 1500

                      inet 10.166.x.x netmask 0xffffff00 broadcast 10.166.x.x

              partner inet 10.166.x.x (e0a)

              ether 00:a0:x:x:x:x (auto-100tx-fd-up) flowcontrol full

              e0b: flags=0x270c866<BROADCAST,RUNNING,MULTICAST,TCPCKSUM> mtu 1500

              ether 00:a0:x:x:x:x (auto-unknown-down) flowcontrol full

              e3a: flags=0x170e866<BROADCAST,RUNNING,MULTICAST,TCPCKSUM> mtu 1500

              ether 00:a0:x:x:x:x (auto-unknown-down) flowcontrol full

              e3b: flags=0x170e866<BROADCAST,RUNNING,MULTICAST,TCPCKSUM> mtu 1500

              ether 00:a0:x:x:x:x (auto-unknown-down) flowcontrol full

              e4a: flags=0x2fec867<UP,BROADCAST,RUNNING,MULTICAST,MULTIHOST,PARTNER_UP,TCPCKSUM> mtu 1500

                      inet 10.166.x.x netmask 0xffffff00 broadcast 10.166.x.x

              partner inet 10.166.17.11 (e4a)

              ether 00:a0:x:x:x:x (auto-100tx-fd-up) flowcontrol full

              e4b: flags=0x270c866<BROADCAST,RUNNING,MULTICAST,TCPCKSUM> mtu 1500

              ether 00:a0:x:x:x:x (auto-unknown-down) flowcontrol full

              e4c: flags=0x270c866<BROADCAST,RUNNING,MULTICAST,TCPCKSUM> mtu 1500

              ether 00:a0:x:x:x:x (auto-unknown-down) flowcontrol full

              e4d: flags=0x270c866<BROADCAST,RUNNING,MULTICAST,TCPCKSUM> mtu 1500

              ether 00:a0:x:x:x:x (auto-unknown-down) flowcontrol full

              e0M: flags=0x2bec867<UP,BROADCAST,RUNNING,MULTICAST,MULTIHOST,PARTNER_UP,TCPCKSUM,MGMT_PORT> mtu 1500

                      inet 10.166.x.x netmask 0xffffff00 broadcast 10.166.x.x noddns

              partner inet 10.166.x.x (e0M)

              ether 00:a0:x:x:x:x (auto-100tx-fd-up) flowcontrol full

              lo: flags=0x1be8049<UP,LOOPBACK,RUNNING,MULTICAST,MULTIHOST,PARTNER_UP,TCPCKSUM> mtu 8160

                      inet 127.0.0.1 netmask 0xff000000 broadcast 127.0.0.1

              ether 00:00:00:00:00:00 (VIA Provider)

              losk: flags=0x40a400c9<UP,LOOPBACK,RUNNING> mtu 9188

                      inet 127.0.20.1 netmask 0xff000000 broadcast 127.0.20.1
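
For a long listing like this, a small script can summarize which interfaces are configured up and which have link. The sketch below is illustrative only and assumes the output format shown above; `link_states` is a hypothetical helper name. Note that every interface in the listing is an Ethernet or loopback port, and none is labeled as an interconnect port, so this summary may say nothing about the HA interconnect itself.

```python
import re

# A few interfaces copied from the ifconfig -a output above.
SAMPLE = """\
e0a: flags=0x2fec867<UP,BROADCAST,RUNNING,MULTICAST,MULTIHOST,PARTNER_UP,TCPCKSUM> mtu 1500
        inet 10.166.x.x netmask 0xffffff00 broadcast 10.166.x.x
partner inet 10.166.x.x (e0a)
ether 00:a0:x:x:x:x (auto-100tx-fd-up) flowcontrol full
e0b: flags=0x270c866<BROADCAST,RUNNING,MULTICAST,TCPCKSUM> mtu 1500
ether 00:a0:x:x:x:x (auto-unknown-down) flowcontrol full
e0M: flags=0x2bec867<UP,BROADCAST,RUNNING,MULTICAST,MULTIHOST,PARTNER_UP,TCPCKSUM,MGMT_PORT> mtu 1500
        inet 10.166.x.x netmask 0xffffff00 broadcast 10.166.x.x noddns
partner inet 10.166.x.x (e0M)
ether 00:a0:x:x:x:x (auto-100tx-fd-up) flowcontrol full
"""

HEADER = re.compile(r"^(\w+): flags=0x[0-9a-f]+<([^>]*)>")
MEDIA = re.compile(r"^\s*ether \S+ \(([^)]*)\)")


def link_states(text):
    """Map interface name -> {'configured_up': bool, 'link_up': bool or None}."""
    states = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            flags = header.group(2).split(",")
            # Exact match on "UP" so that PARTNER_UP alone does not count.
            states[current] = {"configured_up": "UP" in flags, "link_up": None}
            continue
        media = MEDIA.match(line)
        if media and current:
            # The media string ends in "-up" or "-down" in the output above.
            states[current]["link_up"] = media.group(1).endswith("-up")
    return states


for name, state in link_states(SAMPLE).items():
    print(name, state)
```

Run against the full listing, this would report e0a, e4a and e0M as up with link, and the remaining Ethernet ports as unconfigured with no link.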

              • Re: HA Configuration: Interconnect Status is DOWN on both Filers
                aleksandar.stefanov

                OK, log in to Filer-2 and try to perform a giveback with this command: cf giveback

                • Re: HA Configuration: Interconnect Status is DOWN on both Filers
                  ASRARGUNA

                  Even if the interconnect state under both filers shows down?

                  • Re: HA Configuration: Interconnect Status is DOWN on both Filers
                    aleksandar.stefanov

                    You can always try. Can you connect to the RLM on Filer-1 and check what the status is?

                    • Re: HA Configuration: Interconnect Status is DOWN on both Filers
                      ASRARGUNA

                      I tried cf giveback on Filer-2 and it says:

                       

                      Filer-2(takeover)> cf giveback

                      Partner not waiting for giveback, giveback cancelled.

                      To do a giveback without checking for partner readiness, please either set option "cf.giveback.check.partner" to "off" before doing "cf giveback" again, or do "cf giveback -f".

                      The first choice disables checking for all future "cf giveback", until it's turned back to "on". The second choice is good for this giveback only.

                       

                       

                      Yes, I am able to connect to the RLM on Filer-1, and it says: The system has booted in maintenance mode.

                       

                      May 13 05:24:49 [localhost:mgr.boot.reason_ok:notice]: System rebooted after a power down due to environmental condition.

                      May 13 05:24:49 [localhost:callhome.reboot.unknown:info]: Call home for REBOOT

                       

                      Ipspace "acp-ipspace" created

                      May 13 05:24:51 [localhost:acp.configWarn:debug]: Could not configure ACP administrator due to invalid Ethernet port in maintenance mode.

                      May 13 05:24:53 [localhost:cf.fmns.skipped.disk:notice]: While releasing the reservations in "Waiting For Giveback" state Failover Monitor Node State(fmns) module skipped the disk 0a.54 that is owned by 151742736 and reserved by 151742631.

                      May 13 05:24:59 [localhost:cf.fmns.skipped.disk:notice]: While releasing the reservations in "Waiting For Giveback" state Failover Monitor Node State(fmns) module skipped the disk 0a.54 that is owned by 151742736 and reserved by 151742631.

                      May 13 05:25:05 [localhost:cf.fmns.skipped.disk:notice]: While releasing the reservations in "Waiting For Giveback" state Failover Monitor Node State(fmns) module skipped the disk 0a.54 that is owned by 151742736 and reserved by 151742631.

                      May 13 05:25:11 [localhost:cf.fmns.skipped.disk:notice]: While releasing the reservations in "Waiting For Giveback" state Failover Monitor Node State(fmns) module skipped the disk 0a.54 that is owned by 151742736 and reserved by 151742631.

                      May 13 05:25:17 [localhost:cf.fmns.skipped.disk:notice]: While releasing the reservations in "Waiting For Giveback" state Failover Monitor Node State(fmns) module skipped the disk 0a.54 that is owned by 151742736 and reserved by 151742631.

                      May 13 05:25:23 [localhost:cf.fmns.skipped.disk:notice]: While releasing the reservations in "Waiting For Giveback" state Failover Monitor Node State(fmns) module skipped the disk 0a.54 that is owned by 151742736 and reserved by 151742631.

                      halt

                      May 13 05:25:25 [localhost:kern.cli.cmd:debug]: Command line input: the command is 'halt'. The full command line is 'halt'.

                      May 13 05:25:29 [localhost:cf.fmns.skipped.disk:notice]: While releasing the reservations in "Waiting For Giveback" state Failover Monitor Node State(fmns) module skipped the disk 0a.54 that is owned by 151742736 and reserved by 151742631.
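
Because the same cf.fmns.skipped.disk notice repeats every few seconds, it can help to collapse the RLM console log by event name before reading it. The following is a rough sketch under the assumption that every log line follows the "timestamp [host:event:severity]: message" shape seen above; `summarize` is a hypothetical helper, not a NetApp tool.

```python
import re

# A subset of the RLM console lines above, including the typed "halt" command.
LOG = """\
May 13 05:24:49 [localhost:mgr.boot.reason_ok:notice]: System rebooted after a power down due to environmental condition.
May 13 05:24:49 [localhost:callhome.reboot.unknown:info]: Call home for REBOOT
May 13 05:25:17 [localhost:cf.fmns.skipped.disk:notice]: While releasing the reservations in "Waiting For Giveback" state Failover Monitor Node State(fmns) module skipped the disk 0a.54 that is owned by 151742736 and reserved by 151742631.
May 13 05:25:23 [localhost:cf.fmns.skipped.disk:notice]: While releasing the reservations in "Waiting For Giveback" state Failover Monitor Node State(fmns) module skipped the disk 0a.54 that is owned by 151742736 and reserved by 151742631.
halt
May 13 05:25:25 [localhost:kern.cli.cmd:debug]: Command line input: the command is 'halt'. The full command line is 'halt'.
May 13 05:25:29 [localhost:cf.fmns.skipped.disk:notice]: While releasing the reservations in "Waiting For Giveback" state Failover Monitor Node State(fmns) module skipped the disk 0a.54 that is owned by 151742736 and reserved by 151742631.
"""

# Timestamp (captured), then [host:event:severity] with the event name captured.
EVENT = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \[[^:\]]+:([\w.]+):\w+\]: ")


def summarize(text):
    """Map event name -> [count, first timestamp, last timestamp]."""
    summary = {}
    for line in text.splitlines():
        m = EVENT.match(line.strip())
        if not m:
            continue  # skips typed commands such as "halt" and blank lines
        stamp, event = m.group(1), m.group(2)
        entry = summary.setdefault(event, [0, stamp, stamp])
        entry[0] += 1
        entry[2] = stamp
    return summary


summary = summarize(LOG)
for event, (count, first, last) in summary.items():
    print(f"{count:3d}  {event}  ({first} .. {last})")
```

Collapsed this way, the mgr.boot.reason_ok notice about a power down due to an environmental condition is easier to spot among the repeated disk reservation messages.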

                       

                       

                      Do you want me to run any specific command on the RLM of Filer-1, or should I do cf giveback -f?

                      • Re: HA Configuration: Interconnect Status is DOWN on both Filers
                        mike_burris

                        If there's an environmental issue which caused this, you need to figure out what that is *before* performing a giveback and get it resolved.  From the RLM, check "events" and see if it gives you any details as to what happened.  Once that is resolved, whatever it may be, I'd also consider doing a diagnostic boot and running through the tests/checks just to make sure everything is fine.

                         

                        Something else that comes to mind: are there no open NetApp cases for this system?  I *think* the box would have asup'd when it had an environmental problem and then rebooted itself, but don't hold me to that...

                        • Re: HA Configuration: Interconnect Status is DOWN on both Filers
                          ASRARGUNA

                          Hi Mike,

                           

                          Yes, there was a power outage in the DC, which caused all the servers to shut down. So how do I do a diagnostic boot?

                           

                          Thanks - AG

                          • Re: HA Configuration: Interconnect Status is DOWN on both Filers
                            SIRTECHIE42

                            Hi Asrar,

                             

                            Looking into this myself today, here is a link to the Diagnostics guide for 31xx systems.  https://library.netapp.com/ecm/ecm_download_file/ECMP1112531

                             

                            This guide should give you the details you are looking for to run the diagnostic tests Mike mentioned.  To enter diag mode, enter the following at the LOADER prompt:

                             

                            LOADER> boot_diags

                            • Re: HA Configuration: Interconnect Status is DOWN on both Filers
                              ASRARGUNA

                              Hi,

                               

                              I connected through the RLM, and this is what I see in the messages: "Fault reported on disk storage shelf attached to channel 0a. Please check fans, power, and temperature."

                               

                              Filer-2(Takeover)> environment status shelf 0a

                              Channel: 0a

                                      Shelf: 2

                                      SES device path: local access: 0a.32

                                      Module type: ESH4; monitoring is active

                                      Shelf status: non-critical condition

                                      SES Configuration, via loop id 32 in shelf 2:

                                       logical identifier=0x50050cc002112326

                                       vendor identification=XYRATEX

                                       product identification=DS14-Mk2-FC

                                       product revision level=1414

                                      Vendor-specific information:

                                       Product Serial Number: OPS445022112326

                                       Optional Settings: 0x00

                                      Status reads attempted: 19272; failed: 0

                                      Control writes attempted: 320; failed: 0

                                      Shelf bays with disk devices installed:

                                        13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0

                                        with error: none

                                      Power Supply installed element list: 1, 2; with error: none

                                      Power Supply information by element:

                                        [1] Serial number: PMA441920116102  Part number: <N/A>

                                            Type: 34

                                            Firmware version: <N/A>  Swaps: 0

                                        [2] Serial number: PMA441920116091  Part number: <N/A>

                                            Type: 34

                                            Firmware version: <N/A>  Swaps: 0

                                      Power control element status: power control status not supported (DS14Mk4 shelf required)

                                      Cooling Unit installed element list: 1, 2; with error: 2

                                      Temperature Sensor installed element list: 1, 2, 3; with error: none

                                      Shelf temperatures by element:

                                        [1] 26 C (78 F) (ambient)  Normal temperature range

                                        [2] 36 C (96 F)  Normal temperature range

                                        [3] 35 C (95 F)  Normal temperature range

                                      Temperature thresholds by element:

                                        [1] High critical: 50 C (122 F); high warning: 40 C (104 F)

                                            Low critical:  0 C (32 F); low warning: 10 C (50 F)

                                        [2] High critical: 63 C (145 F); high warning: 53 C (127 F)

                                            Low critical:  0 C (32 F); low warning: 10 C (50 F)

                                        [3] High critical: 63 C (145 F); high warning: 53 C (127 F)

                                            Low critical:  0 C (32 F); low warning: 10 C (50 F)

                                      ES Electronics installed element list: 1, 2; with error: none

                                      ES Electronics reporting element: 1

                                      ES Electronics information by element:

                                        [1] Serial number: IMS6981331F5949  Part number: <N/A>

                                            CPLD version: <N/A>  Swaps: 0

                                        [2] Serial number: IMS6981331F597B  Part number: <N/A>

                                            CPLD version: <N/A>  Swaps: 0

                                      Embedded Switching Hub installed element list: 1, 2; with error: none
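
To pick the faulted component out of a long shelf report, the "installed element list ... with error" lines can be scanned for anything other than "none". This is a minimal sketch assuming the line format shown above; `faulted_elements` is a made-up name for illustration.

```python
import re

# Element summary lines copied from the environment status output above.
SAMPLE = """\
        Shelf status: non-critical condition
        Shelf bays with disk devices installed:
          13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
          with error: none
        Power Supply installed element list: 1, 2; with error: none
        Cooling Unit installed element list: 1, 2; with error: 2
        Temperature Sensor installed element list: 1, 2, 3; with error: none
        ES Electronics installed element list: 1, 2; with error: none
        Embedded Switching Hub installed element list: 1, 2; with error: none
"""

ELEMENT = re.compile(
    r"^\s*(.+?) installed element list: ([\d, ]+); with error: (.+)$"
)


def faulted_elements(text):
    """Map element type -> list of element numbers reported with an error."""
    faults = {}
    for line in text.splitlines():
        m = ELEMENT.match(line)
        if not m:
            continue
        name, _installed, errors = m.groups()
        errors = errors.strip()
        if errors.lower() != "none":
            faults[name] = [int(n) for n in errors.split(",")]
    return faults


print(faulted_elements(SAMPLE))
```

In the report above, the only element list with an error is Cooling Unit element 2, and all three temperature readings are within their normal ranges. That would be consistent with a fan fault behind the "non-critical condition" shelf status, although the thread does not confirm it.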
