15 Replies Latest reply: Jun 27, 2013 6:17 PM by scoatney

Is fabric metro cluster require all 4 mailbox disks to load ONTAP?

AEKAKIRA2 Novice

Does a fabric MetroCluster require all 4 mailbox disks to load ONTAP?

Are the local aggregate's mailbox disks (2) alone sufficient for a single node?

Is the ISL link mandatory for a fabric MetroCluster?

In forced takeover mode, can the surviving node be shut down and started up again without any issues?

  • Re: Is fabric metro cluster require all 4 mailbox disks to load ONTAP?
    thomas.glodde Kart Racer

    Does a fabric MetroCluster require all 4 mailbox disks to load ONTAP?

    no

    Are the local aggregate's mailbox disks (2) alone sufficient for a single node?

    yes

    Is the ISL link mandatory for a fabric MetroCluster?

    no

    In forced takeover mode, can the surviving node be shut down and started up again without any issues?

    yes

     

    Btw, you can always reset the mailboxes with "mailbox destroy local" or "mailbox destroy partner" in maintenance mode.

  • Re: Is fabric metro cluster require all 4 mailbox disks to load ONTAP?
    ismopuuronen Certified Sprinter

    Hello,

     

    Hope this helps:

     

    Mailbox disks are used to determine partner status. If the mailbox status is uncertain, cf will be disabled.

    The mailbox doesn't affect how ONTAP is loaded during boot.

     

    Let's put it this way.
    In an HA configuration, each node has to know its partner's status.

    If the interconnect link is down, the filers can still see that the partner is alive, because the nodes can still update the mailbox disks.

     

    The ISL link is used for data traffic and also to sync NVRAM (interconnect).

    If you have a TI zone, then you have a dedicated fiber for interconnect traffic.
    You cannot have a fabric MetroCluster without an ISL link.

     

    A forced takeover is different from a basic takeover in HA.

    Forced takeover is only available in MetroClusters.

    A "normal" takeover happens when nodeA goes down: nodeB sees that and takes over the role of nodeA, serving nodeA's data from site B.

    A forced takeover doesn't happen automatically; you have to type the command to do it.

     

    Scenario:

    The ISL link breaks between the sites.

    Both nodes are okay, but mirroring of data and cf can't happen any more: NVMEM is not in sync, pool 1 (mirrored data) is unavailable, and the mailboxes are unavailable.

    In this case, if you do a forced takeover on nodeA, it will start serving nodeB's data from the nodeA site (the data is mirrored, so this is possible).

    This is not what you want, because nodeB is okay and has been serving data all along.

    Then you have "two nodeBs" available to the clients.

    You do a forced takeover only when you know that the other site is down, or that it is going to be down.

    Example: the air conditioning is broken and the temperature is rising. You shut down site B to avoid overheating, do the forced takeover, and start serving data from site A.

     

    Br.

    Ismo.
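
The rule of thumb in the post above (force a takeover only when the other site is known to be down) can be written as a small decision helper. This is a minimal sketch, not NetApp tooling: the `SiteView` class, its fields, and the advice strings are hypothetical names made up for illustration, and the "use a normal takeover" branch is an assumption drawn from later advice in this thread.

```python
from dataclasses import dataclass


@dataclass
class SiteView:
    """What an operator at the local site knows (hypothetical model)."""
    isl_up: bool                  # inter-site link state as seen locally
    partner_confirmed_down: bool  # confirmed out-of-band, e.g. by phone


def forced_takeover_advice(view: SiteView) -> str:
    """Return advice following the rule of thumb from the post above."""
    if view.partner_confirmed_down:
        # The other site is known to be down (or was shut down on purpose).
        return "forced takeover is reasonable"
    if not view.isl_up:
        # ISL is broken but the partner may still be serving data:
        # forcing a takeover now would create "two nodeBs" (split brain).
        return "do not force takeover: risk of split brain"
    # Assumption: with the ISL up and the partner alive, a normal takeover
    # is the tool to use, not a forced one.
    return "no forced takeover needed: use a normal takeover"


if __name__ == "__main__":
    print(forced_takeover_advice(SiteView(isl_up=False, partner_confirmed_down=False)))
```

The point of the sketch is that a broken ISL alone never justifies the forced takeover; only confirmed knowledge about the partner site does.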

    • Re: Is fabric metro cluster require all 4 mailbox disks to load ONTAP?
      AEKAKIRA2 Novice

      Hello Ismo & Thomas
      I am looking at a peculiar scenario where I am forcing the entire fabric MetroCluster to shut down for maintenance at both sites on the same day and time. Throughout this scenario, assume the DWDM link is not active and the ISL is not available to the nodes (both fabrics).

       

      1. I can execute cf disable and bring down both nodes separately.

      2. If the ISL (DWDM) link is not active, can I bring up a single node (in cf disable state)?

      3. Suppose I manually disable the ISL ports on the switches and proceed to execute CFOD (cf forcetakeover -d), then shut down the first site, then shut down the surviving node.

           After maintenance, if I try to bring up the 2nd node (in the absence of an active ISL link), will the node come up? If not, what are the steps (like releasing the MB disks of local/partner, or any further steps) to bring the node back?

      4. I see a KB article on mailbox disks (if the node is not accessible, the MB disks need to be reset), but I am not sure whether the node comes up with only the 2 local MB disks.

      Thanks in advance

      Kiran

      • Re: Is fabric metro cluster require all 4 mailbox disks to load ONTAP?
        thomas.glodde Kart Racer

        1) Yes.

        2) In case the ISL goes down, cf gets disabled automatically.

        3) The moment you disable the ISL and run cf forcetakeover -d on one node while the 2nd node is still running, you have a split brain. Don't do that.

        If you need to do maintenance on node 2, do a normal cf takeover first, then disable the ISL and shut down node 2.

        4) It will.

        • Re: Is fabric metro cluster require all 4 mailbox disks to load ONTAP?
          AEKAKIRA2 Novice

          Hello Thomas,

           

          I want to do maintenance on both sites (MetroCluster) on the same day and at the same time.

          Hence I have to shut down both nodes, and once maintenance is complete I have to bring up both nodes. If the ISL link has failed completely, how do I proceed?

          In the scenario above, node 1 takes over node 2, so node 1 is the surviving (takeover) node, and I want to shut down the surviving node as well. When I start node 1 (which was in takeover mode before shutdown), will it come up normally?

          If it encounters any issue, how do I tackle the situation and bring up at least one node (either in cluster or cluster-disabled state)?

           

          I will be eagerly waiting for the results of testing next week.

           

          Regards

          Kiran

          • Re: Is fabric metro cluster require all 4 mailbox disks to load ONTAP?
            thomas.glodde Kart Racer

            Kiran,

            Do you want to do maintenance on both nodes at the same time?

            1) ISL UP, on both nodes:

            cf disable

            halt -t 0

            2) ISL DOWN, on both nodes:

            cf disable

            halt -t 0

            That's it. There is no need to take over if you don't run any services on either side anyway.

            After maintenance is done:

            1) ISL UP

            Just boot both nodes, then:

            cf enable

            2) ISL DOWN

            Just boot both nodes.

            cf enable will fail as long as the ISL is down.
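
The procedure above can be summarised as a small runbook generator. This is only a sketch: the function names are hypothetical, nothing here talks to a filer, and the only commands emitted are the ones quoted in the post (`cf disable`, `halt -t 0`, `cf enable`).

```python
from typing import List


def shutdown_steps() -> List[str]:
    """Commands to run on BOTH nodes before maintenance.

    Per the post above, the sequence is the same whether the ISL is up or down.
    """
    return ["cf disable", "halt -t 0"]


def startup_steps(isl_up: bool) -> List[str]:
    """Steps after maintenance, depending on the ISL state."""
    steps = ["boot both nodes"]
    if isl_up:
        steps.append("cf enable")
    else:
        # Per the post above, cf enable fails while the ISL is down,
        # so it is deferred rather than attempted.
        steps.append("wait for ISL, then: cf enable")
    return steps


if __name__ == "__main__":
    for isl_up in (True, False):
        label = "ISL UP" if isl_up else "ISL DOWN"
        print(label, shutdown_steps(), startup_steps(isl_up))
```

Keeping shutdown identical in both cases mirrors the post: the ISL state only changes what can be done after the nodes are booted again.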

            • Re: Is fabric metro cluster require all 4 mailbox disks to load ONTAP?
              AEKAKIRA2 Novice

              Hello Thomas

              The process you described above works and is what we follow. Recently, however, I faced a scenario where the ISLs (both fabrics) were down during maintenance, and when we tried to boot, the node would not come up. In the end we released the disks and destroyed the mailbox disks on local and partner.

               

              I am now looking for an alternative process, if any, for this kind of maintenance in case an unknown issue is encountered during that period. I am trying to see whether we can fail a node over to its partner and then shut down both nodes (sites A and B) for maintenance. After maintenance, when we boot the node on site B (the one that took over its partner), will it require an ISL to come up properly? Second, does the takeover node require all 4 MB disks to boot (as said earlier, they are not required)?

               

              If there is an issue, such as the takeover node not being able to boot, do we need to follow the same process of releasing the storage disks and destroying the mailbox disks (local, partner) to bring up the node? From the output below, it seems the takeover node loses its cluster details. Is this true? If so, how do we recover the cluster status as it was and bring up the node normally?

              Even if the cluster state is lost and we are able to bring up the takeover node, is the data intact up to the last write, assuming there were no changes/writes in the last 30 minutes?

               

              Output:

              ++++++++++++

              *> mailbox destroy local
              Destroying mailboxes forces a node to create new empty mailboxes,
              which clears any takeover state and removes all knowledge
              of out-of-date plexes of mirrored volumes.
              Are you sure you want to destroy the local mailboxes? yes
              mailboxes destroyed

              *> mailbox destroy partner
              Destroying mailboxes forces a node to create new empty mailboxes,
              which clears any takeover state and removes all knowledge
              of out-of-date plexes of mirrored volumes.
              Destroying partner mailboxes means that you will not be
              able to do a takeover of any sort (including forcetakeover -d)
              until the partner reboots successfully. This is dangerous when
              the local node is, or should be in, takeover state and
              VERY dangerous if the partner has suffered some form of disaster
              Are you sure you want to destroy the partner mailboxes?

              ++++++++++++

               

              Thanks in advance

              Regards

              Kiran

              • Re: Is fabric metro cluster require all 4 mailbox disks to load ONTAP?
                aborzenkov Grand Marshal

                The process you described above works and is what we follow. Recently, however, I faced a scenario where the ISLs (both fabrics) were down during maintenance, and when we tried to boot, the node would not come up.

                That's what I expected. Booting in this case would be highly dangerous and could lead to data corruption.

                If there is an issue, such as the takeover node not being able to boot, do we need to follow the same process of releasing the storage disks and destroying the mailbox disks (local, partner) to bring up the node?

                This must be decided on a case-by-case basis. It is impossible to give a blanket statement. You need to evaluate the situation and decide on your priorities: immediate service availability with potential data loss, or data integrity by all means.

      • Re: Is fabric metro cluster require all 4 mailbox disks to load ONTAP?
        aborzenkov Grand Marshal

        1. I can execute cf disable and bring down both nodes separately.

        2. If the ISL (DWDM) link is not active, can I bring up a single node (in cf disable state)?

        As far as I know, this is not possible without effectively destroying the active/active configuration.

        3. Suppose I manually disable the ISL ports on the switches and proceed to execute CFOD (cf forcetakeover -d), then shut down the first site, then shut down the surviving node.

             After maintenance, if I try to bring up the 2nd node (in the absence of an active ISL link), will the node come up?

        Which one is "second"? Let's say you have site A and site B, the intersite link is lost, and site B did "cf forcetakeover -d". Then you should be able to boot site B (it will come up in takeover mode hosting both A and B), but you should not be able to boot site A.

        4. I see a KB article on mailbox disks (if the node is not accessible, the MB disks need to be reset), but I am not sure whether the node comes up with only the 2 local MB disks.

        This is the majority rule. You have 8 mailboxes in total (if not, the cluster is misconfigured anyway). You need more than half of them for a node to boot.
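
The majority rule stated above comes down to one comparison. The sketch below is illustrative only: the function name is hypothetical, and the default total of 8 is taken from the post; how ONTAP actually counts and weighs mailboxes is not modelled here.

```python
def has_mailbox_majority(accessible: int, total: int = 8) -> bool:
    """True if strictly more than half of the mailboxes are accessible."""
    if not 0 <= accessible <= total:
        raise ValueError("accessible must be between 0 and total")
    # "More than half" is strict: exactly half is NOT a majority.
    return 2 * accessible > total


if __name__ == "__main__":
    for n in range(9):
        print(n, has_mailbox_majority(n))
```

Note that with 8 mailboxes, seeing exactly 4 is not enough under this rule, which is why losing one whole site's worth of mailboxes is the critical case.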

        • Re: Is fabric metro cluster require all 4 mailbox disks to load ONTAP?
          scoatney NetApp Employee Novice

          If a node has access to all of its local mailbox disks prior to the node being shut down, it needs access to the same (all) disks when it boots.  It only needs access to the partner mailbox disks if a takeover is needed.

           

          If you disable the ISL links while the node(s) are up, the HA mailbox logic will reconfigure itself to the 2 remaining accessible disks. This can take a couple of seconds, and there are EMS messages which indicate that it has happened. You can then reboot the node with only 2 mailbox disks (the 2 remaining mailbox disks).

          However, if you might want to do 'cf forcetakeover -d', you should not disable the ISL first.  You run the risk of the data on the 2 plexes diverging, and you will lose data.
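
The behaviour described above (a node must see at boot the same mailbox disks it was using at shutdown, and that set shrinks if the ISL is disabled while the node is up) can be modelled with plain sets. This is a toy model under stated assumptions: the disk names and function names are hypothetical, and the real HA mailbox logic is not reproduced.

```python
from typing import Set


def reconfigure(accessible_now: Set[str]) -> Set[str]:
    """Mailbox set after the node, while UP, adapts to what it can still reach."""
    return set(accessible_now)


def can_boot(set_at_shutdown: Set[str], accessible_at_boot: Set[str]) -> bool:
    """The node needs every mailbox disk it was using when it went down."""
    return set_at_shutdown <= accessible_at_boot


if __name__ == "__main__":
    local = {"local_1", "local_2"}          # hypothetical disk names
    full = local | {"remote_1", "remote_2"}

    # ISL lost while the node was DOWN: it still expects all 4 disks.
    print(can_boot(full, local))

    # ISL disabled while the node was UP: the set shrank to the 2 local disks first.
    print(can_boot(reconfigure(local), local))
```

The contrast between the two calls is the point of the post: the order of "disable ISL" and "shut down" determines whether 2 local mailbox disks are enough at boot.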

  • Re: Is fabric metro cluster require all 4 mailbox disks to load ONTAP?
    AEKAKIRA2 Novice

    Thank you, each and every one of you, for sparing your time to provide valuable input on this subject. I will leave the query open for a couple of days.
