I've set up a FAS3270 and want to test failover to make sure it works. Here's what I do:
1) Make sure HA is enabled on both controllers
2) Verify that the interface group (ifgrp0) is in shared mode for both controllers
3) Start a continuous ping of both interface groups' IP addresses
4) From the CLI, run cf takeover on the first controller
5) Using the SP, watch the two controllers as they gracefully fail over and the taken-over controller reboots
6) Once the taken-over controller has rebooted and is in the waiting-for-giveback state, run the cf giveback -f command
7) Watch control transfer back to the original controller
8) Repeat using the other controller
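To put numbers on what the continuous ping in steps 3 and 5 shows, the ping log can be collapsed into outage windows. This is a minimal sketch, assuming the ping output has already been parsed into (timestamp, replied) pairs; the function and class names are hypothetical and not part of any NetApp tool. A takeover that works should show a short outage of a few probes, not one window lasting until giveback.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Outage:
    start: float  # timestamp of the first failed probe in the window
    end: float    # timestamp of the last failed probe in the window
    count: int    # number of consecutive failed probes


def find_outages(samples: List[Tuple[float, bool]]) -> List[Outage]:
    """Group consecutive failed pings into outage windows.

    samples: (timestamp_seconds, replied) pairs in time order, e.g. one
    per second from a continuous ping of an ifgrp0 address (hypothetical
    input format; parsing real ping output is left out).
    """
    outages: List[Outage] = []
    current: Optional[Outage] = None
    for ts, replied in samples:
        if not replied:
            if current is None:
                current = Outage(start=ts, end=ts, count=1)  # new window opens
            else:
                current.end = ts
                current.count += 1
        elif current is not None:
            outages.append(current)  # a reply closes the open window
            current = None
    if current is not None:
        outages.append(current)  # address was still down at end of log
    return outages


if __name__ == "__main__":
    # Hypothetical log: one probe per second.
    log = [(0, True), (1, False), (2, False), (3, True), (4, False)]
    for o in find_outages(log):
        print(f"down from t={o.start} to t={o.end} ({o.count} lost)")
```

Comparing the window lengths for the two IP addresses makes it easy to tell a brief takeover blip from an address that never comes back until giveback.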
What I'm seeing, however, doesn't convince me that HA is actually working. During step 5 I also watch the continuous pings, and from the moment the controller is taken over until the giveback completes, the pings to its IP address time out. My understanding is that in shared mode the partner controller should assume the IP address of the failed controller. If I can't ping it, I can't access it, so how is this HA?
Both controllers have ifgrp0 on the same VLAN, with separate IP addresses: 10.5.141.21 and 10.5.141.22.
Am I missing something? If so, what?
Thanks for the response. While it's true that clients will experience a momentary disconnect when a node fails, my understanding is that once the partner node takes over, clients reconnect and regain access to their storage. I understand CIFS is affected by this disconnect the most.
My assumption is that once the partner node has taken over the failed node's IP address, it will respond to pings to that address. That doesn't seem to be happening, which is why I'm asking whether my assumption is correct. I suspect I'm missing an option setting; I just haven't figured out which one.
The usual cause is a stale ARP entry on the client. Normally NetApp sends an unsolicited ARP reply so that clients update their ARP cache, but it is entirely up to the client to listen and react to it. Another possible cause is the MAC address table in the switches.
Wait 10 to 15 minutes. If you can then access the filer after failover, it is most likely one of these two reasons.
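Why waiting helps can be shown with a toy model of a client ARP cache. This is a minimal sketch, assuming a single fixed entry lifetime and a client that either honors or ignores unsolicited ARP replies; the class, the 600-second TTL, and the MAC addresses are hypothetical, and real operating systems and switches age entries in their own ways.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ArpCache:
    """Toy client ARP cache (hypothetical model, for illustration only)."""
    ttl: float                 # seconds an entry stays valid (assumed value)
    accept_gratuitous: bool    # does this client honor unsolicited ARP replies?
    entries: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def learn(self, ip: str, mac: str, now: float) -> None:
        # Normal resolution: the client asked and got a reply.
        self.entries[ip] = (mac, now + self.ttl)

    def unsolicited_reply(self, ip: str, mac: str, now: float) -> None:
        # Unsolicited ARP reply announced by the takeover node.
        if self.accept_gratuitous:
            self.entries[ip] = (mac, now + self.ttl)
        # Otherwise the client ignores it and keeps the old MAC.

    def lookup(self, ip: str, now: float) -> Optional[str]:
        entry = self.entries.get(ip)
        if entry is None or now >= entry[1]:
            self.entries.pop(ip, None)
            return None  # expired: the client re-ARPs and learns the new MAC
        return entry[0]


if __name__ == "__main__":
    ip = "10.5.141.21"
    stubborn = ArpCache(ttl=600, accept_gratuitous=False)
    stubborn.learn(ip, "mac-of-failed-node", now=0)
    stubborn.unsolicited_reply(ip, "mac-of-takeover-node", now=10)
    print(stubborn.lookup(ip, now=20))   # still the failed node's MAC
    print(stubborn.lookup(ip, now=600))  # None: entry aged out, client re-resolves
```

A client that ignores the announcement keeps sending frames to the failed node's MAC until its entry ages out, which matches the "wait and it starts working" symptom.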
So I modified the test a bit: from a device on the storage VLAN, I can ping the IP address of the failed node and get a response.
On the core switch, I can look at the ARP table and see that the IP address has been remapped to the MAC address of the takeover node.
However, I cannot ping the address from a device that is not on the storage VLAN.
This FAS3270 has a management port connected as well as ifgrp0. On failover, only ifgrp0's IP address is taken over by the takeover node. The default gateway is on the management port's network. I wonder whether I configured the default gateway wrong: should it be on the storage VLAN rather than the management port?
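The on-VLAN versus off-VLAN difference comes down to routing. A host answers a client on a directly connected subnet itself, but replies to any other subnet go to whatever next hop the routing table selects, usually the default gateway. This is a minimal longest-prefix-match sketch, assuming a /24 storage VLAN, a hypothetical 192.168.10.0/24 management network with gateway 192.168.10.1, and the placeholder interface name "mgmt"; none of these values come from the actual filer configuration. It only illustrates the lookup, not how Data ONTAP builds the partner's routes during takeover.

```python
import ipaddress
from typing import List, Optional, Tuple

# (destination prefix, next hop or None if directly connected, interface)
Route = Tuple[ipaddress.IPv4Network, Optional[str], str]

# Hypothetical routing table for the node owning 10.5.141.21.
ROUTES: List[Route] = [
    (ipaddress.ip_network("10.5.141.0/24"), None, "ifgrp0"),      # storage VLAN (assumed /24)
    (ipaddress.ip_network("192.168.10.0/24"), None, "mgmt"),      # management net (hypothetical)
    (ipaddress.ip_network("0.0.0.0/0"), "192.168.10.1", "mgmt"),  # default gateway via management port
]


def lookup(routes: List[Route], dst: str) -> Route:
    """Return the most specific (longest-prefix) route that matches dst."""
    addr = ipaddress.ip_address(dst)
    matches = [r for r in routes if addr in r[0]]
    if not matches:
        raise LookupError(f"no route to {dst}")
    return max(matches, key=lambda r: r[0].prefixlen)


if __name__ == "__main__":
    for client in ["10.5.141.50", "10.20.0.5"]:
        net, hop, iface = lookup(ROUTES, client)
        via = hop if hop else "direct"
        print(f"reply to {client}: {via} on {iface}")
```

With this table a client on 10.5.141.0/24 is answered directly out ifgrp0, while an off-VLAN client's reply is handed to the management gateway. If that path is not available for the taken-over address, on-VLAN pings succeed and off-VLAN pings fail, which is the pattern described above.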