A very generous company donated an IBM N6040 to us. We own a FAS2240, so I am familiar with NetApp administration. However, the IBM N6040 still has a clustered configuration on it and the root password is unknown. Any attempt to enter Maintenance mode to change the password results in controller #2 going into takeover mode, and I never get to the Ctrl-C options menu on controller #1. I get the message:
In a cluster, you MUST ensure that the partner is (and remains) down, or that takeover is manually disabled on the partner node, because clustering software is not started or fully enabled in Maintenance mode.
It's a Catch-22: I can't log into controller #2 to run cf disable because I don't have the root password. Without being able to run cf disable on controller #2, how do I ensure that node 2 is down? I tried powering it off and disconnecting the power cord, but it seems to keep running, clearly drawing power from controller #1's power supply.
I'd like to be able to run the setup command on both controllers and get them up and running CIFS in our environment. But I'm stuck. I must be missing something...
What is the N6040 equivalent to? Is it a single chassis? I've never had one of these, but I presume you should be able to power it off via the RLM/SP and make sure it's down via the console. I'd also presume that specific power supplies power specific systems within the chassis?
If not, pull the mainboard out (which I would think you shouldn't have to do), or disconnect the SAS/FC paths from the head that should be offline.
I believe it's equivalent to a FAS3140. I'm not sure what you mean by 'single chassis'. I can halt the system via the RLM, but both heads (controllers, nodes?) still flash LEDs on their HBAs and FC ports. It has 5 disk shelves.
I did disconnect the FC paths from controller 2, but controller 1 still believed it was in takeover mode, even after I halted and rebooted the system.
By single chassis I mean 2 controllers in a single hunk of metal (no IOXM).
If you halt a controller, the other controller will take over. Halt the system without takeover via halt -f:
halt [-d dump_string] [-t interval] [-f]
-d dump_string causes the storage system to perform a core dump before halting. You use dump_string to describe the reason for the core dump. The message for the core dump will include the reason specified by dump_string.
Attention: Using halt -d causes an improper shutdown of the storage system (also called a dirty shutdown). Avoid using halt -d for normal maintenance shutdowns. For more details, see the na_halt(1) man page.
-t interval causes the storage system to halt after the number of minutes specified by interval.
-f prevents one partner in an active/active pair from taking over the other after the storage system halts.
The storage system displays the boot prompt. When you see the boot prompt, you can turn the power off.
I think this is the command you need. I'd make sure, however, that each head only sees its own disks/shelves (you may want to halt both heads with halt -f, and then cable only the shelves/disks each head needs).
OK, it is a single chassis. It has 6 power supplies in the chassis. The only controller I can halt is controller 1. Controller 2 won't shut down so that I can press Ctrl-C for the boot options. I can get controller 1 to the LOADER> prompt, which, as you know, has a very limited set of commands. From that degraded state (the *> prompt) I halted controller 1 with halt -f, but I don't think it recognized the argument; controller 2 went into takeover mode again. Do I have to kill power to the entire system and then remove the system board from controller 2?
Assuming none of the shelves hold data you need, it is worth running set-defaults from the LOADER prompt and then boot_ontap, to reset the environment settings that make it believe it is in takeover. You could also run unsetenv partner-sysid to remove the partner setting; it will reset on its own once the systems come back up.
If you don't care about the data on them, try this: turn both systems and the shelves off and disconnect power from everything. Remove one of the system boards and cable only one set of shelves to the controller that remains. Power up the shelves, then the head.
It will still warn you in Maintenance mode, but the other system is clearly down, so go ahead and change the password. Once that is done, halt and power down this head (remove the controller), and do the other head with the remaining shelves.
I am certain there is an easier way, but again, I've never bothered with single-chassis systems. Seems like too much trouble.
I powered down the whole system, slid out the mainboard on controller 2, and disconnected the FC cables. I did not disconnect any disk shelves; to my untrained eye, it looks like both controllers are cross-connected to every shelf, and that seems like a Pandora's box I don't want to open. I DID get to the special boot menu this way, however! Alas, there is no love in the universe.
The system halts even after I select option 3 and press Enter. It reports that it has been taken over.
Data ONTAP Release 7.3.1: Thu Jan 8 05:06:50 PST 2009 (IBM)
Copyright (c) 1992-2008 NetApp.
Starting boot on Mon Feb 10 16:21:40 GMT 2014
Mon Feb 10 16:21:55 GMT [netif.linkDown:info]: Ethernet RLM/e0M: Link down, check cable.
Mon Feb 10 16:21:58 GMT [diskown.isEnabled:info]: software ownership has been enabled for this system
This boot is of OS version: Data ONTAP Release 7.3.1.
The last time this filer booted, it used OS version: <unknown>.
The WAFL/RAID versions of the previously booted OS are unknown.
If you choose a boot option other than Maintenance mode or
Initialize disks, the file system of your filer might be upgraded
to a new version of the OS.
If you do not want to risk having your file system upgraded, choose
Maintenance mode or reboot using the correct OS version.
(1) Normal boot.
(2) Boot without /etc/rc.
(3) Change password.
(4) Initialize owned disks (21 disks are owned by this filer).
(4a) Same as option 4, but create a flexible root volume.
(5) Maintenance mode boot.
Selection (1-5)? 3
Mon Feb 10 16:22:03 GMT [fmmb.current.lock.disk:info]: Disk 1a.16 is a local HA mailbox disk.
Mon Feb 10 16:22:03 GMT [fmmb.current.lock.disk:info]: Disk 2a.17 is a local HA mailbox disk.
Mon Feb 10 16:22:03 GMT [fmmb.instStat.change:info]: normal mailbox instance on local side.
Mon Feb 10 16:22:03 GMT [coredump.spare.none:info]: No sparecore disk was found.
Mon Feb 10 16:22:03 GMT [raid.vol.replay.nvram:info]: Performing raid replay on volume(s) Restoring parity from NVRAM
Mon Feb 10 16:22:03 GMT [raid.cksum.replay.summary:info]: Replayed 0 checksum blocks.
Mon Feb 10 16:22:03 GMT [raid.stripe.replay.summary:info]: Replayed 0 stripes.
HALT: cluster partner has taken over (fm) on Mon Feb 10 16:22:04 GMT 2014
It's like a dog with a bone!
Otlichno! (Russian for "excellent", assuming you are Russian.) It was destroying the local mailbox that made the difference.
Thanks, aborzenkov. And thanks to Doug and Scott for hanging in there with me. It was all great advice, and I learned a lot of tricks.
Now, on to controller 2.... *sigh*