Thank you for this one, sir! Another question: what would happen to the system if 3 disks failed without the raid.timeout option being applied? I know there would be data loss or a 24-hour shutdown, but is there a TR or white paper that supports this?
Thank you in advance!
A few things to consider. Remember that there can be multiple aggregates on the system, each of which can consist of multiple RAID groups; aggr status -r will show the RAID groups.
You can have two failed drives in every aggregate and still not lose data, because each aggregate is a separate entity. Beyond that, you can lose two drives in each RAID group of a _single_ aggregate without losing data, because each RAID group has its own RAID-DP protection (a row-parity disk and a diagonal-parity disk). So if you have an aggregate with four RAID-DP RAID groups, you could lose 8 drives without losing data, as long as no more than two come from any one RAID group. For the record, I've seen this happen: an entire shelf was powered off, but no single RAID group had more than two drives on that shelf, so there was no data loss.
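To make the counting concrete, here is a rough Python sketch (not a NetApp tool) that treats each RAID group as an independent RAID-DP set that can absorb up to two failed disks. The function name groups_losing_data, the disk names, and the layout (4 groups of 6 disks, with two disks of every group on "shelf 1") are all made up for illustration; on a real system you'd read the actual layout from aggr status -r.

```python
# RAID-DP keeps two parity disks per RAID group, so each group can
# survive up to two failed disks (hypothetical constant for this sketch).
RAID_DP_TOLERANCE = 2


def groups_losing_data(raid_groups, failed_disks, tolerance=RAID_DP_TOLERANCE):
    """Return the names of RAID groups with more failures than they can tolerate.

    raid_groups: dict mapping a RAID group name to the set of disks in it.
    failed_disks: iterable of failed disk names.
    """
    failed = set(failed_disks)
    return sorted(
        name for name, disks in raid_groups.items()
        if len(disks & failed) > tolerance
    )


# Hypothetical layout: disk names are "shelf.bay".
raid_groups = {}
for g in range(4):
    on_shelf1 = {f"1.{2 * g}", f"1.{2 * g + 1}"}       # two disks on shelf 1
    elsewhere = {f"2.{4 * g + i}" for i in range(4)}   # four disks on shelf 2
    raid_groups[f"rg{g}"] = on_shelf1 | elsewhere

# Power off all of shelf 1: 8 disks fail, but only 2 per RAID group.
shelf1 = {f"1.{bay}" for bay in range(8)}
print(groups_losing_data(raid_groups, shelf1))            # []

# One more failure in rg0 is that group's third disk.
print(groups_losing_data(raid_groups, shelf1 | {"2.0"}))  # ['rg0']
```

This mirrors the shelf story above: losing a whole shelf is survivable when no RAID group has more than two members on it, while a single extra failure in the wrong group is not.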
If you lost a third drive in a RAID-DP RAID group, that RAID group would fail, the aggregate would go offline, and you'd lose data. I'm not sure whether a failover would happen: assuming dual-path HA, if a drive is down on one controller it will be down on the other as well. I'm also not sure whether raid.timeout would shut the system down, since I don't know that an offline aggregate counts as a degraded state. Degraded implies that the aggregate is still running, which, technically, it isn't.
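Here is a rough Python sketch of how the per-group failure counts roll up into an aggregate state, following the reasoning in the post. The function names and state labels are hypothetical, not ONTAP output, and the sketch deliberately does not model raid.timeout, since whether that timer applies to an offline aggregate is exactly the open question.

```python
def raid_group_state(failed_count):
    """Classify one RAID-DP group by how many of its disks have failed."""
    if failed_count == 0:
        return "normal"
    if failed_count == 1:
        return "degraded"          # still serving data, one parity disk's worth left
    if failed_count == 2:
        return "double degraded"   # still serving data, no redundancy left
    return "failed"                # third failure: data can't be reconstructed


def aggregate_state(failed_per_group):
    """Roll per-group failure counts up into an aggregate state.

    Per the post, a single failed RAID group takes the whole aggregate
    offline; otherwise any degraded group leaves the aggregate degraded.
    raid.timeout behaviour is intentionally not modelled here.
    """
    states = [raid_group_state(n) for n in failed_per_group]
    if "failed" in states:
        return "offline"
    if any(s != "normal" for s in states):
        return "degraded"
    return "normal"


print(aggregate_state([2, 2, 2, 2]))  # degraded: 8 disks lost, still online
print(aggregate_state([3, 0, 0, 0]))  # offline: one group lost its third disk
```

The point of splitting it this way is that "degraded" and "offline" come out as different results, which is the distinction the raid.timeout question hinges on.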
I took a quick look but couldn't find anything describing what happens when you lose that third drive. This link (https://kb.netapp.com/support/index?page=content&id=3013638) hints that the controller will panic, but it covers a different situation (a media error during a rebuild). One of the links on that page points to article 2014172, which says to call support in the rare event that another disk has actually failed, so maybe they don't publish what happens in that case.