My IBM-rebranded N6240 (FAS3240) on 8.1.1P2 was running swimmingly when all I/O abruptly came to a near halt.
The first thing I noticed was these entries in the messages file:
wafl.cp.slovol:warning]: aggregate aggr1 is holding up the CP.
wafl.cp.slovol:warning]: aggregate aggr0 is holding up the CP
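For anyone wanting to tally how often each aggregate shows up in these warnings, here is a minimal Python sketch. It assumes the messages file keeps exactly the wording quoted above; the function name `count_slovol_warnings` and the regular expression are illustrative, not part of any NetApp tool.

```python
import re
from collections import Counter

# Pattern based only on the two message lines quoted in this post (an assumption:
# other ONTAP releases may word the event differently).
SLOVOL_RE = re.compile(
    r"wafl\.cp\.slovol:warning\]:\s+aggregate\s+(\S+)\s+is holding up the CP"
)


def count_slovol_warnings(lines):
    """Count wafl.cp.slovol warnings per aggregate name."""
    counts = Counter()
    for line in lines:
        match = SLOVOL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "wafl.cp.slovol:warning]: aggregate aggr1 is holding up the CP.",
        "wafl.cp.slovol:warning]: aggregate aggr0 is holding up the CP",
    ]
    for aggregate, hits in count_slovol_warnings(sample).most_common():
        print(aggregate, hits)
```

To run it against a real log, pass an open file object instead of the sample list, since the function accepts any iterable of lines.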
sysstat -x 1 showed the CPU nearly idle at around 5% per core, a 0% cache hit ratio, and disk usage also around 5%. NVRAM, however, was showing 100% full, with a CP type of #s over and over again.
The toaster pulsated in and out of this state: every 10 minutes or so it seemed to start to recover, responding to maybe an estimated 20% of its I/O requests, before wigging out again and going back into this state.
One of the aggregates on this unit is 32-bit and one is 64-bit.
Has anyone seen any behavior like this? A takeover/reboot mitigated the issue. A core dump was captured and given to IBM NAS support, who tell me they have escalated to NetApp.
I found one similar issue here, but mine remained in this state for about an hour before I rebooted.
Any comments or advice would be appreciated.
My googling tells me this may be
I think this bug is indeed a good candidate for your problems.
This is a defect in WAFL that has existed for a long time but, probably because of changes in ONTAP 8.1.1, gets triggered a lot more often, especially on busy aggregates (heavy read load, which can be due to misalignment, database verifies, ...). If you open a NetApp case, be sure to mention this bug number. However, it seems that NetApp engineering can only verify whether this BURT has been hit if a perfstat or core dump is taken during the problem window.
The #s CP type is just a symptom of the problem, since the CP hangs for a long time.
ONTAP 8.1.2P3 includes 7 bug fixes that make it a lot less likely that you will run into this problem (they are listed under the title "Inefficient pre-fetching of metadata blocks delays WAFL consistency point").
I recommend that you upgrade to this ONTAP version as soon as possible.