I'm having performance issues with SnapMirror XDP relationships between two clusters (primary 4-node 3250, secondary 2-node 3220, all running cDOT 8.2P5).
The replication LIFs on both sides are pure 10G (private replication VLAN), flat network (no routing), using jumbo frames (though I also tested without jumbo frames and the problem persists).
The vaulting works in general, but average throughput for a single node/relationship never goes beyond 1 Gbit/s - most of the time it is even much slower (300-500 Mbit/s or less).
I verified that neither the source node(s) nor the destination node(s) are CPU or disk bound during the transfers (at least not all of the source nodes).
I also verified that the SnapMirror traffic is definitely going through the 10G interfaces.
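For reference, something like this clustershell command shows which node and physical port each intercluster LIF currently sits on (a sketch; field names follow the standard network interface show output):

network interface show -role intercluster -fields curr-node,curr-port,address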
There is no compressed volume involved, only dedupe (on source).
The dedupe schedules are not within the same time window.
Also checked the physical network interface counters on switch and netapps, no errors / drops, clean.
However, the replication is SLOW, no matter what I try.
The customer's impression is that it even got slower over time: throughput from a source node was ~1 Gbit/s when the relationship(s) were initialized (not as high as expected), and dropped to ~500 Mbit/s after some months of operation / regular updates...
Meanwhile the daily update (of all relationships) sums up to ~1.4 TB per day, and it takes almost the whole night to finish :-(
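To put rough numbers on it (a back-of-the-envelope sketch in Python, assuming decimal units and the ~50 MByte/s average I actually see):

```python
# Back-of-the-envelope: how long does the nightly ~1.4 TB update take
# at the observed rate vs. a saturated 10 Gbit/s link? (decimal units)

def transfer_hours(data_tb: float, rate_mbyte_per_s: float) -> float:
    """Hours needed to move data_tb terabytes at rate_mbyte_per_s MByte/s."""
    data_mbyte = data_tb * 1_000_000           # 1 TB = 10^6 MByte
    return data_mbyte / rate_mbyte_per_s / 3600

print(f"at ~50 MByte/s observed: {transfer_hours(1.4, 50):.1f} h")    # 7.8 h -> the whole night
print(f"at 10 Gbit/s line rate:  {transfer_hours(1.4, 1250):.2f} h")  # 0.31 h
```

So the observed rate fully explains the all-night window; a saturated 10G link would finish the same data in well under an hour.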
So the question is: how to tune that?
Is anyone having similar issues with SnapMirror XDP throughput over 10 Gbit Ethernet?
Are there any configurable parameters (network compression? TCP window size? TCP delayed ACKs? anything I'm not thinking of?) on the source/destination side?
Thankful for all ideas / comments,
When we had vol move issues between cluster nodes (via 10Gb), we hit BUG 768028 after opening a performance case. This may or may not be related to what you see - it also impacted SnapMirror relationships.
The workaround was to stop the redirect scanners from running, or you can disable free_space_realloc completely:
storage aggregate modify -aggregate <aggr_name> -free-space-realloc no_redirect
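To verify the setting took effect afterwards (a sketch; the aggregate name is a placeholder):

storage aggregate show -aggregate <aggr_name> -fields free-space-realloc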
We have been having this problem for months with plain old SnapMirror. We have a 10G connection between two different clusters, and SnapMirror performance has been measured at around 100 Mb/sec - completely unacceptable.
We disabled reallocate per bug 768028 - no help. We disabled throttling, and that helped, but not enough - and since it is a debug flag, disabling throttling comes with a production performance cost.
We used this to disable the throttle:
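It was the diag-level repl_throttle_enable flag (the same one mentioned in the reply below); roughly, via the nodeshell of each node (node name is a placeholder, diag privilege required):

system node run -node <node_name> -command "priv set diag; setflag repl_throttle_enable 0"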
Thanks for the tips.
I tried both settings (disabling free-space realloc on the source and destination aggregates, as well as setflag repl_throttle_enable 0 on the source and destination nodes), but this did not make things better - maybe slightly, but not really.
I meanwhile experimented a bit more and migrated all intercluster interfaces of my source nodes to gigabit ports (instead of the VLAN-tagged 10-gig interfaces).
Although unexpected, this helped quite a lot: it doubled my overall throughput by moving to slower NICs on the source side.
Next step is to finally open a performance case.
It smells like flow control on the 10Gb links to me.
Ask the network guys if they see a lot of RX/TX pause packets on the switch ports. You may as well check for switch port errors while you are at it, or work with them to see if there are a lot of retransmits.
This is the situation where I would have hoped NetApp shipped network performance tools like iperf, so we could validate infrastructure and throughput when the system is put in.
Checked that already. The switch stats looked clean (I don't have them anymore).
Ifstat on netapp side looks good:
-- interface e1b (54 days, 8 hours, 12 minutes, 26 seconds) --
RECEIVE
Frames/second: 3690 | Bytes/second: 18185k | Errors/minute: 0
Discards/minute: 0 | Total frames: 30935m | Total bytes: 54604g
Total errors: 0 | Total discards: 0 | Multi/broadcast: 7
No buffers: 0 | Non-primary u/c: 0 | Tag drop: 0
Vlan tag drop: 0 | Vlan untag drop: 0 | Vlan forwards: 0
Vlan broadcasts: 0 | Vlan unicasts: 0 | CRC errors: 0
Runt frames: 0 | Fragment: 0 | Long frames: 0
Jabber: 0 | Bus overruns: 0 | Queue drop: 0
Xon: 0 | Xoff: 0 | Jumbo: 0
TRANSMIT
Frames/second: 3200 | Bytes/second: 12580k | Errors/minute: 0
Discards/minute: 0 | Total frames: 54267m | Total bytes: 343t
Total errors: 0 | Total discards: 0 | Multi/broadcast: 78699
Queue overflows: 0 | No buffers: 0 | Xon: 6
Xoff: 100 | Jumbo: 3285m | Pktlen: 0
Timeout: 0 | Timeout1: 0
LINK_INFO
Current state: up | Up to downs: 2 | Speed: 10000m
Duplex: full | Flowcontrol: full
So there are some Xon/Xoff pause frames, but very few.
The very same link (same node) carries NFS I/O at 500 MByte/s (via a different LIF), so I don't think the interface has any problems. However, SnapMirror average throughput remains below 50 MByte/s, no matter what I try.