5 Replies Latest reply: Apr 19, 2013 5:30 PM by DOUGLASSIGGINS

High CPU utilization on Snapmirror source Filer

MAHESHA_HP

I am experiencing very high CPU utilization on the SnapMirror source filer. The sysstat output is below. Please explain what is happening here and how to troubleshoot the issue, and please suggest how to bring down the CPU usage.

Model Number: FAS6080

Data ONTAP: 7.3.1.1

 

CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s iSCSI  kB/s
                                  in   out   read  write  read write   age   hit time  ty util                 in   out    in   out
98% 16344     0     0   16344 11776 31183 138946  62100     0     0    46s  97% 100%  :s  94%      0     0     0     0     0     0
94% 17396     0     0   17396 20400 37522 127637  50743     0     0    49s  76% 100%  :s 100%      0     0     0     0     0     0
85% 21370     0     0   21370 37059 36892 130290  55293     0     0    49s  76% 100%  :f  88%      0     0     0     0     0     0
76% 20092     0     0   20092 22936 33285 153200 100764     0     0    49s  77% 100%  :f  98%      0     0     0     0     0     0
75% 18504     0     0   18504 12448 46405 195840 127316     0     0    52s  77% 100%  :f  95%      0     0     0     0     0     0
80% 19390     0     0   19390 12710 75414 246752 127848     0     0    52s  77% 100%  :f  94%      0     0     0     0     0     0
80% 21079     0     0   21079 12770 47392 210780 144776     0     0    54s  78% 100%  :f  93%      0     0     0     0     0     0
78% 23482     0     0   23482 14187 39402 183452 115132     0     0    54s  75% 100%  :f  99%      0     0     0     0     0     0
73% 15260     0     0   15260 10296 68188 160250  76143     0     0    54s  90%  99%  Zn  90%      0     0     0     0     0     0
99% 12250     0     0   12250 13509 27401 122256  43869     0     0    57s  96% 100%  :n  94%      0     0     0     0     0     0
94% 21851     0     0   21851 13085 56588 170259  50299     0     0    57s  89% 100%  :s 100%      0     0     0     0     0     0
99% 14151     0     0   14151 10729 60332 142345  55455     0     0     1   82% 100%  :s  72%      0     0     0     0     0     0
99% 14017     0     0   14017  9672 28077 110818  65818     0     0    54s  74% 100%  :s  78%      0     0     0     0     0     0
98% 17758     0     0   17758 15354 37078 183540 129594     0     0    54s  80% 100%  :s  93%      0     0     0     0     0     0

 

After reading some discussions, I also pulled the deswizzling information:

 

filer*> wafl scan status
Volume 2:
Scan id                   Type of scan     progress
       4    active bitmap rearrangement     fbn 15946 of 51757 w/ max_chain_len 7
       8             volume deswizzling     snap 212, inode 61367019 of 84210525. level 1 of normal files. Totals: Normal files: L1:1809983/78492062 L2:1020/40693854 L3:0/0 L4:0/0     Inode file: L0:4510907/4510907 L1:0/0 L2:0/0 L3:0/0 L4:0/0
  119781    container block reclamation     block 33755 of 51758
Volume 3:
Scan id                   Type of scan     progress
       9    active bitmap rearrangement     fbn 13654 of 61573 w/ max_chain_len 7
      13             volume deswizzling     snap 56, inode 40845777 of 94736838. level 1 of normal files. Totals: Normal files: L1:2435690/482816606 L2:460/54431874 L3:0/0 L4:0/0      Inode file: L0:0/0 L1:0/0 L2:0/0 L3:0/0 L4:0/0
  119783    container block reclamation     block 26911 of 61574
  119784     blocks used summary update     block 10054 of 61574
  119785     snap create summary update     block 28230 of 61574
Volume 4:
Scan id                   Type of scan     progress
      14    active bitmap rearrangement     fbn 7079 of 60681 w/ max_chain_len 7
      18             volume deswizzling     snap 80, inode 57477525 of 86315775. level 1 of normal files. Totals: Normal files: L1:1689142/279186950 L2:900/14791252 L3:0/0 L4:0/0      Inode file: L0:0/0 L1:0/0 L2:0/0 L3:0/0 L4:0/0
  119782    container block reclamation     block 41437 of 60682
Volume vol0:
Scan id                   Type of scan     progress
      19    active bitmap rearrangement     fbn 242 of 365 w/ max_chain_len 7
Volume 1:
Scan id                   Type of scan     progress
      39    active bitmap rearrangement     fbn 38979 of 41049 w/ max_chain_len 7
      42             volume deswizzling     snap 6, inode 25361700 of 65263149. level 1 of normal files. Totals: Normal files: L1:2569358/1072895942 L2:0/92446730 L3:0/0 L4:0/0        Inode file: L0:0/0 L1:0/0 L2:0/0 L3:0/0 L4:0/0
  119780    container block reclamation     block 33789 of 41050
Volume vol1new:
Scan id                   Type of scan     progress
      31    active bitmap rearrangement     fbn 27928 of 31233 w/ max_chain_len 7
      33             volume deswizzling     snap 78, inode 9386622 of 65263149. level 2 of normal files. Totals: Normal files: L1:1129306/56166842 L2:615/20360874 L3:0/0 L4:0/0        Inode file: L0:14149634/14149634 L1:0/0 L2:0/0 L3:0/0 L4:0/0
Volume 5:
Scan id                   Type of scan     progress
      26    active bitmap rearrangement     fbn 29492 of 41049 w/ max_chain_len 7
      29             volume deswizzling     snap 229, inode 39864279 of 65263149. level 1 of normal files. Totals: Normal files: L1:1808099/705619767 L2:446/55239607 L3:0/300 L4:0/0   Inode file: L0:0/0 L1:0/0 L2:0/0 L3:0/0 L4:0/0
  119779    container block reclamation     block 37330 of 41050

  • Re: High CPU utilization on Snapmirror source Filer
    RAVI.PALADUGU

    Mahesh,

     

    Data ONTAP is designed to keep serving data even when the CPUs are running at 99%, but that doesn't mean everything is fine. How often is the CPU this busy? Keep in mind that the CPU stat you're seeing is not the combined or averaged CPU time across all four CPUs; it is the total time during which any of the four processors was in use.

     

    You may have to open a performance case on this box; support will ask you to run perfstat and nsanity to analyze what is actually causing this. Is there any reallocation job running at that time?

     

    Can you please post the output of the following:

     

    1. Go to diag mode and run sysstat -M (note the upper-case M)

     

    Please refer to this KB article, which can give you more insight into this: https://kb.netapp.com/support/index?page=content&id=3010150
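
    For reference, a minimal sketch of collecting that on the 7-Mode console (the 1-second interval is just an example, and I'm assuming sysstat -M accepts an interval argument the same way the other sysstat variants do):

    filer> priv set diag
    filer*> sysstat -M 1
    (let it run for 20-30 intervals, then press Ctrl-C)
    filer*> priv set admin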

     

    Best

    RK

  • Re: High CPU utilization on Snapmirror source Filer
    bbjholcomb

    Are you using SnapMirror compression? According to the docs it can add an additional 35% CPU utilization.
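
    If you want to confirm whether it is turned on, the relationship arguments live in /etc/snapmirror.conf on the destination filer. A hedged sketch with made-up filer and volume names (the exact compression option syntax depends on your ONTAP release, so treat the entry as illustrative):

    dstfiler> rdfile /etc/snapmirror.conf
    srcfiler:srcvol  dstfiler:dstvol  compression=enable  0 * * *

    If the arguments field shows something like compression=enable, the transfers are being compressed on the source before they go over the wire.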

  • Re: High CPU utilization on Snapmirror source Filer
    DOUGLASSIGGINS

    The disks are very busy. Pull a statit and see whether any disks look unhappy. If they all look fine, you may need to add more disks; if one does stand out, you may have a hot disk, or just one disk that is responding slowly.

     

    So do these three things:

    1. sysstat -M (you'll probably see one CPU much busier than the rest)

    2. Run statit -b, then statit -e (with five minutes or so in between); a sketch of the sequence follows this list.

    3. I'd assume SnapMirror compression only hurts CPU while a transfer is running. When did you run the sysstat?
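
    For reference, a minimal sketch of that statit sequence on the 7-Mode console (statit lives in advanced privilege; the five-minute window is just an example):

    filer> priv set advanced
    filer*> statit -b
    (wait roughly five minutes while the workload and the SnapMirror transfer run)
    filer*> statit -e
    filer*> priv set admin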

  • Re: High CPU utilization on Snapmirror source Filer
    DOUGLASSIGGINS

    It looks like you may want to add some more disks to the following RAID groups and check into a possible reallocation (a reallocate sketch follows the statit output below):

     

    /aggr2/plex0/rg1:
    0e.45              4  22.97    0.00   ....     .   8.70  17.21   359  14.27   2.99   527   0.00   ....     .   0.00   ....     .
    0d.68              4  22.97    0.00   ....     .   8.74  17.14   353  14.23   2.90   615   0.00   ....     .   0.00   ....     .
    0b.86             41 134.69  115.73   2.37  3615   4.31   3.70  3066  14.65   6.87   617   0.00   ....     .   0.00   ....     .
    0d.83             43 149.59  125.57   2.13  3720   8.23   5.87  1689  15.79   4.40   943   0.00   ....     .   0.00   ....     .
    0d.84             46 167.71  141.53   2.13  3561  10.60   5.83  1630  15.58   3.93  1175   0.00   ....     .   0.00   ....     .
    0b.93             49 176.49  151.11   2.01  3818  10.30   5.73  1924  15.07   4.15  1082   0.00   ....     .   0.00   ....     .
    0b.69             59 219.73  196.34   2.08  3879   8.91   8.40  1530  14.48   3.65  1411   0.00   ....     .   0.00   ....     .
    0f.53             67 262.96  241.81   2.13  4278   8.61  11.47  1122  12.54   3.32  1164   0.00   ....     .   0.00   ....     .

    /aggr6/plex0/rg1:
    0f.82              6  43.91    0.00   ....     .  17.18  10.21   692  26.73   4.22   350   0.00   ....     .   0.00   ....     .
    0e.73              6  43.95    0.00   ....     .  17.23  10.19   696  26.73   4.19   381   0.00   ....     .   0.00   ....     .
    0d.74             39 113.92   85.80   1.72  5097   9.29   3.91  2797  18.83   3.25   806   0.00   ....     .   0.00   ....     .
    0h.82             35 106.70   82.38   1.76  4837   6.88   4.05  2050  17.44   3.36   664   0.00   ....     .   0.00   ....     .
    0b.83             56 153.94  121.94   1.84  6647  11.95   4.34  3268  20.06   2.96  2345   0.00   ....     .   0.00   ....     .
    0b.87             57 148.62  119.87   1.79  7350   9.92   4.03  3206  18.83   3.10  2191   0.00   ....     .   0.00   ....     .
    0d.82             58 160.02  130.64   1.79  7409  11.15   4.66  2488  18.24   2.79  2355   0.00   ....     .   0.00   ....     .
    0d.72             61 160.49  130.59   1.59  9202  11.78   5.07  2537  18.11   2.74  2605   0.00   ....     .   0.00   ....     .
    0d.77             63 166.10  139.55   1.74  8295   9.46   6.43  2171  17.10   3.05  3323   0.00   ....     .   0.00   ....     .
    0b.66             67 201.57  169.14   1.77  7840  13.89   5.47  2683  18.54   2.83  1630   0.00   ....     .   0.00   ....     .

    /aggr6/plex0/rg2:
    0c.39             12  41.25    0.00   ....     .  40.87   4.60   928   0.38   6.89   210   0.00   ....     .   0.00   ....     .
    0c.40             12  41.59    0.00   ....     .  40.96   4.59   922   0.63   4.93   270   0.00   ....     .   0.00   ....     .
    0c.41             50 158.59   99.48   1.87  6519  21.53   3.27  2171  37.58   3.14  1047   0.00   ....     .   0.00   ....     .
    0c.42             47 145.92   85.21   1.64  6839  21.66   2.92  2462  39.06   3.19   872   0.00   ....     .   0.00   ....     .
    0c.43             47 146.98   87.61   1.67  7131  20.18   3.23  2041  39.18   3.16  1068   0.00   ....     .   0.00   ....     .
    0c.44             51 156.05   95.85   1.72  8216  21.96   3.23  2203  38.25   3.06  1104   0.00   ....     .   0.00   ....
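
    On the reallocation point above, a minimal sketch of measuring layout on 7-Mode (treat this as illustrative; /vol/somevol is a hypothetical volume sitting on aggr2 or aggr6, and options may vary slightly by release):

    filer> reallocate on
    filer> reallocate measure /vol/somevol
    (wait for the measurement scan to finish)
    filer> reallocate status -v /vol/somevol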

     

    See this article for more information on high CPU on many-core machines:

    https://communities.netapp.com/thread/22448
