8 Replies. Latest reply: Dec 12, 2013 1:03 AM by CHYSLOP192

Intermittent Host Login Failed warnings

GRAEMEOGDEN

I am receiving occasional Host Login Failed warnings via dfm for the cluster admin account at different times of the day. These clear themselves automatically.

 

Is there any way to find out where these logins are coming from?

If it were DFM itself trying to access the cluster (a bad password in the config, for example), I wouldn't expect the alarm to clear.

 

Thanks

Graeme

  • Re: Intermittent Host Login Failed warnings
    kryan

    Greetings Graeme,

     

    Can you tell us the version and mode of UM you are using, and exactly where you are seeing the warnings?

     

    Thanks,

    Kevin

    • Re: Intermittent Host Login Failed warnings
      GRAEMEOGDEN

      Hi Kevin,

       

      OCUM version 5.1.0.15008

      Clustered DataONTAP version 8.1.2P3

       

      An example of the alarm email is below:

       

      A Warning event at 20 Jun 18:53 GMT Daylight Time on Cluster SEN_SAN_CLU01:

      Host Login Failed.

      Host SEN_SAN_CLU01 user admin login failed

         

      Click below to see the details of this event.

      http://TMGNAOCMSRV01.gpgroup.com:8080/start.html#st=1&data=(eventID=9167)

         

      *** Event details follow.***

       

      General Information

      -------------------

      DataFabric Manager server Serial Number: 1-50-017635 Alarm Identifier: 2

       

       

      Event Fields

      -------------

      Event Identifier: 9167

      Event Name: Host Login Failed

      Event Description: Host Login

      Event Severity: Warning

      Event Timestamp: 20 Jun 18:53

       

       

      Source of Event

      ---------------

      Source Identifier: 132

      Source Name: SEN_SAN_CLU01

      Source Type: Cluster

      Source Status: Warning

       

       

      Event Arguments

      ---------------

      hostName: SEN_SAN_CLU01

      hostLoginName: admin

       

       

      My concern is that this is the admin account (cluster administrator), so I'd like to identify whether this is a bad password in a configuration somewhere or someone attempting to log in to the cluster.

       

      Thanks

      Graeme

      • Re: Intermittent Host Login Failed warnings
        kryan

        Hi Graeme,

         

        I would not expect that to be a configuration problem within UM, since the hostPassword is only entered once for the entire cluster; if it works once, it should continue to work.

         

        This might indicate a node configuration problem or a network access problem affecting particular node(s).

         

        Does the warning only occur for one node (SEN_SAN_CLU01) or all of them?

         

        Kevin

        • Re: Intermittent Host Login Failed warnings
          GRAEMEOGDEN

          I've had these alerts from two clusters in two separate datacentres at various times over the last week. That does sound like a network connectivity issue, but I would have thought there would be a "host down" message in that case, not a failed login?

           

          If there's no method of obtaining more information from the OCUM logs I'll see if I can get more info from the clusters themselves.

           

          Thanks.

          • Re: Intermittent Host Login Failed warnings
            kryan

            Using the default settings of UM 5.1, in order to produce a host down event/alert the host would need to be down for a significant amount of time.  This behavior was changed in UM 5.2 under bug 614983 (no public report at this time).

             

            OnCommand Unified Manager Core uses five different methods to identify if a host is down:

            1. echo
            2. http
            3. snmp
            4. ndmp
            5. echo_snmp  <== default

             

            The default behavior for a host down monitor run is a ping using ICMP echo and then snmpwalk.  UM will retry each method a pre-configured number of times with varying timeouts, as seen below.

             

            While the ICMP retries and timeouts have remained the same over the 5.x code line, the SNMP timeouts were increased in UM 5.1 for 7DOT and even more for 5.1 cDOT installations.

             

            Due to changes under bug 614983, if pingMonTimeout is set to less than or equal to 5 seconds, then the SNMP timeout for host down (pingmon) monitoring will be 5 seconds. If pingMonTimeout is set to a value greater than 5 seconds, then pingMonTimeout is used as the SNMP timeout.  The global monSNMPTimeout is used for all other SNMP connections.  This applies to both 7DOT and cDOT versions of UM 5.2.
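
The rule above can be written out as a few lines of Python. This is only an illustration of the behaviour as described in this post, not UM code; the function name `pingmon_snmp_timeout` and the constant name are made up for the example.

```python
# Sketch of the UM 5.2 rule described above (hypothetical helper, not UM code).
PINGMON_SNMP_FLOOR_SECONDS = 5  # assumption: taken from the "5 seconds" in the post


def pingmon_snmp_timeout(ping_mon_timeout: int) -> int:
    """Return the SNMP timeout (seconds) used for host-down (pingmon) checks."""
    if ping_mon_timeout <= PINGMON_SNMP_FLOOR_SECONDS:
        return PINGMON_SNMP_FLOOR_SECONDS
    return ping_mon_timeout


if __name__ == "__main__":
    for value in (3, 5, 6, 30):
        print(f"pingMonTimeout={value} -> SNMP timeout {pingmon_snmp_timeout(value)}s")
```

With the default pingMonTimeout of 3, the host-down SNMP timeout under this rule would be 5 seconds rather than the global monSNMPTimeout.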

             

            ===============================================

            UM 5.0.x default values:

            monSNMPRetries                        4
            monSNMPTimeout                        5

            hostPingMethod                        echo_snmp
            pingMonInterval                       1 minute
            pingMonRetryDelay                     3
            pingMonTimeout                        3

            ===============================================

            UM 5.1 7DOT default values:

            monSNMPRetries                        4
            monSNMPTimeout                        60

            hostPingMethod                        echo_snmp
            pingMonInterval                       1 minute
            pingMonRetryDelay                     3
            pingMonTimeout                        3

            ===============================================

            UM 5.1/5.2 cDOT default values:

            monSNMPRetries                        4
            monSNMPTimeout                        300

            hostPingMethod                        echo_snmp
            pingMonInterval                       1 minute
            pingMonRetryDelay                     3
            pingMonTimeout                        3

            ===============================================

             

            Therefore, if a clustered ONTAP controller is down for less than 5 minutes, UM 5.1 will not report it as down, because the outage would not have exceeded the first timeout value for the host down check.  If the ping method is changed to echo or http, the node down event is logged.

             

            Changing monSNMPTimeout to the 5.0.x default value of 5 seconds allows UM to determine the host down status with the echo_snmp method. However, it is not recommended that this value be set lower than the default on cDOT UM 5.1 servers, as some SNMP transactions can take a few minutes to complete and should not be sent multiple times within 5 minutes.
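
To make the arithmetic concrete, here is a deliberately simplified model in Python. It assumes only what is stated above: with echo_snmp, an outage has to outlast the first SNMP timeout before it can be flagged. The timeout values are the defaults quoted in this post; the function and dictionary names are hypothetical, and real UM behaviour also involves retries and the ICMP step, which this ignores.

```python
# Default monSNMPTimeout values in seconds, as quoted in the post above.
DEFAULT_MON_SNMP_TIMEOUT = {
    "UM 5.0.x": 5,
    "UM 5.1 7DOT": 60,
    "UM 5.1 cDOT": 300,
}


def outage_outlasts_first_timeout(outage_seconds: int, mon_snmp_timeout: int) -> bool:
    """Simplified check: can an outage of this length outlast the first SNMP timeout?"""
    return outage_seconds >= mon_snmp_timeout


if __name__ == "__main__":
    outage = 120  # a two-minute outage, chosen purely for illustration
    for release, timeout in DEFAULT_MON_SNMP_TIMEOUT.items():
        flagged = outage_outlasts_first_timeout(outage, timeout)
        print(f"{release}: timeout {timeout}s, {outage}s outage could be flagged: {flagged}")
```

Under this model a two-minute outage outlasts the 5.0.x and 5.1 7DOT timeouts but not the 300-second cDOT one, which matches the "less than 5 minutes" statement above.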

          • Re: Intermittent Host Login Failed warnings
            GRAEMEOGDEN

            Checking the cluster for auth failures suggests this isn't a user getting the password wrong!

             

            SEN_SAN_CLU01::*> event status show  -messagename login.auth.loginDenied

            Node              Message                      Occurs Drops Last Time

            ----------------- ---------------------------- ------ ----- -------------------

            SEN_SAN_CLU01-02  login.auth.loginDenied       1      0     5/8/2013 13:30:33
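
If you want to compare denied-login counts across nodes or clusters against the number of warnings UM raised, a small parser for this output is enough. The sketch below is Python and assumes the column layout shown above (six whitespace-separated fields per data row); the function name is made up, and the timestamp is kept as text because the day/month order is not clear from the output.

```python
SAMPLE = """\
Node              Message                      Occurs Drops Last Time
----------------- ---------------------------- ------ ----- -------------------
SEN_SAN_CLU01-02  login.auth.loginDenied       1      0     5/8/2013 13:30:33
"""


def parse_event_status(text: str) -> list:
    """Parse 'event status show' output into a list of dicts (assumed layout)."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have six fields with numeric Occurs/Drops counters;
        # the header and the dashed separator fail this check and are skipped.
        if len(parts) != 6 or not (parts[2].isdigit() and parts[3].isdigit()):
            continue
        node, message, occurs, drops, date, time = parts
        rows.append({
            "node": node,
            "message": message,
            "occurs": int(occurs),
            "drops": int(drops),
            "last_time": f"{date} {time}",  # left as text on purpose
        })
    return rows


if __name__ == "__main__":
    for row in parse_event_status(SAMPLE):
        print(row["node"], row["message"], row["occurs"], row["last_time"])
```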

             

             

            I'll speak to the networking team!

             

            Thanks for your assistance Kevin.

            Cheers

            Graeme

  • Re: Intermittent Host Login Failed warnings
    GRAEMEOGDEN

    Just to close this thread off, it turns out this was due to OnCommand using HTTPS to communicate with the clusters.

    Changing the host protocol to HTTP / Port 80 via the dfm command line stopped the warnings being generated.

     

    dfm host set <CLUSTERNAME> hostAdminPort=80 hostAdminTransport=http
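
For anyone with several clusters to change, the same command can be generated per cluster. This Python sketch only builds and prints the command line shown above; it does not run anything. The helper name and the cluster list are examples, and the options are exactly the ones from the command in this post.

```python
import shlex


def dfm_http_command(cluster_name: str) -> list:
    """Build the 'dfm host set' argument list from the post for one cluster."""
    return [
        "dfm", "host", "set", cluster_name,
        "hostAdminPort=80",
        "hostAdminTransport=http",
    ]


if __name__ == "__main__":
    clusters = ["SEN_SAN_CLU01"]  # example list; replace with your own cluster names
    for name in clusters:
        # Print for review instead of executing.
        print(shlex.join(dfm_http_command(name)))
```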

  • Re: Intermittent Host Login Failed warnings
    CHYSLOP192

    Hi all, 

     

    Thought I would add to this thread as it's one of the few that helped me.  I had a call open with NetApp support for 3 months on this error. We had set the transport to HTTP, but that did not resolve the error for us.  Only after adding a third cluster to DFM did we notice a pattern: the new cluster errored once and then produced no further Host Login errors, while both original clusters were erroring 20 - 30 times a day.

    Yesterday I removed one of the original clusters from DFM, carried out a purge on DFM and added it back in, and the Host Login errors have stopped for that cluster.

    I have informed NetApp, and they want me to hold off removing and re-adding the last of the erroring clusters so they can get some information out of the system.

     

    Hopefully this will help others too if they still have issues.
