35 Replies Latest reply: Apr 3, 2014 3:33 AM by ag

WFA issue with cache updates

dcornely1
Currently Being Moderated

Hello,

 

I'm running the latest version of WFA v2.1.0.70.32 in my environment and I've come across an issue I thought wouldn't exist because I'm using certified commands.  The issue is that WFA does not appear to be aware of changes it has made before the OnCommand cache updates occur.  Here is the scenario:


Step 1)

I'm creating a new CDOT export policy, export rule, and volume.  This flow works without issue and is called ss_cdot_CreateNFSVolumeWithExport

Step 2)

Before WFA and/or OnCommand has had a chance to learn about the change from the step 1 workflow via scheduled cache updates, I attempt to run a workflow that creates a new rule in the policy created in step 1.  This fails and will continue to fail until WFA's cache is updated from OnCommand and it learns about this new policy.

This flow is called ss_cdot_CreateExportRule

 

All the commands in both flows are certified so I had thought that would avoid this issue.  I had originally been using a modified No-Op command in the first create flow for the reasons behind this post but even after removing that single non-certified command the problem remains.  The only thing I can think of is that I created these flows in WFA v2.0 and recently upgraded to v2.1.

 

I'm either missing something or have encountered a bug in WFA regarding export policies and cache awareness of them, although I'm leaning towards an error I made somewhere but haven't found it yet.  I'm attaching both flows in the hopes that they will reveal where I've tripped up.  Hopefully it's something simple, thanks in advance.

 

-Dave

  • Re: WFA issue with cache updates
    ag

    Dave,

     

    How do you give the input for the export rule specification? It would take me a lot of time to figure that out from the workflow internals.

    It will be easier if you let me know. Please give an example.

    • Re: WFA issue with cache updates
      dcornely1

      The input is based on one of the example workflows that came with WFA, specifically "Create a Clustered Data ONTAP NFS Volume".  It's a loop that will add each rule --- the number of loops will equal the number of rules requested leveraging a couple different functions.

      Both workflows I attached do work successfully independent of each other.  The issue is that WFA for some reason is not aware of the export policy it created for the new volume before the next OnCommand cache update occurs, something I thought I was able to work around by using certified commands.

       

       

      Here is the description from the command I copied:

      Export Specification Rule is an input that combines all the Export Rules for a Policy.  Each Export Specification Rule is of the form:

      <client-specification IP>;read-only rule;read-write rule;superuser rule

       

      Individual rules are separated by an ampersand (&).

       

      For example, the export rule spec '10.10.10.10;ntlm;krb5;sys&20.20.20.20;krb5;sys;ntlm' specifies two export rules:

      Export Rule 1) client-specification = 10.10.10.10

      read-only rule = ntlm

      read-write rule = krb5

      superuser rule = sys

      Export Rule 2) client-specification = 20.20.20.20

      read-only rule = krb5

      read-write rule = sys

      superuser rule = ntlm
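
      The spec format described above can be illustrated with a minimal parsing sketch. It only assumes what the description states: rules separated by '&', and four ';'-separated fields per rule. The names ExportRule and parse_export_rule_spec are hypothetical and are not part of WFA.

```python
from typing import List, NamedTuple


class ExportRule(NamedTuple):
    """One export rule; field order follows the spec description."""
    client_spec: str
    ro_rule: str
    rw_rule: str
    superuser_rule: str


def parse_export_rule_spec(spec: str) -> List[ExportRule]:
    """Split an '&'-separated spec into rules of four ';'-separated fields."""
    rules = []
    for chunk in spec.split("&"):
        fields = [field.strip() for field in chunk.split(";")]
        if len(fields) != 4 or not all(fields):
            raise ValueError(f"expected 4 non-empty fields, got: {chunk!r}")
        rules.append(ExportRule(*fields))
    return rules


# The example spec quoted in the command description.
rules = parse_export_rule_spec("10.10.10.10;ntlm;krb5;sys&20.20.20.20;krb5;sys;ntlm")
for number, rule in enumerate(rules, start=1):
    print(number, rule)
```

      Run on the example spec, this yields two rules whose fields match Export Rule 1 and Export Rule 2 as listed above.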

      • Re: WFA issue with cache updates
        mgoddard

        Hi Dave,

         

        If you check the Reservations tab, you can see which commands get reservations created. It appears that the certified Create Export Policy and Create Export Rule commands do not add reservations. Looks like a bug to me. I've tried creating a few other types of objects with built-in commands for cDOT and they don't have reservations either.

         

        I got around this by defining the object if the search fails and I know it's there, using other attributes from objects that did exist. However, that means that if it's not actually there, the workflow fails at execution time rather than at evaluation time, which isn't ideal.

         

        Here's my initial list of cDOT commands missing reservations so you don't hit them. It would be really nice if we could create custom reservations for our commands easily in the field!

         

        Missing Reservations

        - Create Export Policy

        - Create Export Rule

        - Create CIFS Share

        - Remove CIFS share ACL

        - Add CIFS share ACL

         

        Sorry I don't have a better answer, but I can definitely reproduce your problem (and have been working around it myself).

        For now, I would suggest defining the object instead of searching for it, since you are passing in the rule name anyway, and the vserver search will work because it's not a new item.

         

        cheers,

        - Michael.

  • Re: WFA issue with cache updates
    ag

    Hi Dave,

     

    I am able to reproduce the issue.

    To get around this issue, you can reduce the DFM option "sharemoninterval" to a low value like 30 seconds or 1 minute. That will acquire the data on export policies. Also reduce the acquire interval for data sources on WFA to a low value. But then again, it is a tedious job and you may end up waiting a few minutes between the two workflows.

    As Michael said, export policies do not have reservations. It is not a problem with the command.

     

    I am curious as to why you are not using the certified workflow "Create a clustered Data ONTAP NFS volume" which does the same job as the two of your workflows combined?

     

    Thanks,

    Anil

    • Re: WFA issue with cache updates
      dcornely1

      Thanks for the info on the timeouts - I've got the updates down low already but don't want to make the OnCommand one so low that it adds unnecessary load to the cluster.

      I've split the 2 flows out because I have to solve for a use case where a customer first provisions a new volume/NFS share and then immediately turns around to add more export rules, perhaps because they forgot about them initially.  I don't make up these use cases nor do I establish what is a reasonable SLA for these use cases.  I just have to do my best to achieve the SLA for the use cases.

  • Re: WFA issue with cache updates
    mgoddard

    Hi Dave,

     

    Good news! I've created a custom Create Export Policy command that includes the reservations missing from the certified command. You could use it to avoid the problem in a more robust manner; it is attached below.

     

    I tested by replacing the certified command in the first workflow (CreateNFSVolWithExport), and the second workflow now finds the policy before a polling cycle.

     

    Hope that's useful!

     

    Kind Regards,

    - Michael.

    • Re: WFA issue with cache updates
      dcornely1

      Michael, thank you very much!  I'll see if I have time to get this in play before we deploy CDOT this weekend.

    • Re: WFA issue with cache updates
      Francois Egger

      Michael,

      I'm looking for a way to use reservations in my custom commands, for caching purposes.

      I saw this in the XML file:

       

      INSERT INTO cm_storage.export_policy
      SELECT NULL as id,
             PolicyName as name,
             vs.id as vserver_id
      FROM cm_storage.vserver vs
      JOIN cm_storage.cluster cl
        ON (cl.primary_address=Cluster OR cl.name=Cluster)
       AND vs.cluster_id = cl.id
       AND vs.name = VserverName;

       

      Is this the key?

       

      Regards,

      François

      • Re: WFA issue with cache updates
        abhit

        Reservation is not supported in custom commands.

        This feature may be available in a future release.

         

        Michael, you have used a certified command in the workflow which has a reservation script.


        Regards

        Abhi

        • Re: WFA issue with cache updates
          Francois Egger

          Hi abhit,

          I exported my custom command to a .dar file and changed the XML to integrate the <reservationScript> section from the certified command "Clone Volume".

          My problem is fixed.

          Do you see anything dangerous in using it this way?

          François

          • Re: WFA issue with cache updates
            abhit

            Hi Francois:

             

            Wow. That is a great workaround.

            You have to test extensively and qualify it for usage since it is not a supported or recommended process.

             

            Regards

            Abhi

          • Re: WFA issue with cache updates
            yannb

            Great tip François.

             

            When I did the same thing, it seemed to work, until I ran an acquisition in WFA.

             

            I don't know why yet, but what I got, looking at the reservations in WFA web UI, was a "Cache Updated" YES, for an export policy that was not refreshed in OCUM yet (It was "NO" before acquisition). The volume reservation had a good status of refreshed to "NO" (i.e. waiting for OCUM to report it).

             

            I might have done something wrong, I need to do some research

            • Re: WFA issue with cache updates
              Francois Egger

              Hi Yann,

              Where did you retrieve the <reservationScript> XML portion from? As certified commands are not exportable, I had to look directly in the MySQL DB.

              For that, I installed a separate MySQL server where I have root access and restored the full WFA DB there. wfa.command_definition was then accessible.

               

              All the information was in:

              SELECT reservation_script FROM wfa.command_definition
              WHERE command LIKE '%clone%';

               

              François

              • Re: WFA issue with cache updates
                yannb

                I used yours, actually: I copied and pasted the reservation block, but kept my own copy of the Export Policy Create command.

                 

                It looked consistent, even if I did not understand how variable substitution was done.

              • Re: WFA issue with cache updates
                yannb

                Well, actually it is even weirder... Qtree creation is not populated in the qtree table either, but that Command is supposed to use reservation... really odd

                • Re: WFA issue with cache updates
                  ag

                  Yann,

                   

                  To answer this specific question: reservations are not directly committed to the specific tables. In your case, newly created qtrees are not directly committed to the qtree table. Rather, they are stored in the wfa.reservation table and will be committed once the acquisition from OCUM confirms the object's presence. However, from the WFA UI, if you were to use a filter to find the newly created qtree, it will use this reservation data and make it appear as though it were taken from the qtree table itself.
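
                  The lookup behaviour described above can be sketched as a toy model. This is not WFA's implementation: the Reservation class, the find_object function and the dictionary-based cache are hypothetical stand-ins for the wfa.reservation table and the cache tables.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Reservation:
    """Hypothetical stand-in for one row of the wfa.reservation table."""
    object_type: str      # e.g. "qtree"
    name: str
    cache_updated: bool   # False = not yet confirmed by an acquisition


def find_object(cache: Dict[str, List[str]], reservations: List[Reservation],
                object_type: str, name: str,
                use_reservations: bool = True) -> Optional[str]:
    """Return where the object was found: 'cache', 'reservation', or None."""
    if name in cache.get(object_type, []):
        return "cache"
    if use_reservations:
        for res in reservations:
            pending = not res.cache_updated
            if pending and res.object_type == object_type and res.name == name:
                return "reservation"
    return None


# The qtree table does not contain the new qtree yet, but a reservation does.
cache = {"qtree": ["old_qtree"]}
pending = [Reservation("qtree", "new_qtree", cache_updated=False)]
print(find_object(cache, pending, "qtree", "new_qtree"))
print(find_object(cache, pending, "qtree", "new_qtree", use_reservations=False))
```

                  In this model the new qtree is found through its pending reservation, and it disappears from the result as soon as reservations are ignored, even though the qtree table itself never changed.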

                  • Re: WFA issue with cache updates
                    yannb

                    Yep, I finally figured it out, thanks for the answer!

                     

                    So, I have that workflow that creates a qtree and an export policy, empty at first

                    Then another workflow that adds rules to the export policy.

                     

                    Here is what the reservations show when I run the first workflow:

                    Banners_and_Alerts_et_WFA_by_NetApp.png

                    Here, cache is not updated, good, I expect that. My understanding is that "NO" means "I didn't get that one from OCUM yet"

                     

                    Then I run "Acquire now" on my OCUM data source in WFA, and here is how it changes reservations :

                    WFA_by_NetApp.png

                    Export policy is then marked as cache updated... but not the Qtree. It does not make sense because OCUM did not discover that export policy yet.

                     

                    So now, my second workflow will fail, saying that "No results were found. The following filters have returned empty results:".

                     

                    It looks like there is an inconsistency between the process that refreshes the cache and the one that populates the database: the reservation says I got the export policy from OCUM, but the export_policy table does not list it.

                     

                    If I re-discover in OCUM, then run acquisition from WFA, everything is back to normal and I can reference my export policy again, and both entries are marked as cache updated:

                    WFA_by_NetApp 2.png

                     

                    Does that make sense ? Would that be a problem with the "hack" or the SQL query defined in the reservation section ?

                    • Re: WFA issue with cache updates
                      yannb

                      Ok, I got it...

                       

                      What is missing with this "hack" is the "congruence_test", I guess that one is used in the middle to check the cache for an object.

                       

                      Full DAR file attached

                       

                      For the record, here is the test I implemented :

                       

                      SELECT e.id
                      FROM cm_storage.export_policy e
                      JOIN cm_storage.vserver vs
                        ON e.vserver_id = vs.id AND vs.name = '${VserverName}'
                      JOIN cm_storage.cluster c
                        ON (c.primary_address='${Cluster}' OR c.name='${Cluster}') AND vs.cluster_id = c.id
                      WHERE e.name='${PolicyName}';
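
                      To try the query above outside WFA, here is a small sketch that runs it against an in-memory SQLite database. Everything around the query is an assumption: the three-table schema is a cut-down guess containing only the columns the query touches, the sample names and the address are invented, named parameters replace the ${...} substitution, and "a returned row means the object is in the cache" is the presumed meaning of a congruence test, not something the thread confirms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite stand-in for the MySQL cm_storage schema (assumed, minimal columns).
conn.execute("ATTACH DATABASE ':memory:' AS cm_storage")
conn.executescript("""
    CREATE TABLE cm_storage.cluster (id INTEGER PRIMARY KEY, name TEXT, primary_address TEXT);
    CREATE TABLE cm_storage.vserver (id INTEGER PRIMARY KEY, name TEXT, cluster_id INTEGER);
    CREATE TABLE cm_storage.export_policy (id INTEGER PRIMARY KEY, name TEXT, vserver_id INTEGER);
    INSERT INTO cm_storage.cluster VALUES (1, 'cluster1', '192.0.2.10');
    INSERT INTO cm_storage.vserver VALUES (1, 'vs1', 1);
""")

# Same joins as the congruence test above, with bound parameters.
CONGRUENCE_TEST = """
    SELECT e.id
    FROM cm_storage.export_policy e
    JOIN cm_storage.vserver vs
      ON e.vserver_id = vs.id AND vs.name = :vserver
    JOIN cm_storage.cluster c
      ON (c.primary_address = :cluster OR c.name = :cluster) AND vs.cluster_id = c.id
    WHERE e.name = :policy
"""


def is_congruent(db: sqlite3.Connection, cluster: str, vserver: str, policy: str) -> bool:
    """True when the export policy is present in the (simulated) cache tables."""
    params = {"cluster": cluster, "vserver": vserver, "policy": policy}
    return db.execute(CONGRUENCE_TEST, params).fetchone() is not None


# Before acquisition the policy is missing; after it is inserted, the test passes.
print(is_congruent(conn, "cluster1", "vs1", "new_policy"))
conn.execute("INSERT INTO cm_storage.export_policy VALUES (1, 'new_policy', 1)")
print(is_congruent(conn, "cluster1", "vs1", "new_policy"))
```

                      The cluster can be matched either by name or by primary address, mirroring the OR condition in the original query.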

                  • Re: WFA issue with cache updates
                    Francois Egger

                    Anil,

                    I experience the same behavior, even with the certified commands "Clone Volume" and "Remove Volume".

                    My simple workflow deletes the cloned volume first, before cloning again.

                    In the first round:

                    The existence test is set to "if volume was not found: disable this command", so the Remove Volume step was skipped. Good.

                    The clone was created successfully.

                    In the second round the cache works fine, because "Remove Volume" is executed, and the clone works fine.


                    As Yann said, if I force an OCUM acquire, "Cache Updated" changes to YES. When I then relaunch the workflow, the first step is skipped again, so the delete doesn't occur and the clone fails.

                    What is wrong?

                    • Re: WFA issue with cache updates
                      ag

                      Francois,

                       

                      This looks a bit strange, because I find that there are reservations and congruence tests in both the Remove Volume and Clone Volume commands.

                      With the given description I cannot figure out much.

                      I will be able to help if you can attach both workflows and a backup of your WFA.

                      • Re: WFA issue with cache updates
                        abhit

                        Francois:

                        Do you see the Volume which you are trying to delete in the DB?

                        You can use a DB viewer tool and see if the volume is present in the DB.

                        Abhi

                      • Re: WFA issue with cache updates
                        Francois Egger

                        Hi Anil,

                        Attached the simple workflow, note my custom command is inside, but disabled.

                        This morning after one night, the situation was back to normal, workflow running fine.

                         

                        DFM view:

                        before clone:

                        root@gdc01093# dfm volume list |grep test

                        1761 gdc01148:/test_clone                 Flexible     64_bit     No        

                                    

                        after clone

                        root@gdc01093# dfm volume list |grep test

                        1764 gdc01148:/test_clone                 Flexible     64_bit     No                     

                         

                        After the acquire, WFA pulls the old volume definition (1761), because the DFM discovery had not been run.

                        Could that be the cause?

                         

                        Regards,

      • Re: WFA issue with cache updates
        adaikkap

        Hi Francois,

             Please add your case to RFE 746319 so that it gets prioritized.

         

        BTW can you provide some more details on your custom command ?

         

        Regards

        adai

  • Re: WFA issue with cache updates
    ag

    Francois and Yann,

     

    You guys have actually uncovered a limitation in WFA.

    I was able to reproduce the issue of the "clone volume workflow" where you remove a volume and clone it again.

    I dug deeper and here is what I found:

         1. There are two commands that create reservations here:

              Remove volume( if exists)

              Create Clone

     

         2. An important point here is that the volume that gets deleted and the clone being created have the same name.

     

         3. The workflow runs perfectly fine the first time and reservations are created. At this point, "cache updated" is set to "NO" for both reservations, which is expected. Now if you go ahead and run data acquisition on DFM and cache acquisition in WFA, the reservation for "create clone" is cleared ("cache updated" is set to YES). But the reservation for "remove volume" is not cleared ("cache updated" is still NO). This is what you guys observed.

     

         4. Why is that happening?

    There are two conflicting reservations here. One reservation thinks the "test_clone" volume does not exist (REMOVE VOLUME). The other reservation thinks the "test_clone" volume exists and was newly created (CREATE CLONE).

    Now, when the cache acquisition happens in WFA, the congruence test for "create clone" checks whether the newly reported data contains the "test_clone" volume. It does exist, so the reservation is CLEARED. No problems here.

    The congruence test for "remove volume" expects that the "test_clone" volume is NOT present in the cache, because it deleted the volume itself. But, surprise! It is present, so it thinks that OCUM has not yet reported that the volume is deleted, and "cache updated" remains NO. It will stay NO until the default period of 4 hours expires (because OCUM will never report that the volume does not exist), which is why Francois was able to come back the next day and execute it successfully.

     

         5. Coming to the results of the filter being weird.

    When you run a filter with "Use reservation" being checked, the filter takes the WFA cache, merges it with reservation data and provides the result.

    Essentially, what is happening here is that the filter takes the "test_clone" volume from WFA and applies any reservations (Remember, the "remove volume" is not cleared yet). The resultant effect is that the volume is removed.

    If you remove the "use reservation" check in the filter, the "test_clone" volume is taken from WFA cache but the "remove volume" reservation is NOT run. Therefore you will see the volume in the result.

     

         6. How can we fix this?

    This can happen to any object (not just volumes) if a workflow creates conflicting reservations.

    To decide upon a fix and provide a workaround, I need to understand the use case here.

    Why is this workflow being used exactly?

    Are there similar workflows that are being used?

    If not, what other workflows are being run?

    How frequently will this workflow need to be run?

     

    It will be helpful if I can get answers to those questions.

     

    Thanks,

    Anil
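
    The conflict described in points 3 to 5 can be reproduced with a toy simulation. It is only a sketch of the behaviour as explained in this post, not WFA code: Reservation, acquire and filter_volumes are hypothetical names, the cache is a plain set of volume names, and the 4-hour expiry of reservations is not modelled.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Reservation:
    action: str                  # "create" or "remove"
    volume: str
    cache_updated: bool = False  # the "cache updated" flag from the UI


def acquire(cache: Set[str], reservations: List[Reservation]) -> None:
    """Run each pending reservation's congruence test against the fresh cache."""
    for res in reservations:
        if res.cache_updated:
            continue
        present = res.volume in cache
        # A create is confirmed when the volume shows up, a remove when it is gone.
        res.cache_updated = present if res.action == "create" else not present


def filter_volumes(cache: Set[str], reservations: List[Reservation],
                   use_reservations: bool) -> Set[str]:
    """Return the visible volumes, optionally overlaying pending reservations."""
    result = set(cache)
    if use_reservations:
        for res in reservations:
            if res.cache_updated:
                continue
            if res.action == "create":
                result.add(res.volume)
            else:
                result.discard(res.volume)
    return result


# The workflow removed test_clone and then created a clone with the same name.
reservations = [Reservation("remove", "test_clone"), Reservation("create", "test_clone")]
cache = {"test_clone"}  # after acquisition, OCUM reports the new clone
acquire(cache, reservations)
print([(res.action, res.cache_updated) for res in reservations])
print(filter_volumes(cache, reservations, use_reservations=True))
print(filter_volumes(cache, reservations, use_reservations=False))
```

    In this model the "create" reservation is cleared while the "remove" reservation stays pending, so the filter that applies reservations returns an empty set and the filter that ignores them still returns test_clone.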
