35 Replies Latest reply: Jul 26, 2013 3:19 AM by coon

Malformed XML exceptions ( how to handle)

explorenetapp

Can someone please let me know when these exceptions happen, and how they can be handled? I see them each time on a different NetApp filer and for a different query. This happens for NFS shares, volumes, disk drives, CIFS shares, etc.

 

This issue needs to be addressed immediately, so a quick reply would be highly appreciated. I would like to know whether the problem lies in the calls invoked on the filer rather than in the client that invokes them.

 

 

This is how the code mostly invokes the calls:

 

NaElement elemIn = new NaElement(api); // api is volume-list-info, nfs-exportfs-list-rules, etc.
// Parameters are added with elemIn.addNewChild(params1_name, params1_value)
NaServer naServer = new NaServer("<address>", 1, 0);
NaElement elemOut = naServer.invokeElem(elemIn);

 

The following exception is thrown:

 

netapp.manage.NaProtocolException: Malformed XML

     at netapp.manage.NaServer.invokeElem(NaServer.java:667)

  • Re: Malformed XML exceptions ( how to handle)
    kunalm

    I would suggest you post one of the code snippets for which you are getting this error. That would help others make recommendations.

  • Re: Malformed XML exceptions ( how to handle)
    aashray

    Hi,

     

    Your complete code should look something like this:

     

      NaServer s = new NaServer("<ip>", 1, 0);
      s.setServerType(NaServer.SERVER_TYPE_FILER);
      s.setTransportType(NaServer.TRANSPORT_TYPE_HTTPS);
      s.setPort(443);
      s.setStyle(NaServer.STYLE_LOGIN_PASSWORD);
      s.setAdminUser("<user>", "<password>");
      NaElement api = new NaElement("volume-list-info");
      api.addNewChild("volume","/vol0");
      NaElement xo = s.invokeElem(api);
      System.out.println(xo.toPrettyString(""));

     

    Also, I am not sure whether the SAX XML parser is seeing additional < or > characters in the response data of volume-list-info (or other APIs). Could you please capture the raw XML output of the volume-list-info command using the apitest utility with the -X option and check whether the XML response contains any additional < or > characters?
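A raw dump captured this way can also be checked mechanically instead of by eye. The following is a minimal sketch that uses only the JDK's SAX parser; the class and method names (RawXmlCheck, isWellFormed) are illustrative and are not part of the NetApp SDK. It reports whether a string parses as well-formed XML, so a stray < or a missing closing tag both come back as false. Note that a stray > inside element text is legal XML and would not be flagged.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class RawXmlCheck {
    /** Returns true if rawXml parses as a well-formed XML document. */
    static boolean isWellFormed(String rawXml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.newSAXParser().parse(
                new ByteArrayInputStream(rawXml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    // Assumption: a dump may carry a DOCTYPE that points at a DTD
                    // which does not exist locally, so resolve it to an empty one.
                    @Override
                    public InputSource resolveEntity(String publicId, String systemId) {
                        return new InputSource(new StringReader(""));
                    }
                });
            return true;
        } catch (Exception e) {
            // SAXException for malformed input; parser setup and I/O errors
            // are also reported as "not well-formed" in this sketch.
            return false;
        }
    }
}
```

Calling RawXmlCheck.isWellFormed on the contents of each apitest output file gives a quick yes/no per API before anyone has to read the XML by hand.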

  • Re: Malformed XML exceptions ( how to handle)
    aashray

    Also, what simulator and SDK versions are you using?

    • Re: Malformed XML exceptions ( how to handle)
      explorenetapp

      This is not a simulator but the actual physical device. Can you please let me know how I can find out which SDK version I am using? I don't see any such details in manageontap.jar.

      • Re: Malformed XML exceptions ( how to handle)
        aashray

        I would like to know which version of ONTAP is running on your physical machine. You could find that out using the system-get-version API.

         

        There should be a file named SDK_help in the folder in which manageontap.jar was packaged. Or do you have only the jar file, without the complete folder that contains the examples and help files?

         

        -Aashray

        • Re: Malformed XML exceptions ( how to handle)
          explorenetapp

          Hi Aashray,

               Please give me some time to get back to you with the version of ONTAP running on the machine. Coming to the details of manageontap.jar, I have only the jar file and not the complete folder that has examples and help files.

           

          Thanks,

          Prasanna

          • Re: Malformed XML exceptions ( how to handle)
            aashray

            Sure.

            Since SDK 4.0 we have included the SDK version in the name of the jar file. Since your jar file's name does not include it, it is either older than 4.0 or has been renamed.

             

            There is no way of finding the SDK version using the SDK itself. Getting hold of the complete folder would help.

             

            I would also recommend trying out our ZEDI tool (comes as a part of the SDK bundle) which will help automatically generate code and run it.

             

            -Aashray

            • Re: Malformed XML exceptions ( how to handle)
              explorenetapp

              Hi Aashray,

              This is in response to your query about the version of ONTAP running on the machine. The version is v7.3.4. I have received the raw XML output for some of the API calls for which I am seeing the Malformed XML exceptions. These are the APIs for which I have the raw XML output:

               

              aggr-list-info

              aggr-space-list-info

              disk-list-info

              nfs-exportfs-list-rules

              volume-list-info

              snapmirror-get-status

               

              I see that the raw XML outputs for aggr-list-info and aggr-space-list-info contain incomplete XML: there is an opening tag but no closing tag for an XML element.

               

              Regards,

              Prasanna

              • Re: Malformed XML exceptions ( how to handle)
                aashray

                And which SDK version are you using? I would like to test and find a solution using the same SDK.

                • Re: Malformed XML exceptions ( how to handle)
                  explorenetapp

                  Hi Aashray,

                  The SDK version is 4.0.

                   

                  Prasanna

                  • Re: Malformed XML exceptions ( how to handle)
                    aashray

                    Prasanna, I would recommend you download the SDK available at http://support.netapp.com/NOW/cgi-bin/software and try out the same code. Let me know if that solves the issue.

                    • Re: Malformed XML exceptions ( how to handle)
                      explorenetapp

                      Hi Aashray,

                       

                      Thanks for the response. I shall try out with the SDK that you have pointed me to and let you know the updates.

                       

                      Prasanna

                      • Re: Malformed XML exceptions ( how to handle)
                        aashray

                        Could you share your complete code for the aggr-list-info call that doesn't seem to be working?

                        • Re: Malformed XML exceptions ( how to handle)
                          explorenetapp

                          Hi Aashray,

                          Thanks for the response. I need to check whether we can share the code, and will get back on this by next week. Also, I have a question about verifying the API calls with the newer version you pointed me to, which I understand is SDK 5.0. Is verifying the API call execution with apitest.exe from SDK 5.0 equivalent to verifying it with the 5.0 version of manageontap.jar in Java code?

                           

                          Regards,

                          Prasanna

                          • Re: Malformed XML exceptions ( how to handle)
                            aashray

                            apitest is a command-line utility to test APIs. This utility is suitable for API users who are at the beginner's level.

                            Verifying with API test should be equivalent to verifying with the manageontap.jar.

                             

                            Your error could be related to passing incorrect XML parameters, which is why I asked for the code. You could also test your APIs in ZEDI, which comes as part of the package you have. It generates complete code that should be correct. The code I gave in my first comment is a complete working example from ZEDI. You could refer to that, or share your code with me so that we can solve this.

                             

                            -Aashray

                            • Re: Malformed XML exceptions ( how to handle)
                              POOJA_HP_2013

                              Hi Aashray,

                               

                              Thanks for your response.

                              I am Prasanna's colleague and will be taking this up further. As Prasanna mentioned, we might not be able to share the exact piece of code, but we do have the apitest and Z-Explorer outputs. All the APIs were run using both the 4.0 and 5.0 SDK versions. Please find the outputs attached to this post.

                               

                              apitest_50_Raw Outputs.zip      - contains API outputs, run using SDK 5.0

                              apitest_Raw_Outputs.zip           - contains API outputs, run using SDK 4.0

                              ZExplorerOutputs.zip                - contains Z-Explorer outputs

                               

                              For most of the APIs, the output seems to be truncated. However, for some of them, though the output seems OK, I observed the following line appended at the end:

                                   "<results reason="debugging bypassed xml parsing" status="failed" errno="13001"/>"

                              Not sure what this could refer to.

                               

                              Any help would be appreciated!

                               

                              Regards,

                              Pooja

                              • Re: Malformed XML exceptions ( how to handle)
                                coon

                                In the raw output (I opened aggr-space-list-info_raw_50.log) it stops at

                                 

                                allocated>2331906048</volume-al

                                 

                                A few suggestions...

                                 

                                1. Can you run a packet trace from Data ONTAP (pktt start all -i X.X.X.X -d /vol/volume), where X.X.X.X is the IP address you're running apitest from and volume is an actual volume on the controller with enough space to capture the trace? Issue the zapi over HTTP (not HTTPS, as that complicates reading a packet trace), then stop the packet trace (pktt stop all). One command and one version that returns this error is enough. Then attach the packet trace from the controller here along with the apitest output. I'm interested to see whether this is the same data that left the controller onto the network.
                                2. I'd also caution against programmatically using just -info API calls if there are -iter and -next APIs for the same information. As the number of resources of any type (volumes, aggregates, shares/exports, LUNs, whatever) on a system grows, I've seen complications arise from grabbing one large bucket of output with just -info when -iter and -next would be better. For this type of call (using aggr-space-list-info as an example) you'd do better to use aggr-list-info to build an array of the aggregates and then call aggr-space-list-info for each aggregate. Please let me know whether querying aggregates individually resolves this error.
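The second suggestion, one small query per aggregate instead of one large query, might be sketched as follows. This is only an illustration under stated assumptions: PerAggregateQuery, Outcome, and the querySpace function are hypothetical names, and querySpace stands in for whatever code issues aggr-space-list-info for a single aggregate through the SDK (wrapping any checked SDK exception in an unchecked one). The sketch records which aggregates failed rather than aborting on the first error.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class PerAggregateQuery {
    /** Results keyed by aggregate name, plus the names whose query failed. */
    static final class Outcome<R> {
        final Map<String, R> results = new LinkedHashMap<>();
        final List<String> failed = new ArrayList<>();
    }

    /**
     * Calls querySpace once per aggregate name so that each response stays small.
     * querySpace is a hypothetical stand-in for a single-aggregate SDK call.
     */
    static <R> Outcome<R> queryEach(List<String> aggregateNames,
                                    Function<String, R> querySpace) {
        Outcome<R> outcome = new Outcome<>();
        for (String name : aggregateNames) {
            try {
                outcome.results.put(name, querySpace.apply(name));
            } catch (RuntimeException e) {
                // Keep going; the caller can retry or report the failed names.
                outcome.failed.add(name);
            }
        }
        return outcome;
    }
}
```

The list of names would come from a prior aggr-list-info call; if that call itself fails, this approach does not help and a different way of enumerating aggregates is needed.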
                                • Re: Malformed XML exceptions ( how to handle)
                                  POOJA_HP_2013

                                  Thanks coon for the quick response!

                                   

                                  Please find the output of packet trace attached with this post.

                                  I shall check on the usage of the APIs with corresponding -iter and -next APIs available and try to implement it the way you have suggested if not already done so.

                                   

                                  Regards,

                                  Pooja

                                  • Re: Malformed XML exceptions ( how to handle)
                                    coon

                                    Did you include the output from the apitest command? I'm just looking for a unique string that will help me chase it down in the packet trace.

                                    • Re: Malformed XML exceptions ( how to handle)
                                      coon

                                      It looks like any reply that hits the MTU size (1514) gets truncated XML. A colleague pointed out that the TCP PSH flag also tells the recipient to go ahead and process the data rather than wait for more.

                                       

                                      I'll have to defer to some of the developers on how to address that or if it's already addressed somewhere.

                                      • Re: Malformed XML exceptions ( how to handle)
                                        coon

                                        I filed all the details under bug 728756. The quick solutions would be jumbo frames (9k MTU) or segmentation as I mentioned before (that is, build an array of the aggregates and then query each aggregate individually).

                                         

                                        It'll take about a day for this link to work, but you can subscribe to it here: http://support.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=728756

                                         

                                        Note: I just put that out there for tracking purposes. One of the other developers might know if we've already addressed this somewhere. 7.3 code isn't exactly new. ONTAP 8.x has been around a bit now.

                                        • Re: Malformed XML exceptions ( how to handle)
                                          POOJA_HP_2013

                                          Thanks Coon for the response.

                                           

                                          I cross-checked our code, and we already implement the segmentation approach suggested in your previous post. Here is the list of APIs for which we use -iter and other APIs:

                                          quota-report

                                          qtree-list-

                                          aggr-list-info

                                          perf-object-get-instances

                                          perf-object-instance-list-info

                                          volume-list-info

                                          cifs-share-list

                                           

                                          Regarding the use of jumbo frames, could you please be a little more descriptive? I am not aware of how to implement this in code to avoid the truncation of the output. Could you please provide some sample code that shows how to use jumbo frames?

                                           

                                          Thanks & Regards,

                                          Pooja

                                          • Re: Malformed XML exceptions ( how to handle)
                                            coon

                                            Jumbo frames are a network configuration, not something you set in code: they have to be enabled on the storage NICs, on the network in between, and on the source issuing the API calls. We're investigating this further, but in the packet trace you sent me these problems seem to happen only when a response is larger than 1 MTU/MSS (the network interface PDU size). In your packet trace, Data ONTAP has a 1500 MTU set. When the zapi response is bigger than 1 MTU/MSS, the output is truncated. Jumbo frames (9k MTU) are fairly well known now, even if they are technically considered a nonstandard MTU. This would increase the amount of data that could be sent back from within Data ONTAP to 9k.

                                             

                                            Changing your network communications channel between wherever you are issuing API calls from and to Data ONTAP may not be easy.

                                             

                                            The easier suggestion would be to write the code to ask for smaller chunks of data. Build the object list within your code and then query individual object details instead of asking for all of them at once (as in the aggr-space-list-info query that you provided the output for).

                                             

                                            You called aggr-space-list-info with no aggregate value (effectively requesting the entire list). If you issued separate commands (aggr-space-list-info aggr1, aggr-space-list-info aggr2, etc.), I believe each response could fit within a single MTU, and the problem would be worked around without infrastructure changes.

                                            • Re: Malformed XML exceptions ( how to handle)
                                              coon

                                              If I were programming this, I'd probably also write in to my code a routine that simply takes any result that looks like this:

                                                   "<results reason="debugging bypassed xml parsing" status="failed" errno="13001"/>"

                                               

                                              and returns a "response too large" error. This would allow me to catch when that happens and identify alternative ways to gather the same information in smaller chunks.
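A routine of that kind, working on raw response text, might look like the sketch below. The names ResultMarker and hasErrno13001 are illustrative only and are not part of the NetApp SDK. The sketch assumes the caller has the raw text in hand (for example an apitest dump); whether this marker ever reaches application code through invokeElem is not established in this thread.

```java
import java.util.regex.Pattern;

class ResultMarker {
    // Matches a <results ...> tag that carries errno="13001", wherever the
    // attribute appears inside the tag.
    private static final Pattern ERRNO_13001 =
        Pattern.compile("<results\\b[^>]*\\berrno=\"13001\"[^>]*>");

    /** Returns true if the raw text contains a results tag with errno 13001. */
    static boolean hasErrno13001(String rawResponse) {
        return ERRNO_13001.matcher(rawResponse).find();
    }
}
```

A caller could treat a true result as a signal to retry the same request in smaller pieces, which is the workaround discussed here rather than a confirmed meaning of errno 13001.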

                                               

                                              I'm talking completely outside of the discussion we are having within NetApp about identifying why it is doing that and if/how to address it. This is just discussing working around the issue for now.

                                              • Re: Malformed XML exceptions ( how to handle)
                                                POOJA_HP_2013

                                                Hi Coon,

                                                 

                                                All the Malformed XML errors that we have reported were found by our customers. I tried to reproduce them on our set-ups but couldn't. Could you please let me know if there is any way to reproduce this error? That would help us debug the issue.

                                                 

                                                 

                                                "If I were programming this, I'd probably also write in to my code a routine that simply takes any result that looks like this:

                                                     "<results reason="debugging bypassed xml parsing" status="failed" errno="13001"/>""

                                                 

                                                I observed this in the apitest output for some of the APIs. However, since we have never gotten Malformed XML errors on our set-ups, we are not sure how this is handled by the Ontapi SDK that we use in our code.

                                                 

                                                Here's a code snippet of how we use the APIs to get the output (as NaElement):

                                                 

                                                NaElement elemIn = new NaElement(api); // api is volume-list-info, nfs-exportfs-list-rules, etc.
                                                // Parameters are added with elemIn.addNewChild(params1_name, params1_value)
                                                NaServer naServer = new NaServer("<address>", 1, 0);
                                                NaElement elemOut = naServer.invokeElem(elemIn);

                                                 

                                                I am not sure how "naServer.invokeElem(elemIn)" would return the output if it contains "<results reason="debugging bypassed xml parsing" status="failed" errno="13001"/>". If it is returned as part of the parsed XML output, we could add a check in our code, as you suggested.

                                                 

                                                Regards,

                                                Pooja

                                                • Re: Malformed XML exceptions ( how to handle)
                                                  POOJA_HP_2013

                                                  Hi Coon,

                                                   

                                                  Any updates on my previous post?

                                                   

                                                  I went through the exceptions that we have encountered so far in the customer's environment again, and here are some observations:

                                                  1. The Malformed XML error is thrown from the NaServer.java class (which is not part of the SE code), so there isn't much that can be done on the SE side to handle it if it is caused by truncated data. The exception is thrown before we even get the response back from NaServer:

                                                                 "netapp.manage.NaProtocolException: Malformed XML

                                                                           at netapp.manage.NaServer.invokeElem(NaServer.java:644)"

                                                   

                                                  2. Here are some of the APIs that throw the exception:

                                                            API: aggr-list-info

                                                       As you suggested, we get the aggregate names using this API and then iterate through the list using aggr-space-list-info. However, the exception is thrown while fetching the aggregate names themselves. Any suggestions on how this could be avoided, if at all?

                                                   

                                                           API: disk-list-info

                                                       Here, we first try to get the disk drives using the "disk-list-info" API. When it fails, we try again using the CLI, but it throws the exception again.

                                                   

                                                            API: volume-list-info

                                                       Here, we first try to get all the volumes using "volume-list-info" API, and if it fails try again with "<volume-list-info-iter-next>" but this also throws the Malformed XML error as shown below:

                                                   

                                                           [2013-03-09 11:43:54 Streamer-17             ] .NetAppNativeMethod(tapp.NetAppPlexProvider) Protocol Exception while processing: <volume-list-info><verbose>true</verbose></volume-list-info>

                                                            [2013-03-09 11:43:54 Streamer-17             ] ntapiDataCollection(tapp.NetAppPlexProvider) NaProtocolException getting plexes for: 0118043593.  Retrying using iterator.

                                                            [2013-03-09 11:44:33 Streamer-17             ] .NetAppNativeMethod(tapp.NetAppPlexProvider) Making ONTAPI call:(10.35.10.36): <volume-list-info-iter-next><maximum>10</maximum><tag>28429880056655277</tag></volume-list-info-iter-next>

                                                            [2013-03-09 11:45:42 Streamer-17             ] .NetAppNativeMethod(tapp.NetAppPlexProvider) Protocol Exception while processing: <volume-list-info-iter-next><maximum>10</maximum><tag>28429880056655277</tag></volume-list-info-iter-next>

                                                            [2013-03-09 11:46:42 Streamer-17             ] .NetAppPlexProvider(tapp.NetAppPlexProvider) Can't enumerate APPIQ_NetAppPlex

                                                  netapp.manage.NaProtocolException: Malformed XML

                                                            at netapp.manage.NaServer.invokeElem(NaServer.java:644)

                                                   

                                                   

                                                           API: snapshot-list-info
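For the iterator fallback described above under volume-list-info, where the -iter-next call with a maximum of 10 also fails, one thing that could be tried is to shrink the page size after each failure. The sketch below shows that idea only in the abstract. ShrinkingPager and PageFetcher are hypothetical names, the offset-based fetch is an assumption made to keep the example self-contained (the ONTAPI iterators use an opaque tag instead), and whether a failed iterator call can safely be retried on the filer is untested here. The fetcher is expected to wrap checked SDK exceptions in unchecked ones.

```java
import java.util.ArrayList;
import java.util.List;

class ShrinkingPager {
    /** Hypothetical stand-in for one paged call; not an SDK type. */
    interface PageFetcher<T> {
        List<T> fetch(int offset, int maximum);
    }

    /**
     * Fetches all records page by page, halving the page size whenever a
     * fetch fails, and gives up only when a page of size 1 still fails.
     */
    static <T> List<T> fetchAll(PageFetcher<T> fetcher, int initialMaximum) {
        List<T> all = new ArrayList<>();
        int maximum = Math.max(1, initialMaximum);
        while (true) {
            List<T> page;
            try {
                page = fetcher.fetch(all.size(), maximum);
            } catch (RuntimeException e) {
                if (maximum == 1) {
                    throw e; // already at the smallest page size
                }
                maximum = maximum / 2; // retry the same offset with a smaller page
                continue;
            }
            if (page.isEmpty()) {
                return all; // an empty page marks the end of the data
            }
            all.addAll(page);
        }
    }
}
```

If the truncation really depends on response size, a small enough page should eventually get through; if even a page of one record fails, the exception is rethrown so the caller can log it as before.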


                                                  • Re: Malformed XML exceptions ( how to handle)
                                                    coon

                                                    Pooja,

                                                     

                                                    We are still discussing this issue. Can you say whether there is any consistency in the Data ONTAP versions that encounter this error? I recall seeing somewhere (perhaps the case notes) that this was primarily a 7.3.x issue.

                                                    • Re: Malformed XML exceptions ( how to handle)
                                                      POOJA_HP_2013

                                                      The customer encountered these issues with the manageontap 4.1 jar. We also tried 5.0 R1 and still got the errors.

                                                       

                                                      Regards,

                                                      Pooja

                                                      • Re: Malformed XML exceptions ( how to handle)
                                                        coon

                                                        Apologies, I was asking if there is any commonality to the Data ONTAP version that is receiving these queries.

                                                        • Re: Malformed XML exceptions ( how to handle)
                                                          POOJA_HP_2013

                                                          Hi Coon,

                                                           

                                                          Sorry for the misunderstanding.

                                                          Yes, we have observed this exception only with 7.3.4 as of now, as per the case comments. I can confirm this and let you know by tomorrow.

                                                           

                                                          As I had mentioned earlier, since this is not reproducible in our local set-up, we are not sure whether it is observed for other versions also or not.

                                                           

                                                          Kindly let me know if this answers your query.

                                                           

                                                          Regards,

                                                          Pooja

                                                          • Re: Malformed XML exceptions ( how to handle)
                                                            coon

                                                            Please clarify "not reproducible in our local set-up"?

                                                             

                                                            You tried against another 7.3.4 instance of Data ONTAP and did not experience the problem?

                                                            Did it have enough volumes, snapshots, aggregates (whatever was queried) to exceed 1 MTU? If you open your packet trace in Wireshark, set a filter for XML (this limits the output to just the packets that are SDK responses), and sort by the length column, you'll see that the 1513 packets are incomplete. Select XML in the middle box and the raw output appears in the bottom box; with XML highlighted, the bottom box should show you the hex and ASCII of the zapi response being truncated.

                                                             

                                                            Your reproduction attempts would need to make sure you have enough objects that the response is larger than 1 MTU or TCP MSS.
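To judge whether a lab system can produce a response that large, the size of a captured response can be compared with the TCP payload that fits in one segment. The sketch below is plain arithmetic with illustrative names (SegmentSize, maxSegmentSize, exceedsOneSegment). It assumes IPv4 and TCP headers of 20 bytes each with no options, so a 1500-byte MTU leaves 1460 bytes of payload; the 1514-byte length seen in a trace is consistent with a 1500-byte MTU plus a 14-byte Ethernet header.

```java
import java.nio.charset.StandardCharsets;

class SegmentSize {
    static final int IPV4_HEADER_BYTES = 20; // assumes no IP options
    static final int TCP_HEADER_BYTES = 20;  // assumes no TCP options

    /** Largest TCP payload that fits in one packet for the given MTU. */
    static int maxSegmentSize(int mtu) {
        return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES;
    }

    /**
     * Returns true if the response text alone is larger than one segment.
     * HTTP headers travel in the same stream, so the real threshold is
     * somewhat lower than this check suggests.
     */
    static boolean exceedsOneSegment(String response, int mtu) {
        int bytes = response.getBytes(StandardCharsets.UTF_8).length;
        return bytes > maxSegmentSize(mtu);
    }
}
```

Running exceedsOneSegment on the raw output of a lab query with an MTU of 1500 indicates whether that system is even a candidate for reproducing the truncation.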

                                                            • Re: Malformed XML exceptions ( how to handle)
                                                              POOJA_HP_2013

                                                              Hi Coon,

                                                               

                                                              We don't have a 7.3.4 set-up available in our labs. However, we did try other versions (7.3.1, 7.3.6, 8.0, etc.) but were not able to reproduce it. I am not sure whether it depends on the Data ONTAP version.

                                                               

                                                              Thanks for providing the steps to open the trace packets in Wireshark. I need to check the configuration details of our lab set-ups to figure out whether we have a big enough configuration to generate responses larger than 1 MTU. I will let you know the details as soon as I have them.

                                                               

                                                              Please do let me know if there is any other information which is required from our end.

                                                               

                                                              Regards,

                                                              Pooja

                                                              • Re: Malformed XML exceptions ( how to handle)
                                                                coon

                                                                I got a request yesterday from someone that I wanted to relay to you, along with some more detail on why this is so difficult for both you and us to reproduce.

                                                                 

                                                                Request: Can you run the apitest command again while gathering a packet trace like you did before (to catch an error), but this time simultaneously run a trace on the client running the apitest command? We want packet traces from both sides.

                                                                 

                                                                Explanation: We notice that the responses that have problems generally get caught in a bunch of TCP retransmissions. We'd like to have both sides of that conversation (the trace from the client issuing the ZAPI command and the Data ONTAP packet trace) in order to better understand what is causing the TCP retransmissions. I suspect that the reason we're not able to reproduce this elsewhere is that the repro environments we're all trying this in have clean networks (no retransmissions). Perhaps you could have your network team investigate the cause of those retransmissions; if they are the reason you're experiencing this, fixing them might resolve the issue completely.

                                                            • Re: Malformed XML exceptions ( how to handle)
                                                              coon

                                                              I'm going to list our current suggestions/questions on this:

                                                               

                                                              1. Collect packet traces from both sides of a failed/truncated API call. Goal: better understand why the calls that fail often show network retransmissions, and whether those have anything to do with why they fail.
                                                              2. Clarify what you've attempted to reproduce. Exact same hardware? Same network topology? Same API calls that fail?
                                                              3. Clarify what fails in the environment. You have provided here some apitest output that fails with truncated XML. Is that reproducible at will? Does that happen every time you make that API call to that controller? As I understand it, you see errors in your production code in your environment, but we're not certain whether those are the same XML truncation errors, right?
                                                              4. We'd like to understand if the failures you have observed seem to have any consistency around the actual API calls made. Let me explain why, and I think you'll understand what we're asking. We want to know whether this is a networking problem of some kind (and please know that when I use the term network, it includes the NIC in the storage controller, the driver for that NIC, and all of the software in Data ONTAP that does network processing) or a problem in the code within Data ONTAP that creates the XML response itself. If it's a networking code problem, it'll happen for all different APIs and not discriminate on which one you call. If it's the part of the code that generates the response, it'll always be the same API calls. I know some API calls are going to generate larger responses and some smaller ones, but we'd like to know if you are able to discern any patterns from the failures you've seen. This is also related to #3. If aggr-list-space-info succeeds half of the time and fails half of the time, or if it succeeds 95 times and fails 5 times for the exact same length response, then it is more likely a dependent resource causing the problem (like memory within Data ONTAP or the networking code).
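
                                                              To make the pattern question in #4 concrete, here is a minimal sketch in plain Java (no SDK calls) of a per-API tally you could keep on the client side. The class and method names are hypothetical and the API names in main are only sample data. An API that fails on every call points more toward the code that generates the response; an API with a mix of successes and failures points more toward the network path or a dependent resource. Treat that reading as a rough heuristic, not a diagnosis.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side helper, not part of the NetApp SDK.
class FailureTally {
    // Per API name: index 0 counts successes, index 1 counts failures.
    private final Map<String, int[]> counts = new LinkedHashMap<>();

    // Record the outcome of one API call.
    void record(String api, boolean succeeded) {
        int[] c = counts.computeIfAbsent(api, k -> new int[2]);
        c[succeeded ? 0 : 1]++;
    }

    // APIs that failed on every recorded call.
    List<String> alwaysFailing() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            if (e.getValue()[0] == 0 && e.getValue()[1] > 0) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    // APIs that both succeeded and failed at least once.
    List<String> intermittent() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            if (e.getValue()[0] > 0 && e.getValue()[1] > 0) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        FailureTally tally = new FailureTally();
        // Sample data only; in practice, call record(...) wherever a
        // Malformed XML exception is caught or a call returns normally.
        tally.record("aggr-list-space-info", true);
        tally.record("aggr-list-space-info", false);
        tally.record("volume-list-info", false);
        tally.record("volume-list-info", false);
        tally.record("nfs-exportfs-list-rules", true);

        System.out.println("always failing: " + tally.alwaysFailing());
        System.out.println("intermittent:   " + tally.intermittent());
    }
}
```

                                                              Logging the response length alongside each outcome would also help answer whether failures for the same API occur at the same response size.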

                                                               

                                                              Hopefully this makes clear what we'd need in order to investigate this further.

                                                               

                                                              Thanks!
