13 Replies Latest reply: Aug 13, 2013 10:01 AM by chittur

Flash Accel question

borismekler Sprinter

Now that version 1.2 is out with support for vSphere 5.1 and VMotion, I'm preparing to deploy it, but before I do, there's one thing that I haven't been able to find an answer to in documentation - how does it handle cache device failures? That is, if I give it just one SSD (rather than RAID1 or RAID10), and that SSD fails, will I simply get performance degradation, or will my cached VMs (or worse, entire host, including non-cached VMs) crash? Same goes for PCIe devices, which can't be RAIDed in the first place.

  • Re: Flash Accel question
    SLASH5K200 Novice

    Howdy,

     

    I can't see the benefit of RAID1/10 for a read cache. If the SSD or PCIe device were to die, your data would still be safe; reads would then be served from FlashCache or, failing that, from the spindles.
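
To illustrate why losing a read cache should not put data at risk, here is a minimal Python sketch of a read-through cache that falls back to the backing store when its cache device fails. It is not Flash Accel's implementation; `ReadCache`, `FlakyDevice` and `CacheDeviceError` are hypothetical names invented for this example, and it says nothing about how a real host or guest reacts when the device disappears.

```python
class CacheDeviceError(Exception):
    """Raised by the (hypothetical) cache device once it has failed."""


class FlakyDevice:
    """Stand-in for an SSD/PCIe cache device that can be made to fail."""

    def __init__(self):
        self.blocks = {}
        self.failed = False

    def get(self, block_id):
        if self.failed:
            raise CacheDeviceError("cache device unavailable")
        return self.blocks.get(block_id)

    def put(self, block_id, data):
        if self.failed:
            raise CacheDeviceError("cache device unavailable")
        self.blocks[block_id] = data


class ReadCache:
    """Read-through cache: the backing store always holds the real data."""

    def __init__(self, backend_read, device):
        self.backend_read = backend_read  # callable: block_id -> bytes
        self.device = device
        self.healthy = True
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if self.healthy:
            try:
                cached = self.device.get(block_id)
                if cached is not None:
                    self.hits += 1
                    return cached
            except CacheDeviceError:
                # Stop using the device; reads continue from the backend.
                self.healthy = False
        self.misses += 1
        data = self.backend_read(block_id)
        if self.healthy:
            try:
                self.device.put(block_id, data)
            except CacheDeviceError:
                self.healthy = False
        return data
```

Because nothing is ever written only to the cache, a failed device costs performance rather than data in this model.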

     

    C

    • Re: Flash Accel question
      borismekler Sprinter

      I know the reads will come from spindles. My question is how gracefully the failure itself is handled when the entire cache device disappears from the host or starts throwing weird errors. Will the VMs bluescreen? Will the host crash? Will either the VMs or the host require a reboot? Can Flash Accel trigger an automatic VMotion to hosts where the cache is still alive? When I replace the faulty device, will I need a reboot to get the cache back online?

      • Re: Flash Accel question
        SLASH5K200 Novice

        That's a good question.

         

        I just kicked off a SQLIO run on a test VM with the Flash Accel cache enabled and, after a few minutes, removed the datastore that holds our RDMp's for Flash Accel from the host where this VM lives, to simulate the loss of a cache device...

        Screen Shot 2013-07-28 at 10.16.33 PM.png

        The VM stayed alive running the SQLIO test and FlashCache started to kick in. The VM was fine.

         

        Flash Accel hit Ratio %

        Screen Shot 2013-07-28 at 10.10.56 PM.png

        Flash Cache:

        Screen Shot 2013-07-28 at 10.33.13 PM.png

         

        When I look at the Flash Accel homepage, it now also says:

         

        Screen Shot 2013-07-28 at 10.29.36 PM.png

         

        To repair this, I migrated the VM to another host that had access to the existing RDMp datastore, disabled the cache for the VM, and re-enabled it. Now it's working again, with no restart required!

         

        If you ask me, that's pretty cool!

         

        Hope that helps.

         

        C

  • Re: Flash Accel question
    liviu.ianasi Novice

    Hi everyone,

    I'm also testing Flash Accel and followed your example with the RDMp datastore failure test. When I offline the LUN/datastore, my test VM shuts down; VMware HA kicks in and tries to restart the VM on another host.

    • Re: Flash Accel question
      SLASH5K200 Novice

      Interesting...

       

      My environment:

      • Cisco UCS
        • B200M3 Blade, 256 GB of RAM, LSI 400GB SLC WarpDrive
        • ESXi 5.0 - current patchset
          • Windows 2008R2 - all current patches including additional patches required for MPIO, SnapDrive, etc.

       

      • NetApp 3240AE
        • Clustered-Ontap 8.1.3
        • FlashCache
        • SAS

       

      I presented the datastore to each ESXi host via iSCSI and ran my test against an iSCSI LUN presented within a Windows host configured with clustered file services (this was a test HA SQL environment).

       

      I'm not in a position to do any re-testing against the operating system drive hosted on a VMDK, which may be the difference here.

       

      Cheers,

       

      Chris

       

      Message was edited by: Chris Anders Added LSI Card to B200M3 spec.

      • Re: Flash Accel question
        borismekler Sprinter

        Wait one - I was under the impression that current version of Flash Accel doesn't support MSCS. I have a few environments similar to what you tested (Windows Server 2008 R2 on top of vSphere with in-guest iSCSI LUNs used for SQL Server 2008 R2 on MSCS) which could benefit from Flash Accel (the filers are FAS2040/2220/2240, so no option of FlashCache), but when I asked whether or not MSCS is supported in a recent NetApp/LSI webcast about Flash Accel, I was told that it's not supported in 1.2, and may be added in 1.3. Was that incorrect?

        • Re: Flash Accel question
          SLASH5K200 Novice

          Interesting...

           

          So, from the Flash Accel GUI I was able to see the mapped LUNs on both hosts; however, only one of the hosts had the LUNs mounted and was writing to them.

           

          Active Node:

          Screen Shot 2013-08-06 at 6.16.56 PM.png

          Screen Shot 2013-08-06 at 6.17.40 PM.png

           

          Passive Node:

          Screen Shot 2013-08-06 at 6.16.50 PM.png

          Screen Shot 2013-08-06 at 6.17.18 PM.png

           

           

          10G of cache was given to each SQL host and migration was enabled, which meant I burnt 20G of cache on both blades (each SQL host was on a separate blade).
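
One way to read that arithmetic, sketched in Python: assume (this is an assumption, not something confirmed in this thread or by documentation quoted here) that enabling migration reserves each VM's cache allocation on every host it may move to. The function and the VM/blade names below are hypothetical and exist only for illustration.

```python
def reserved_per_host(vm_cache_gb, vm_host, hosts, migration_enabled):
    """Return cache space (GB) reserved on each host.

    Assumption: with migration enabled, a VM's cache allocation is
    reserved on every host; otherwise only on the host it runs on.
    """
    reserved = {host: 0 for host in hosts}
    for vm, size_gb in vm_cache_gb.items():
        targets = hosts if migration_enabled else [vm_host[vm]]
        for host in targets:
            reserved[host] += size_gb
    return reserved


# Two SQL VMs with 10 GB of cache each, one per blade (hypothetical names).
vm_cache_gb = {"sql-a": 10, "sql-b": 10}
vm_host = {"sql-a": "blade1", "sql-b": "blade2"}
hosts = ["blade1", "blade2"]

with_migration = reserved_per_host(vm_cache_gb, vm_host, hosts, True)
without_migration = reserved_per_host(vm_cache_gb, vm_host, hosts, False)
```

Under that assumption each blade ends up holding 20 GB of reservations with migration enabled, versus 10 GB per blade without it.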

           

          I did some simple testing whereby I ran some IO and watched the cache do its job. I then failed over to the other node, re-ran some tests and watched the second cache do its job.

          (The screenshots above don't represent that test; I just pulled them now and the server has since been restarted.)

           

          The cache was cold when I migrated between SQL hosts, but that was to be expected. To be honest, I didn't even check whether this configuration was supported; I just tested it since 1.2 supports iSCSI within the host, and to my surprise it did the job!

           

          *shrug*

           

          I'm not saying it's "supported", but it certainly passed the "wow, this is cool, let's try this in UAT" test!

           

          Cheers,

           

          Chris

      • Re: Flash Accel question
        liviu.ianasi Novice

        My environment looks like this:

        Dell R720

        • ESXi 5.1, latest patches
          • Fusion-io ioDrive2 - 750GB - latest SCSI driver for ESXi 5.1

        NetApp 2240AA

        • NFS exports for VMware datastores

         

        Test Machine

        • Windows 2008 R2 x64 - no MPIO or any special apps installed. Just IOmeter for testing.

        Only one ESXi host is involved in the test but is part of a cluster configured with HA and DRS.

        To make Flash Accel work, I presented an iSCSI LUN to the ESXi host to store the pRDM file. All other VM disks are on an NFS datastore.

         

        All works OK until I offline the LUN presented via iSCSI. At that moment ESXi throws an error that it cannot find the raw disk and shuts down the VM to restart it on another host. No other host has the iSCSI datastore (for the moment), so the VM remains powered off.
        I think it's expected behavior for ESX HA to try to restart the VM on another host in the cluster when it loses connection to the LUN, but that is not how I'd like it to react.

         

        Anyway, losing the iSCSI datastore is not a realistic scenario: the NetApp is AA, so making iSCSI redundant is no problem. I will do some more tests, but this time I will actually fail the Fusion-io card to see the result.

        • Re: Flash Accel question
          SLASH5K200 Novice

          So the main difference I see, apart from the ESXi version, is that you're using the Fusion-io card and I'm using the LSI card, which I believe is presented to the ESXi host differently.

           

          I'm not sure how else you can simulate the card failure without physically pulling it out, but I'm interested to see how you go.

           

          Cheers,

           

          Chris

        • Re: Flash Accel question
          SLASH5K200 Novice

          In response to your comment:

          "I think it's expected behavior from ESX HA to try and restart the VM to another host in the cluster when it looses connection to the LUN but that is not the way I'd wish it should react."

           

          That seems a tad odd to me, considering I have seen datastores go missing on me numerous times, especially when demonstrating NFS failover on 7-Mode installs to customers; instead of dying, the VM pauses while the host tries to resolve the missing datastore.

           

          Interesting...

          • Re: Flash Accel question
            liviu.ianasi Novice

            The VM will pause for a given amount of time while ESX tries to restore the NFS datastore (if you use NetApp VSC, those settings, such as the NFS timeout, are set by VSC), but eventually HA will restart it on a different host.

            For the Fusion-io card I've used a dedicated driver, so I think I'm OK on that part.

             

            Did you enable the cache on the OS disk as well, or only on the disk presented as an iSCSI LUN directly in the VM (Windows iSCSI software initiator)?

      • Re: Flash Accel question
        chittur NetApp Employee Sprinter

        Chris,

         

        Have you deployed Flash Accel in a production environment?

         

        Thanks!

        Kumar

        Flash Accel TME.
