2 Replies Latest reply: Nov 29, 2013 6:47 AM by Calvin Scoltock

Snapdrive removing LUN before removing RDM from VM

Calvin Scoltock
I have had an issue a couple of times when a backup has run. SnapDrive creates a clone of the LUNs to be backed up and attaches them to the VM as RDM disks. Once the backup has completed, at least one of the LUN clones has been destroyed BEFORE the VM has been updated to remove the RDM.

 

Both times this has been on a Windows 2008 R2 Server running Microsoft SQL 2008 R2. The server is a VM running on vSphere 5.1, with the database and log disks attached as RDMs from a NetApp Filer running Data ONTAP 8.1.2P4 7-Mode.

 

The last time this happened it was a backup triggered from SCCM 2007. SCCM is installed on another server but configured to back up its configuration and SQL database at 2am each morning.

 

I can see in the SCCM backup log where it attempts to back up the SCCM database:

 

Component 6 - PAGBackup\SiteDBServer\SMSbkSQLDBsite.dat. SMS_SITE_BACKUP 19/11/2013 02:00:32 6704 (0x1A30)

Dependency On SQL Writer , LogicalPath:\\WINSQL10, ComponentName:SMS_PAG SMS_SITE_BACKUP 19/11/2013 02:00:32 6704 (0x1A30)

After GatherWriterMetadata SMS Writer status = STABLE. SMS_SITE_BACKUP 19/11/2013 02:00:32 6704 (0x1A30)

Starting to clean and prepare the backup location. SMS_SITE_BACKUP 19/11/2013 02:00:32 6704 (0x1A30)

Cleanup and/or preparation done. SMS_SITE_BACKUP 19/11/2013 02:00:35 6704 (0x1A30)

Info: Sending message to start the SQL Backup... SMS_SITE_BACKUP 19/11/2013 02:00:35 6704 (0x1A30)

Starting to create the snapshot set. SMS_SITE_BACKUP 19/11/2013 02:00:35 6704 (0x1A30)

Added volume containing C:\ to the snapshot set. SMS_SITE_BACKUP 19/11/2013 02:00:36 6704 (0x1A30)

Info: Starting Asynchronous PrepareForBackup... SMS_SITE_BACKUP 19/11/2013 02:00:36 6704 (0x1A30)

Info: Asynchronous PrepareForBackup finished... SMS_SITE_BACKUP 19/11/2013 02:00:36 6704 (0x1A30)

Info: Waiting for SQL to PrepareForBackup. SMS_SITE_BACKUP 19/11/2013 02:00:36 6704 (0x1A30)

Sleeping for 5 seconds... SMS_SITE_BACKUP 19/11/2013 02:00:36 6704 (0x1A30)

Sleeping for 5 seconds... SMS_SITE_BACKUP 19/11/2013 02:00:41 6704 (0x1A30)

Sleeping for 5 seconds... SMS_SITE_BACKUP 19/11/2013 02:00:46 6704 (0x1A30)

Sleeping for 5 seconds... SMS_SITE_BACKUP 19/11/2013 02:00:51 6704 (0x1A30)

Info: SQL's PrepareForBackup is finished. SMS_SITE_BACKUP 19/11/2013 02:00:56 6704 (0x1A30)

After PrepareForBackup SMS Writer status = STABLE. SMS_SITE_BACKUP 19/11/2013 02:00:56 6704 (0x1A30)

Info: Starting Asynchronous DoSnapshotSet... SMS_SITE_BACKUP 19/11/2013 02:00:56 6704 (0x1A30)

Info: Sending message to SQL Backup to DoSnapshot... SMS_SITE_BACKUP 19/11/2013 02:00:56 6704 (0x1A30)

Info: Asynchronous DoSnapshotSet finished. SMS_SITE_BACKUP 19/11/2013 02:03:40 6704 (0x1A30)

After DoSnapshotSet SMS Writer status = WAITING_FOR_BACKUP_COMPLETION. SMS_SITE_BACKUP 19/11/2013 02:03:40 6704 (0x1A30)

Starting the backup complete phase. SMS_SITE_BACKUP 19/11/2013 02:03:40 6704 (0x1A30)

STATMSG: ID=5056 SEV=I LEV=M SOURCE="SMS Server" COMP="SMS_SITE_BACKUP" SYS=WINMSS01 SITE=PAG PID=4072 TID=6704 GMTDATE=Tue Nov 19 02:03:40.726 2013 ISTR0="" ISTR1="" ISTR2="" ISTR3="" ISTR4="" ISTR5="" ISTR6="" ISTR7="" ISTR8="" ISTR9="" NUMATTRS=0 SMS_SITE_BACKUP 19/11/2013 02:03:40 6704 (0x1A30)

LogEvent(): Successfully logged Event to NT Event Log. (4 - 48 - 1,073,746,880) SMS_SITE_BACKUP 19/11/2013 02:03:40 6704 (0x1A30)

Backup Succeeded  for Component - PAGBackup\SiteServer\SMSServer\inboxes. SMS_SITE_BACKUP 19/11/2013 02:04:33 6704 (0x1A30)

Backup Succeeded  for Component - PAGBackup\SiteServer\SMSServer\Logs. SMS_SITE_BACKUP 19/11/2013 02:04:57 6704 (0x1A30)

Backup Succeeded  for Component - PAGBackup\SiteServer\SMSServer\data. SMS_SITE_BACKUP 19/11/2013 02:04:58 6704 (0x1A30)

Backup Succeeded  for Component - PAGBackup\SiteServer\SMSServer\srvacct. SMS_SITE_BACKUP 19/11/2013 02:04:58 6704 (0x1A30)

Backup Succeeded  for Component - PAGBackup\SiteServer\SMSbkSiteRegNAL.dat. SMS_SITE_BACKUP 19/11/2013 02:04:58 6704 (0x1A30)

Backup Succeeded  for Component - PAGBackup\SiteServer\SMSbkSiteRegSMS.dat. SMS_SITE_BACKUP 19/11/2013 02:04:58 6704 (0x1A30)

Waiting On SQL Backup task to complete... SMS_SITE_BACKUP 19/11/2013 02:04:58 6704 (0x1A30)

STATMSG: ID=5050 SEV=E LEV=M SOURCE="SMS Server" COMP="SMS_SITE_BACKUP" SYS=WINMSS01 SITE=PAG PID=4072 TID=6704 GMTDATE=Tue Nov 19 02:05:41.518 2013 ISTR0="" ISTR1="" ISTR2="" ISTR3="" ISTR4="" ISTR5="" ISTR6="" ISTR7="" ISTR8="" ISTR9="" NUMATTRS=0 SMS_SITE_BACKUP 19/11/2013 02:05:41 6704 (0x1A30)

Error: SQL Backup failed... SMS_SITE_BACKUP 19/11/2013 02:05:41 6704 (0x1A30)

Error: Backup Failed  for Component - PAGBackup\SiteDBServer\SMSbkSQLDBsite.dat. SMS_SITE_BACKUP 19/11/2013 02:05:41 6704 (0x1A30)
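
The timing in this log can be pulled out mechanically. The following is a minimal Python sketch, not part of SCCM or SnapDrive, that parses a few of the lines above (copied verbatim) and reports how long the DoSnapshotSet phase took and which entries are errors. The line layout encoded in the regular expression (message, component, date, time, thread ID, thread ID in hex) is an assumption read off these lines only, and the names parse_line and first_time are made up for the example.

```python
import re
from datetime import datetime

# Lines copied verbatim from the SCCM backup log above.
LOG_LINES = [
    r"Info: Starting Asynchronous DoSnapshotSet... SMS_SITE_BACKUP 19/11/2013 02:00:56 6704 (0x1A30)",
    r"Info: Asynchronous DoSnapshotSet finished. SMS_SITE_BACKUP 19/11/2013 02:03:40 6704 (0x1A30)",
    r"Waiting On SQL Backup task to complete... SMS_SITE_BACKUP 19/11/2013 02:04:58 6704 (0x1A30)",
    r"Error: SQL Backup failed... SMS_SITE_BACKUP 19/11/2013 02:05:41 6704 (0x1A30)",
    r"Error: Backup Failed  for Component - PAGBackup\SiteDBServer\SMSbkSQLDBsite.dat. SMS_SITE_BACKUP 19/11/2013 02:05:41 6704 (0x1A30)",
]

# Assumed layout: <message> <component> <dd/mm/yyyy> <hh:mm:ss> <thread id> (<thread id in hex>)
LINE_RE = re.compile(
    r"^(?P<msg>.*?)\s+(?P<component>SMS_\w+)\s+"
    r"(?P<date>\d{2}/\d{2}/\d{4})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<tid>\d+)\s+\(0x[0-9A-Fa-f]+\)$"
)


def parse_line(line):
    """Return a dict for one log line, or None if it does not match the assumed layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    stamp = f"{m.group('date')} {m.group('time')}"
    return {
        "message": m.group("msg"),
        "component": m.group("component"),
        "when": datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S"),
        "thread": int(m.group("tid")),
    }


entries = [e for e in map(parse_line, LOG_LINES) if e is not None]
errors = [e["message"] for e in entries if e["message"].startswith("Error:")]


def first_time(fragment):
    """Timestamp of the first entry whose message contains the given text."""
    return next(e["when"] for e in entries if fragment in e["message"])


snapshot_seconds = (
    first_time("DoSnapshotSet finished")
    - first_time("Starting Asynchronous DoSnapshotSet")
).total_seconds()

print(f"DoSnapshotSet took {snapshot_seconds:.0f} seconds")
for message in errors:
    print("ERROR:", message)
```

If the SCCM server and the filer clocks agree (an assumption), the DoSnapshotSet window of 02:00:56 to 02:03:40 spans the clone create, map, and destroy messages that appear in the NetApp log further down.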

 

I can see in the Windows Event Log on the SQL Server that a snapshot is taken of the DB and Logs LUNs, and that a clone of the LUNs is created from these snapshots.

 

ONTAP VSS hardware provider service has started.

Data ONTAP VSS hardware provider is loaded.

Data ONTAP VSS hardware provider is adding a source lun [VendorId=NETAPP, ProductId=LUN, SerialNo=W-Oph4Xi5j8Z] to SnapshotSetId {6159fa30-be36-4f8d-b358-234fbd44308d}.

Data ONTAP VSS hardware provider is adding a source lun [VendorId=NETAPP, ProductId=LUN, SerialNo=W-Oph4Xi5nFb] to SnapshotSetId {6159fa30-be36-4f8d-b358-234fbd44308d}.

SnapDrive is ready to create Snapshot copy ({6159fa30-be36-4f8d-b358-234fbd44308d}) of LUN(s).

I/O is frozen on database SMS_PAG. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

Snapshot ({6159fa30-be36-4f8d-b358-234fbd44308d}) of LUN(s) on storage system (NSERIES02) volume (WinSql10_DBs|WinSql10_DBs_LOGS|) was successfully created.

Data ONTAP VSS hardware provider has successfully completed CommitSnapshots for SnapshotSetId {6159fa30-be36-4f8d-b358-234fbd44308d} in 405 milliseconds.

I/O was resumed on database SMS_PAG. No user action is required.

GetTargetLunInfo succeeded. Storage System Name = NSERIES02. lunPath = /vol/WinSql10_DBs/WinSql10_DBs.lun. Snapshot copy Name = {6159fa30-be36-4f8d-b358-234fbd44308d}

GetTargetLunInfo succeeded. Storage System Name = NSERIES02. lunPath = /vol/WinSql10_DBs_LOGS/WinSql10_DBs_LOGS.lun. Snapshot copy Name = {6159fa30-be36-4f8d-b358-234fbd44308d}

CreateTargetLun succeeded. Storage System Name = NSERIES02. lunPath = /vol/WinSql10_DBs/WinSql10_DBs.lun. Snapshot copy Name = {6159fa30-be36-4f8d-b358-234fbd44308d}

Data ONTAP VSS hardware provider has successfully mapped a lun [VendorId=NETAPP, ProductId=LUN, SerialNo=2Fh83$BzP8Os].

CreateTargetLun failed. Storage System Name = NSERIES02. lunPath = /vol/WinSql10_DBs_LOGS/WinSql10_DBs_LOGS.lun. Snapshot copy Name = {6159fa30-be36-4f8d-b358-234fbd44308d}. Error code = 0xc0040414. Error description = Failed to delete disk in virtual machine, The parameter is incorrect.

Data ONTAP VSS hardware provider failed in LocateLuns method. SnapDrive failed to map the target lun(s). Please check SnapDrive event logs. The data is the error code.

 

The VM is then paused with the following message

 

The storage backing virtual disk /vmfs/volumes/51ec1b77-f5707fd8-b5f2-e41f132eb8cc/WinSql10/WinSql10_SD_NSERIES02_2Fh83_BzP8Ou_20.vmdk has permanent device loss. You may be able to hot remove this virtual device from the virtual machine and continue after clicking Retry. Click Cancel to terminate this session.

 

If I then remove the RDM from the VM I can then click Retry and the VM continues to run.  The LUN clone has already disappeared from the NetApp Filer.

 

In the VM log files I can see where the LUN clones are added to the VM and where one of the clones is removed, at which point the VM hangs because the other clone no longer exists.

 

2013-11-19T02:02:01.165Z| vmx| I120: TOOLS received request in VMX to set option 'synctime' -> '0'

2013-11-19T02:02:01.256Z| vmx| I120: VMXVmdb_SetCfgState: cfgReqPath=/vm/#_VMX/vmx/cfgState/req/#e/, remDevPath=/vm/#_VMX/vmx/vigor/setCfgStateReq/#dcfb/in/

2013-11-19T02:02:01.567Z| vmx| I120: VMAutomation: Hot add device. type=50, backing=101

2013-11-19T02:02:01.567Z| vmx| I120: HotAdd: Adding scsi-hardDisk with mode 'independent-persistent' to scsi0:9

2013-11-19T02:02:01.567Z| vmx| I120: DISK: OPEN scsi0:9 '/vmfs/volumes/51ec1b77-f5707fd8-b5f2-e41f132eb8cc/WinSql10/WinSql10_SD_NSERIES02_2Fh83_BzP8Os_19.vmdk' independent-persistent R[]

2013-11-19T02:02:01.573Z| vmx| I120: DISKLIB-VMFS  : "/vmfs/volumes/51ec1b77-f5707fd8-b5f2-e41f132eb8cc/WinSql10/WinSql10_SD_NSERIES02_2Fh83_BzP8Os_19-rdmp.vmdk" : open successful (10) size = 139591226880, hd = 3522193. Type 10

2013-11-19T02:02:01.573Z| vmx| I120: DISKLIB-DSCPTR: Opened [0]: "WinSql10_SD_NSERIES02_2Fh83_BzP8Os_19-rdmp.vmdk" (0xa)

 

 

2013-11-19T02:02:01.573Z| vmx| I120: DISKLIB-LINK  : Opened '/vmfs/volumes/51ec1b77-f5707fd8-b5f2-e41f132eb8cc/WinSql10/WinSql10_SD_NSERIES02_2Fh83_BzP8Os_19.vmdk' (0xa): vmfsPassthroughRawDeviceMap, 272639115 sectors / 130 GB.

2013-11-19T02:02:01.580Z| vmx| I120: DISKLIB-LIB   : Opened "/vmfs/volumes/51ec1b77-f5707fd8-b5f2-e41f132eb8cc/WinSql10/WinSql10_SD_NSERIES02_2Fh83_BzP8Os_19.vmdk" (flags 0xa, type vmfsPassthroughRawDeviceMap).

2013-11-19T02:02:01.581Z| vmx| I120: DISK: Disk '/vmfs/volumes/51ec1b77-f5707fd8-b5f2-e41f132eb8cc/WinSql10/WinSql10_SD_NSERIES02_2Fh83_BzP8Os_19.vmdk' has UUID '60 00 c2 9e 03 d9 43 fb-ea 1c c6 f4 5d b3 16 65'

2013-11-19T02:02:01.581Z| vmx| I120: DISK: OPEN '/vmfs/volumes/51ec1b77-f5707fd8-b5f2-e41f132eb8cc/WinSql10/WinSql10_SD_NSERIES02_2Fh83_BzP8Os_19.vmdk' Geo (16971/255/63) BIOS Geo (0/0/0)

2013-11-19T02:02:01.589Z| vmx| I120: SCSI DEVICE (scsi0:9): Computed value of scsi0:9.useBounceBuffers: default

2013-11-19T02:02:01.589Z| vmx| I120: Creating virtual dev for scsi0:9

2013-11-19T02:02:01.590Z| vmx| I120: DumpDiskInfo: scsi0:9 createType=17, capacity = 272639115, numLinks = 1, deviceName = 'vml.020013000060a98000324668383324427a50384f734c554e202020', allocationType = 0

2013-11-19T02:02:01.591Z| vmx| I120: SCSIDiskESXPopulateVDevDesc: Using RDMP backend

2013-11-19T02:02:01.591Z| vmx| I120: DISKUTIL: scsi0:9 : geometry=16971/255/63

2013-11-19T02:02:01.609Z| vcpu-0| I120: LSI:Event notification sent for SAS device scsi0:9...

2013-11-19T02:02:45.645Z| vmx| I120: TOOLS received request in VMX to set option 'synctime' -> '0'

2013-11-19T02:02:45.664Z| vmx| I120: VMXVmdb_SetCfgState: cfgReqPath=/vm/#_VMX/vmx/cfgState/req/#f/, remDevPath=/vm/#_VMX/vmx/vigor/setCfgStateReq/#dd15/in/

2013-11-19T02:02:46.030Z| vmx| I120: VMAutomation: Hot add device. type=50, backing=101

2013-11-19T02:02:46.030Z| vmx| I120: HotAdd: Adding scsi-hardDisk with mode 'independent-persistent' to scsi0:10

2013-11-19T02:02:46.030Z| vmx| I120: DISK: OPEN scsi0:10 '/vmfs/volumes/51ec1b77-f5707fd8-b5f2-e41f132eb8cc/WinSql10/WinSql10_SD_NSERIES02_2Fh83_BzP8Ou_20.vmdk' independent-persistent R[]

2013-11-19T02:02:46.036Z| vmx| I120: DISKLIB-VMFS  : "/vmfs/volumes/51ec1b77-f5707fd8-b5f2-e41f132eb8cc/WinSql10/WinSql10_SD_NSERIES02_2Fh83_BzP8Ou_20-rdmp.vmdk" : open successful (10) size = 44038149120, hd = 4210305. Type 10

2013-11-19T02:02:46.036Z| vmx| I120: DISKLIB-DSCPTR: Opened [0]: "WinSql10_SD_NSERIES02_2Fh83_BzP8Ou_20-rdmp.vmdk" (0xa)

2013-11-19T02:02:46.036Z| vmx| I120: DISKLIB-LINK  : Opened '/vmfs/volumes/51ec1b77-f5707fd8-b5f2-e41f132eb8cc/WinSql10/WinSql10_SD_NSERIES02_2Fh83_BzP8Ou_20.vmdk' (0xa): vmfsPassthroughRawDeviceMap, 86012010 sectors / 41.0 GB.

2013-11-19T02:02:46.037Z| vmx| I120: DISKLIB-LIB   : Opened "/vmfs/volumes/51ec1b77-f5707fd8-b5f2-e41f132eb8cc/WinSql10/WinSql10_SD_NSERIES02_2Fh83_BzP8Ou_20.vmdk" (flags 0xa, type vmfsPassthroughRawDeviceMap).

2013-11-19T02:02:46.039Z| vmx| I120: DISK: Disk '/vmfs/volumes/51ec1b77-f5707fd8-b5f2-e41f132eb8cc/WinSql10/WinSql10_SD_NSERIES02_2Fh83_BzP8Ou_20.vmdk' has UUID '60 00 c2 95 b6 92 2b ba-2f 15 c5 49 34 05 02 0f'

2013-11-19T02:02:46.039Z| vmx| I120: DISK: OPEN '/vmfs/volumes/51ec1b77-f5707fd8-b5f2-e41f132eb8cc/WinSql10/WinSql10_SD_NSERIES02_2Fh83_BzP8Ou_20.vmdk' Geo (5354/255/63) BIOS Geo (0/0/0)

2013-11-19T02:02:46.044Z| vmx| I120: SCSI DEVICE (scsi0:10): Computed value of scsi0:10.useBounceBuffers: default

2013-11-19T02:02:46.044Z| vmx| I120: Creating virtual dev for scsi0:10

2013-11-19T02:02:46.045Z| vmx| I120: DumpDiskInfo: scsi0:10 createType=17, capacity = 86012010, numLinks = 1, deviceName = 'vml.020014000060a98000324668383324427a50384f754c554e202020', allocationType = 0

2013-11-19T02:02:46.046Z| vmx| I120: SCSIDiskESXPopulateVDevDesc: Using RDMP backend

2013-11-19T02:02:46.046Z| vmx| I120: DISKUTIL: scsi0:10 : geometry=5354/255/63

2013-11-19T02:02:46.066Z| vcpu-0| I120: LSI:Event notification sent for SAS device scsi0:10...

2013-11-19T02:03:01.611Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox-dnd timed out.

2013-11-19T02:03:12.289Z| vmx| I120: TOOLS received request in VMX to set option 'synctime' -> '0'

2013-11-19T02:03:12.307Z| vmx| I120: VMXVmdb_SetCfgState: cfgReqPath=/vm/#_VMX/vmx/cfgState/req/#10/, remDevPath=/vm/#_VMX/vmx/vigor/setCfgStateReq/#dd28/in/

2013-11-19T02:03:12.370Z| vmx| I120: VMAutomation: Hot remove device. type=50, idx=9

2013-11-19T02:03:12.371Z| vcpu-0| I120: LSI:Event notification sent for SAS device scsi0:9...

2013-11-19T02:03:12.371Z| vcpu-0| I120: Destroying virtual dev for scsi0:9 vscsi=8478

2013-11-19T02:03:12.379Z| vcpu-0| I120: scsi0:9: numIOs = 0 numMergedIOs = 0 numSplitIOs = 0 ( 0.0%)

2013-11-19T02:03:12.379Z| vcpu-0| I120: Closing disk scsi0:9

2013-11-19T02:03:12.380Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/51ec1b77-f5707fd8-b5f2-e41f132eb8cc/WinSql10/WinSql10_SD_NSERIES02_2Fh83_BzP8Os_19-rdmp.vmdk" : closed.

2013-11-19T02:03:12.409Z| vmx| I120: Msg_Question:

2013-11-19T02:03:12.409Z| vmx| I120: [msg.hbacommon.askonpermanentdeviceloss] The storage backing virtual disk /vmfs/volumes/51ec1b77-f5707fd8-b5f2-e41f132eb8cc/WinSql10/WinSql10_SD_NSERIES02_2Fh83_BzP8Ou_20.vmdk has permanent device loss. You may be able to hot remove this virtual device from the virtual machine and continue after clicking Retry. Click Cancel to terminate this session.

2013-11-19T02:03:12.409Z| vmx| I120: ----------------------------------------
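
The deviceName values in the two DumpDiskInfo lines can be tied back to the LUN serial numbers reported by the VSS hardware provider. The following is a minimal Python sketch under two assumptions: that NetApp LUNs use an NAA identifier of 60a98000 followed by the hex-encoded ASCII serial number, and that the third byte of the vml name carries the LUN ID (which is what these two lines suggest, since 0x13 and 0x14 match the _19 and _20 suffixes of the vmdk names). The function names are made up for the example.

```python
NETAPP_NAA_PREFIX = "60a98000"  # assumed NetApp NAA prefix
SERIAL_HEX_LENGTH = 24          # 12 ASCII characters, two hex digits each


def _hex_body(device_name: str) -> str:
    """Strip the leading 'vml.' from a deviceName, if present."""
    return device_name[4:] if device_name.startswith("vml.") else device_name


def netapp_serial_from_vml(device_name: str) -> str:
    """Decode the LUN serial number embedded in a vml device name."""
    body = _hex_body(device_name)
    start = body.find(NETAPP_NAA_PREFIX)
    if start < 0:
        raise ValueError("no NetApp NAA prefix found in " + device_name)
    begin = start + len(NETAPP_NAA_PREFIX)
    serial_hex = body[begin:begin + SERIAL_HEX_LENGTH]
    return bytes.fromhex(serial_hex).decode("ascii")


def lun_id_from_vml(device_name: str) -> int:
    """Read the LUN ID from the third byte of the vml name (assumed layout)."""
    return int(_hex_body(device_name)[4:6], 16)


# deviceName values copied from the DumpDiskInfo lines above.
for name in (
    "vml.020013000060a98000324668383324427a50384f734c554e202020",
    "vml.020014000060a98000324668383324427a50384f754c554e202020",
):
    print(lun_id_from_vml(name), netapp_serial_from_vml(name))
```

Decoded this way, scsi0:9 carries the serial that the event log shows as successfully mapped, and scsi0:10 (the disk named in the permanent device loss message) is the other clone.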

 

In the NetApp logs I can see where the LUN clones are created, mapped, and destroyed:

 

Tue Nov 19 02:01:09 GMT [NSERIES02: SMBRPCWorker04:notice]: Multicreation of snapshot named {6159fa30-be36-4f8d-b358-234fbd44308d} successful. It took 3 milli seconds from start to finish in ZAPI.

Tue Nov 19 02:01:10 GMT [NSERIES02:lun.offline:warning]: LUN /vol/WinSql10_DBs/{de2baaa3-64bd-4395-9ab8-1f2870d5a18b}.aux has been taken offline 

Tue Nov 19 02:01:10 GMT [NSERIES02:lun.destroy:info]: LUN /vol/WinSql10_DBs/{de2baaa3-64bd-4395-9ab8-1f2870d5a18b}.aux destroyed 

Tue Nov 19 02:01:10 GMT [NSERIES02:lun.offline:warning]: LUN /vol/WinSql10_DBs_LOGS/{0e5d11bb-4a82-4692-a138-171aef2f8959}.aux has been taken offline 

Tue Nov 19 02:01:11 GMT [NSERIES02:lun.destroy:info]: LUN /vol/WinSql10_DBs_LOGS/{0e5d11bb-4a82-4692-a138-171aef2f8959}.aux destroyed 

Tue Nov 19 02:01:12 GMT [NSERIES02:lun.offline:warning]: LUN /vol/WinSql10_DBs/{de2baaa3-64bd-4395-9ab8-1f2870d5a18b}.rws has been taken offline 

Tue Nov 19 02:01:12 GMT [NSERIES02:lun.map:info]: LUN /vol/WinSql10_DBs/{de2baaa3-64bd-4395-9ab8-1f2870d5a18b}.rws was mapped to initiator group viaRPC.iqn.1998-01.com.vmware:esxi-a-05-032858dd=19 

Tue Nov 19 02:02:06 GMT [NSERIES02:lun.offline:warning]: LUN /vol/WinSql10_DBs_LOGS/{0e5d11bb-4a82-4692-a138-171aef2f8959}.rws has been taken offline 

Tue Nov 19 02:02:06 GMT [NSERIES02:lun.map:info]: LUN /vol/WinSql10_DBs_LOGS/{0e5d11bb-4a82-4692-a138-171aef2f8959}.rws was mapped to initiator group viaRPC.iqn.1998-01.com.vmware:esxi-a-05-032858dd=20 

Tue Nov 19 02:02:59 GMT [NSERIES02:lun.map.unmap:info]: LUN /vol/WinSql10_DBs_LOGS/{0e5d11bb-4a82-4692-a138-171aef2f8959}.rws unmapped from initiator group viaRPC.iqn.1998-01.com.vmware:esxi-a-05-032858dd 

Tue Nov 19 02:03:00 GMT [NSERIES02:lun.destroy:info]: LUN /vol/WinSql10_DBs_LOGS/{0e5d11bb-4a82-4692-a138-171aef2f8959}.rws destroyed

Tue Nov 19 02:56:31 GMT [NSERIES02:lun.map.unmap:info]: LUN /vol/WinSql10_DBs/{de2baaa3-64bd-4395-9ab8-1f2870d5a18b}.rws unmapped from initiator group viaRPC.iqn.1998-01.com.vmware:esxi-a-05-032858dd 

Tue Nov 19 02:56:42 GMT [NSERIES02:lun.destroy:info]: LUN /vol/WinSql10_DBs/{de2baaa3-64bd-4395-9ab8-1f2870d5a18b}.rws destroyed

 

2:56 was the time the RDMs were manually removed from the VM and the Retry button was clicked.

 

Has anyone else experienced similar issues? It looks to me as though the clone of the DB Logs LUN was destroyed before it was removed from the VM.
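
The ordering can be checked by merging the filer and VM log entries into one timeline. The following is a minimal Python sketch that uses lines copied from the logs above (the permanent device loss message is shortened, as marked in the code). It assumes the filer and ESXi host clocks are in sync, that both logs are in GMT/UTC, and that the filer lines belong to 2013, since they carry no year. The helper names are made up for the example.

```python
import re
from datetime import datetime

YEAR = 2013  # assumption: the filer log lines omit the year

# Filer messages copied from the NetApp log above (trailing spaces removed).
FILER_LINES = [
    "Tue Nov 19 02:02:06 GMT [NSERIES02:lun.map:info]: LUN /vol/WinSql10_DBs_LOGS/{0e5d11bb-4a82-4692-a138-171aef2f8959}.rws was mapped to initiator group viaRPC.iqn.1998-01.com.vmware:esxi-a-05-032858dd=20",
    "Tue Nov 19 02:02:59 GMT [NSERIES02:lun.map.unmap:info]: LUN /vol/WinSql10_DBs_LOGS/{0e5d11bb-4a82-4692-a138-171aef2f8959}.rws unmapped from initiator group viaRPC.iqn.1998-01.com.vmware:esxi-a-05-032858dd",
    "Tue Nov 19 02:03:00 GMT [NSERIES02:lun.destroy:info]: LUN /vol/WinSql10_DBs_LOGS/{0e5d11bb-4a82-4692-a138-171aef2f8959}.rws destroyed",
]

# VM log messages copied from the vmware.log excerpt above.
# The last message is shortened here: the text after the message key is omitted.
VMX_LINES = [
    "2013-11-19T02:02:46.030Z| vmx| I120: HotAdd: Adding scsi-hardDisk with mode 'independent-persistent' to scsi0:10",
    "2013-11-19T02:03:12.370Z| vmx| I120: VMAutomation: Hot remove device. type=50, idx=9",
    "2013-11-19T02:03:12.409Z| vmx| I120: [msg.hbacommon.askonpermanentdeviceloss]",
]

FILER_RE = re.compile(r"^\w{3} (\w{3} \d{1,2} \d{2}:\d{2}:\d{2}) GMT \[[^\]]+\]: (.*)$")
VMX_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\| [\w-]+\| I\d+: (.*)$")


def parse_filer(line):
    """Return (timestamp, 'filer', message) for one filer log line."""
    stamp, message = FILER_RE.match(line).groups()
    when = datetime.strptime(f"{stamp} {YEAR}", "%b %d %H:%M:%S %Y")
    return when, "filer", message


def parse_vmx(line):
    """Return (timestamp, 'vmx', message) for one vmware.log line."""
    stamp, message = VMX_RE.match(line).groups()
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f"), "vmx", message


timeline = sorted(
    [parse_filer(line) for line in FILER_LINES] + [parse_vmx(line) for line in VMX_LINES],
    key=lambda event: event[0],
)

for when, source, message in timeline:
    print(f"{when.time()}  {source:<5}  {message[:70]}")

destroyed_at = next(t for t, s, m in timeline if s == "filer" and m.endswith("destroyed"))
pdl_at = next(t for t, s, m in timeline if "askonpermanentdeviceloss" in m)
gap_seconds = (pdl_at - destroyed_at).total_seconds()
print(f"Clone destroyed {gap_seconds:.3f} seconds before the device loss message")
```

On these lines the Logs clone is unmapped and destroyed on the filer before the VM logs any hot remove, and the only hot remove logged in that window is for idx=9 rather than scsi0:10.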

  • Re: Snapdrive removing LUN before removing RDM from VM
    GRAEMEOGDEN

    Hi Calvin,

     

    What version of SnapDrive are you using? I've had similar issues in the past that were resolved by updating the version of SnapDrive on the servers (see: https://communities.netapp.com/thread/19192)

     

    Please check the IMT to make sure later versions are compatible with your versions of Data ONTAP and ESXi.

     

    Thanks

    Graeme

    • Re: Snapdrive removing LUN before removing RDM from VM
      Calvin Scoltock

      Thanks for the reply Graeme.

       

      I am using SnapDrive for Windows 6.5.

       

      However, I think the issue you are seeing is slightly different to mine and is more that SnapDrive does not do a "clean" removal of the LUNs from the ESXi hosts and the hosts eventually become disconnected from vCenter.  This is a long standing issue and I blogged about this back in 2011 (http://pelicanohintsandtips.wordpress.com/2011/11/27/esxi-hosts-becoming-disconnected-from-vcenter/).  I can't remember now which version of ESXi 4 fixed the issue, but I think it was re-introduced in a later version of ESXi.

       

      The more I work with SnapDrive, the more I find issues with the way it removes LUNs. I keep meaning to put another article together about the method it uses and the way VMware recommends it should be done, and see if we can get NetApp engineering to sort it out. At the end of my SnapManager jobs that perform a verify, I run a PowerShell script that rescans the HBAs on all of the ESXi hosts once the job has completed, to clean up the dead paths. However, this does not resolve the issue that I get loads and loads of warnings in the VMkernel log, which makes it hard to identify real issues (I generate a report of all errors and warnings in the VMkernel log each day).
