Losing Access to Cluster Shared Volumes: Cluster Shared Volume 'Volume1' ('CSV Disk1') has entered a paused state because of '(c0000435)'

Hi,

I just built a Server 2012 R2 Hyper-V failover cluster connected to EqualLogic 4110 storage arrays with the latest firmware and HIT Kit.
When we clone a VM or create one from a template, the cluster loses access to the CSV hosted on the EqualLogic storage, with the following error:

Cluster Shared Volume 'Volume1' ('CSV Disk1') has entered a paused state because of '(c0000435)'. All I/O will temporarily be queued until a path to the volume is reestablished.

Can anyone shed any light onto this issue?

Full details below:

Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 06/08/2014 09:31:17
Event ID: 5120
Task Category: Cluster Shared Volume
Level: Error
Keywords: 
User: SYSTEM
Computer: SVR1
Description:
Cluster Shared Volume 'Volume1' ('CSV Disk1') has entered a paused state because of '(c0000435)'. All I/O will temporarily be queued until a path to the volume is reestablished.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-Windows-FailoverClustering" Guid="{BAF908EA-3421-4CA9-9B84-6689B8C6F85F}" />
<EventID>5120</EventID>
<Version>0</Version>
<Level>2</Level>
<Task>38</Task>
<Opcode>0</Opcode>
<Keywords>0x8000000000000000</Keywords>
<TimeCreated SystemTime="2014-08-06T08:31:17.330643100Z" />
<EventRecordID>36230</EventRecordID>
<Correlation />
<Execution ProcessID="2336" ThreadID="3524" />
<Channel>System</Channel>
<Computer>SVR1</Computer>
<Security UserID="S-1-5-18" />
</System>
<EventData>
<Data Name="VolumeName">Volume1</Data>
<Data Name="ResourceName">CSV Disk1</Data>
<Data Name="ErrorCode">(c0000435)</Data>
</EventData>
</Event>
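
To see how often the volume pauses and which error code each occurrence carries, the 5120 events can be pulled from the System log with PowerShell. This is a minimal sketch, assuming it is run locally on each cluster node; the 7-day window is an arbitrary choice and not something from the original report.

```powershell
# List recent CSV "paused state" events (ID 5120) from the local System log.
# Assumption: the 7-day look-back window is arbitrary; adjust as needed.
$filter = @{
    LogName      = 'System'
    ProviderName = 'Microsoft-Windows-FailoverClustering'
    Id           = 5120
    StartTime    = (Get-Date).AddDays(-7)
}

Get-WinEvent -FilterHashtable $filter -ErrorAction SilentlyContinue |
    ForEach-Object {
        # The event XML carries VolumeName, ResourceName and ErrorCode as named <Data> elements.
        $data = ([xml]$_.ToXml()).Event.EventData.Data
        [pscustomobject]@{
            Time      = $_.TimeCreated
            Node      = $_.MachineName
            Volume    = ($data | Where-Object { $_.Name -eq 'VolumeName' }).'#text'
            ErrorCode = ($data | Where-Object { $_.Name -eq 'ErrorCode' }).'#text'
        }
    } |
    Sort-Object Time |
    Format-Table -AutoSize
```

Comparing the timestamps with the start of each clone job should show whether the pauses line up with the cloning activity.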

August 12th, 2014 3:05pm

The cluster is losing access to the disk. I would open a ticket with Dell.

Also check this thread: http://social.technet.microsoft.com/Forums/lync/en-US/ed47b837-6495-4419-8eb2-12aa2865f409/storage-issues-during-live-clone-in-server-2012-r2?forum=winserverhyperv

August 13th, 2014 1:45am

Hi rEMOTE_eVENT,

Could you tell us how you clone the VM or create it from a template? Does your cluster pass the cluster validation test? Does the copied VM have the same BIOSGUID information, etc.? Please also try building the failover cluster from a normally installed system.

More information:

How to use uniquely identify a virtual machine in Hyper-V

http://blogs.technet.com/b/jhoward/archive/2008/09/16/how-to-use-uniquely-identify-a-virtual-machine-in-hyper-v.aspx

A similar thread:

How to Clone VMs in Hyper-V

http://social.technet.microsoft.com/Forums/windowsserver/en-US/67c4c555-14fd-4164-bf5b-59ce883c8b18/how-to-clone-vms-in-hyperv?forum=winserverhyperv

I'm glad to be of help to you!

August 14th, 2014 7:19am

Hi,

I use SCVMM 2012 R2 to clone a VM: right-click the VM > Clone. I also use SCVMM to create a new VM from the template.

August 14th, 2014 9:39am

Hi rEMOTE_eVENT,

Does your template image have the recommended hotfixes and updates for Windows Server 2012 R2-based failover clusters installed? If not, please install them before you create the failover cluster.

The update download URL:

Recommended hotfixes and updates for Windows Server 2012 R2-based failover clusters

http://support.microsoft.com/kb/2920151

I'm glad to be of help to you!

August 15th, 2014 1:35am

Hi,

Just an update to my previous post: unfortunately, removing Symantec did not fix the issue. This problem is logged with Microsoft as a bug, and I am awaiting a hotfix.

T

August 19th, 2014 9:28am

Hi,

I am having the exact same issue. Do we know if this is a real bug and when the hotfix will be released?

Thanking you in advance.

October 21st, 2014 10:17am

I seem to be experiencing this issue as well. 

To echo Jools_SP's comment: has Microsoft confirmed that this is a bug and that a hotfix is in the works? Did they give you any idea of when it will be available?

FWIW, I'm dealing with a Hyper-V 2012 R2 cluster using an EqualLogic array for the CSV and cloning the VM from SCVMM 2012 R2.

October 27th, 2014 8:39pm

Hi,

They provided me with a private hotfix, but that didn't work, so they are now working on a new revision. I don't know when it will be available, only that a lot of effort is being put into it because other people have this issue too. I would urge you to log a call with them: the more people who report the issue, the higher the priority the fix will get.

October 28th, 2014 10:10am

When I opened a case, the engineer suggested the following:

  1. Disable ODX with Set-ItemProperty HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem -Name "FilterSupportedFeaturesMode" -Value 1
  2. Temporarily uninstall any Symantec products
  3. Install http://support.microsoft.com/kb/2964439
  4. Engage Dell regarding the timeouts and update EQLDSM.SYS to the latest version.
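
Before testing further, it may help to confirm that the ODX setting from step 1 is actually in place on every node. A minimal sketch, assuming the FailoverClusters module is available and PowerShell remoting is enabled between the nodes (neither is stated in this thread):

```powershell
# Report the ODX setting on every node of the local cluster.
# FilterSupportedFeaturesMode: 0 = ODX enabled (default), 1 = ODX disabled.
# Assumption: PowerShell remoting is enabled on all nodes.
$nodes = (Get-ClusterNode).Name

Invoke-Command -ComputerName $nodes -ScriptBlock {
    $path  = 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem'
    $value = (Get-ItemProperty -Path $path -Name 'FilterSupportedFeaturesMode' `
                               -ErrorAction SilentlyContinue).FilterSupportedFeaturesMode
    [pscustomobject]@{
        Node                        = $env:COMPUTERNAME
        FilterSupportedFeaturesMode = $value
        OdxDisabled                 = ($value -eq 1)
    }
} | Select-Object Node, FilterSupportedFeaturesMode, OdxDisabled
```

An empty FilterSupportedFeaturesMode column means the value could not be read on that node, so it should be checked by hand.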

I had already done #1. #2 didn't apply.

I installed kb2964439 and tested but the problem remained.

When I initially deployed these servers, I used version 4.7 of the Dell HIT Kit. I downloaded 4.7.1 today and updated to it. The problem still existed.

Getting a little desperate for some change, I completely removed the HIT kit and tested.  Not surprisingly, that didn't turn out well so I reinstalled 4.7.1 of the HIT kit.  At that point, I got pulled away for a few hours and when I returned, decided to try disabling the Windows firewall even though I had rules in to allow traffic from the EqualLogic IPs.

When I attempted a clone that time, it finished without generating a single 5120 event.  I turned the firewall back on just to confirm that was the issue and it still worked.  At this point, I'm thinking it was the act of uninstalling the HIT kit 4.7 and doing a semi-clean install of 4.7.1.

I have 2 more nodes in that cluster that I will update the HIT kit on tomorrow to see if that solves the problem for me.

I would expect that you received the same suggestions from MS support as I did so if you've already done all of that and want to compare notes/configurations, let me know.

October 31st, 2014 2:10am

Hi,

Please let me know the outcome of the HIT Kit upgrades on your remaining nodes.

I do have a workaround and that is to create a temp CSV disk that is dedicated ONLY to cloning. If I need to clone a VM I migrate it onto the temp CSV, clone it and then migrate it back. Then the cloning only affects the TEMP CSV (and only the VM that is getting cloned).

Hope this helps.

October 31st, 2014 5:00pm

Apparently my optimism was a bit premature.  I had no luck getting the other 2 nodes to behave properly so I went back to the original node and was able to reproduce the problem again by performing 2 clones simultaneously.  I think it was succeeding in part because there was less disk activity.
November 3rd, 2014 9:56pm

The latest suggestion I received from PSS (if only as a troubleshooting step) was to remove the Dell VSS provider using:

& 'C:\Program Files\EqualLogic\bin\EqlVss.exe' /unregserver

After doing that and restarting the following services (or rebooting), I was able to perform a clone without receiving any 5120 events.

  • Volume Shadow Copy
  • Equallogic VSS Requestor
  • Equallogic Auto-Snapshot Manager Agent
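
The service restarts can be scripted instead of rebooting. This is only a sketch: it assumes the display names on the host match the three names listed above, which may differ between HIT Kit versions.

```powershell
# Restart the services listed above by display name.
# Assumption: these display names match what is installed on the host.
$displayNames = @(
    'Volume Shadow Copy',
    'Equallogic VSS Requestor',
    'Equallogic Auto-Snapshot Manager Agent'
)

foreach ($name in $displayNames) {
    $svc = Get-Service -DisplayName $name -ErrorAction SilentlyContinue
    if ($svc) {
        # -Force allows the restart even if other services depend on this one.
        $svc | Restart-Service -Force
    } else {
        Write-Warning "No service with display name '$name' was found."
    }
}
```

A warning for a name simply means no service with that display name exists on the host, in which case the name should be checked in the Services console.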

One side effect is that whenever I start the Auto-Snapshot Manager console, I receive an error. The only way I've found to fix this (and break cloning again) is to perform a repair installation of the HIT Kit.

I didn't bother configuring a lot in the Auto-Snapshot Manager as I really wanted the HIT kit for the DSM and presumably optimal MPIO configuration.  So it's possible that my lack of configuring everything was part of the issue.
November 6th, 2014 8:29pm

It might be worth checking whether the cluster heartbeat is on the same NIC as the network you are using for this and, if so, ensuring that the heartbeat has a higher priority. We had a similar issue when live migrating, and this was the cause of the CSVs being lost.
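
The cluster network layout can be reviewed quickly from PowerShell. A minimal sketch, assuming the FailoverClusters module is installed on the node; the network name 'CSV' in the commented line is hypothetical:

```powershell
# Show each cluster network with its role and metric.
# Role: 0 = not used by the cluster, 1 = cluster only, 3 = cluster and client.
# The cluster prefers the network with the lowest metric for CSV traffic.
Get-ClusterNetwork |
    Sort-Object Metric |
    Format-Table Name, Role, Metric, AutoMetric, Address -AutoSize

# To prefer a specific network, its metric can be lowered manually.
# 'CSV' is a hypothetical network name; setting Metric turns AutoMetric off.
# (Get-ClusterNetwork -Name 'CSV').Metric = 900
```

If the storage or management network ends up with the lowest metric, that would be worth correcting before further tests.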
November 6th, 2014 9:20pm

I am seeing this same issue with SCVMM 2012 R2 and a Server 2012 R2 cluster. Originally the cluster was using a 2-member LACP team for both the Live Migration and CSV networks. This team has since been broken up, and there are now dedicated CSV and Live Migration networks. The interesting thing we found during testing is that these events only occur when a clone is initiated from the non-coordinator and non-datacenter nodes in the cluster. During this operation we see heavy utilization of the host network, where we would expect the cluster to use the CSV network instead. When initiating the clone from the coordinator node, we see the expected behavior of most (if not all) traffic going over the iSCSI networks. I too have an escalation open with Microsoft, where they are tracking this as a potential bug. I am waiting to hear back after collecting a CSV trace log.
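
To check which node is the coordinator and whether a node has dropped into redirected I/O during a clone, something like the following can be run while the job is in progress. A sketch, assuming the FailoverClusters module on Server 2012 R2:

```powershell
# Which node currently coordinates (owns) each CSV.
Get-ClusterSharedVolume |
    Format-Table Name, OwnerNode, State -AutoSize

# Per-node I/O mode for each CSV: Direct, FileSystemRedirected or BlockRedirected,
# together with the reason reported for any redirection.
Get-ClusterSharedVolumeState |
    Format-Table Name, Node, StateInfo, FileSystemRedirectedIOReason, BlockRedirectedIOReason -AutoSize
```

A node showing a redirected state would be consistent with the traffic leaving the iSCSI networks as described above.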

November 13th, 2014 3:55pm

CSV trace did not show anything, but Microsoft was able to see more using Procmon and it appears to be the same issue. Waiting on private patch to be released. Will update this thread as I have more.
December 9th, 2014 4:04pm

Current status: the fix will be in either UR5 for SCVMM 2012 R2 at the end of January or UR6, which is scheduled for April 2015.
January 5th, 2015 2:10pm

Any Update?

I am facing the same issue

Has this fix been included in UR5 finally?

March 11th, 2015 3:50pm

We have the same issue.

Why would it be fixed in SCVMM 2012 R2 UR5? The error message shows up in the Failover Cluster Manager events and even on the Hyper-V host.

So I would expect the fix to be needed in Hyper-V or in a failover clustering component?

Regards

Thomas

March 16th, 2015 12:03am

This topic is archived. No further replies will be accepted.
