DPM 2012 SP1 Beta - Causing Server 2012 Hyper-V Cluster hang / ISCSI problems (Network Steve Forum)

Hi All,

First of all, I know it's a beta and these are the perils of being an early adopter, but I've got a serious problem.

I've upgraded our production Hyper-V cluster to Server 2012. The setup is a 4-node cluster running CSVs on an iSCSI SAN with MPIO via dual gigabit Ethernet networks. The SAN storage is provided by Open-E DSS7 and replicated to another server in a different building.

After the upgrade, the cluster seemed stable and worked as expected, with live migrations etc. all working. I then turned my attention to backups and discovered that Server 2012 wasn't supported by DPM. Fortunately there is a beta of DPM 2012 SP1 which adds support for Server 2012; unfortunately there is no upgrade path from the beta to the RTM of SP1. Not wanting to upgrade our production DPM server to a beta, I installed a copy of DPM 2012 SP1 beta on a VM to provide a stopgap backup solution for VM-level backups of certain machines that couldn't be backed up in other ways. I realise that running the backup server on the same cluster / SAN as the stuff that's being backed up is an odd thing to do, but this at least serves to provide snapshots, SAN replication provides resilience, and like I say, this is a stopgap.

Then I started noticing problems. The first symptom was that on starting / rebooting VMs, other VMs would sometimes hang for perhaps 30s - 2m, and people would start complaining that SharePoint had gone unresponsive etc. However, they would come back to life in a minute or two. On a couple of occasions we arrived in the morning to find a number of VMs off or paused (backups ran overnight). Both of these problems occurred only when the DPM server was turned on. I thought the issue might be general load on the SAN, having both the backup server and the machines being backed up living on the same CSV / hardware. I moved the DPM server to a different iSCSI box and put on aggressive throttling (200Mbps) to try to reduce load, but the problem has continued.

The event logs on the Hyper-V cluster suggest I/O timeouts to the SAN at the times of the backups. Lots of event IDs 1069, 1205, 1146 and 1230 (various cluster resources failed). The interesting one, I think, is 5120: Cluster Shared Volume 'Volume5' ('VOLUME NAME') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.
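For anyone wanting to pull these events apart after exporting the System log as text, here is a minimal Python sketch. It assumes the message text looks exactly like the 5120 message quoted above; the function name `parse_csv_pause` and the regular expression are my own illustration, not part of any Microsoft tool.

```python
import re

# Pattern based on the event 5120 message text as quoted in this thread.
# Assumption: volume and friendly names contain no single quotes.
CSV_PAUSE_RE = re.compile(
    r"Cluster Shared Volume '(?P<volume>[^']+)' \('(?P<name>[^']+)'\) "
    r"is no longer available on this node because of "
    r"'(?P<status>[A-Z_]+)\((?P<code>[0-9a-fA-F]+)\)'"
)


def parse_csv_pause(message):
    """Return a dict with volume, name, status and code, or None if no match."""
    match = CSV_PAUSE_RE.search(message)
    if match is None:
        return None
    return match.groupdict()


sample = (
    "Cluster Shared Volume 'Volume5' ('VOLUME NAME') is no longer available "
    "on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. "
    "All I/O will temporarily be queued until a path to the volume is reestablished."
)

print(parse_csv_pause(sample))
```

Grouping the parsed results by volume or status code should make it easier to see whether one CSV is affected more than the others.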

Is anyone else successfully using SP1 beta to back up a 2012 Hyper-V cluster?

Is anyone seeing the same problem?

Is it likely that this is a problem with SP1 beta, and will it be fixed at RTM?

Any suggestions for a stopgap solution?

I think I might try setting up a test physical DPM server to check that the issue isn't in some way related to the fact that the DPM server sits on the same cluster it's backing up. I'm also happy to consider that the problem could lie elsewhere, i.e. with the SAN storage (this was upgraded from v6 to v7 at the same time as the 2012 upgrade), but as soon as I tell the vendor that the problem relates to running a beta of DPM they will be pointing fingers at that.

Thanks,

Tim

Moved by Mike Jacquet (Microsoft employee, Moderator) Friday, November 23, 2012 4:09 AM (From: Data Protection Manager - General)

November 22nd, 2012 4:02pm

That would be most appreciated Mike, thanks very much.

Proposed as answer by cciuleanu Wednesday, February 06, 2013 12:29 PM

November 26th, 2012 5:21pm

I'm replying to this thread because

a) It's one of only two threads on the whole internet that mention "STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR"
b) I'm getting the same symptoms.

I can confirm that one of my nodes in my Hyper-V 2012 cluster recently experienced the following event:

Log Name: System
Source: Microsoft-Windows-FailoverClustering
Event ID: 5120
Logged: 02/12/2012 18:01:30

Details: Cluster Shared Volume 'Volume2' ('ClusterStorage Volume 2') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.

I can also confirm that I am using DPM 2012 SP1 Beta to back up this cluster. I have been running this environment for quite some time now, and I can confirm that I've received 15 of these kinds of events (14 of which I was completely oblivious to). What prompted me to do research this time is that I discovered that 3 of my virtual machines were in a paused state and were not available. My other node (two node cluster) has had 5 of these events.

As this is already in the hands of Microsoft I won't log a call but will follow this thread. If there is any further information I can provide please ask.

Oh yes, my primary storage is Fiber Channel, so it's not an iSCSI problem.

Edited by LesterClayton Monday, December 03, 2012 8:11 AM

December 3rd, 2012 11:11am

I've found a workaround for this that I thought I'd share. It's a little convoluted, but if, like me, you were having sleepless nights over your servers not being backed up, it might be worth it.

Server 2012 introduces Hyper-V Replica allowing you to push an offline copy of your VMs to a remote server / site for DR purposes. This works from cluster to standalone. It's pretty simple to set up. You need a server with Hyper-V role installed to host the replicas.

Once your replicas are set up you can use DPM to back up the replicas. The replica VMs are normally turned off anyway, so if backups do cause brief disk glitches it isn't going to interrupt any important services. My guess is this is a cluster-related issue anyhow, so having the replicas on a standalone machine removes that issue.

HV Replica does allow hourly snapshots of the replicas, but it seems that it's not possible to change the frequency of these, so this isn't an efficient way of providing a decent retention time. For some reason, when I tried it, DPM would only see the replicas as available to back up if snapshots were turned off.

I've only set this up today, so can't comment on the long term reliability, but so far so good.

Tim

Edited by TimBoothby Friday, December 07, 2012 12:38 PM Typo

December 7th, 2012 3:36pm

I'm having the same issue, also on 2 clusters:

cluster1 4x HP ML330 G6, 2x 8 Gbit FC Switch, HP P2000 G3

cluster2 (testing) 2x HP ML110 G6, directly connected via 4 gbit FC to HP P2000 G3

Sometimes a LUN disappears or becomes inaccessible (and I have to switch maintenance mode on and off for this LUN), and sometimes VMs on the affected Hyper-V host pause.

Both clusters have problems with backup and I see these events.

Cluster Shared Volume 'Volume6' ('HyperV Data 6') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.

Edited by Martin Poisel Thursday, January 03, 2013 11:10 PM

January 4th, 2013 2:07am

Thanks Rich - you've put your finger on it. I've done the update and turned off ODX. I've done a number of backups successfully - there seemed to be some brief glitches in the availability of some of the VMs, but nothing crashed.

Then all the VMs on one of the nodes started flashing critical messages, shutting down, rebooting, migrating to other hosts etc. Looking into it, the host seemed to be out of memory, even with most of the guests offline. As with Rich, this node was the storage owner. Cancelling the in-progress backups immediately freed up the RAM.

I agree with Rich's diagnosis - severe memory leak.

Edited by TimBoothby Tuesday, January 15, 2013 4:32 PM

January 15th, 2013 7:23pm

We have seen this, too. Within our monitoring software, SQL Sentry, we noted that the memory ballooning is tied to the file cache - which I'm guessing is related to the shadow copy/vds/vss stuff. We have just installed the released hotfix and we're working to see if the stability issues are resolved, which were a much bigger deal for us...and have left me sleep deprived.

I've pasted a screenshot below from SQL Sentry Performance Advisor. It shows the memory peaks with each VM being backed up on that host.

One other thing - I had the host exhaust its memory when the pagefile was set to 4GB, but have since changed it to allow WS2012 to do whatever (system managed). Not sure if that has helped, but can't say it has hurt either. The host has 96GB and in VMM I had reserved 6GB...but when DPM kicked off, it just mowed right over all of it. Sigh.

Edited by MarkLarma Tuesday, January 15, 2013 11:20 PM

January 16th, 2013 12:07am

After installing KB2799728, I got this console error (on every server where I applied the KB). I can manage my clusters only remotely, from a server without KB2799728.

I can also confirm the memory leak when backup runs.

Edited by Martin Poisel Thursday, January 17, 2013 11:31 AM

January 17th, 2013 2:15pm

Is it possible to run 2012 VMs in a 2008 R2 Hyper-V cluster? I'm starting to think about reinstalling my Hyper-V servers and configuring a new 2008 R2 cluster.

If your VMs are based on VHD (not VHDX), it's easy to migrate back to 2008 R2. Even if the VM configuration turns out to be unreadable, just create a new VM and attach the necessary VHD files to it.

Edited by AndricoRus Tuesday, January 22, 2013 12:11 PM

January 22nd, 2013 3:10pm

I too am having the memory leak issue, to the point that it crashes the host and all the VMs on that host go into save-critical state and jump ship. Very frustrating. I have applied KB2799728 and am now waiting on whatever the latest fix to this fiasco will be.

Edited by Seth H. _ Thursday, January 24, 2013 9:01 PM

January 25th, 2013 12:00am

6 node cluster running 88 VM's with iSCSI Storage on HP Lefthand with production workloads!

Everything fine until we migrated heavier workloads to the cluster.

Then.... we experienced the paused VM issue back in December.
Then.... we applied the patch a few weeks ago and had the CSV IO Timeout issues every other day.
Then.... we Disabled ODX yesterday and now have the memory leak issue.

Server 2012 Hyper-V 3.0 has become a nightmare to administer with these problems.

Come on Microsoft we need this memory leak fixed!!!

Edited by TrevorBaker1979 Wednesday, January 30, 2013 5:14 PM

January 30th, 2013 8:13pm

I should have clarified; I was specifically asking about the question where the Hyper-V instances are running within the CSV, but the destination of the backup is not. Regardless, it sounds like this is a very serious problem and I can only imagine how frustrating it might be.

Hi RJMPhD, sorry for misunderstanding what you were asking. In our environment our DPM server is one of the few standalone physical servers, with a directly attached SAS array where it stores the backups. So in our case yes, the Hyper-V instances are running within the CSV volumes but are being backed up to a destination that is outside of the CSV.

Edited by HorusCG Thursday, February 07, 2013 4:31 PM

February 7th, 2013 7:31pm

I have already exported and converted all VMs, re-installed the Hyper-V cluster with 2008 R2 and re-configured everything. I have just imported the last VMs and am now installing DPM again to run backups. Hopefully it will be a lot better than on 2012.

I know it is easy to complain, but I think Windows Server 2012 with Hyper-V will be great once they have fixed all the problems. I also want to say that I will never again be first to try out new MS products. I will wait about 6-12 months before trying.

Br
Patrik

Edited by boje_ Friday, February 08, 2013 11:13 AM

February 8th, 2013 2:12pm

http://support.microsoft.com/kb/2813630

Proposed as answer by Aaron Marks Saturday, February 16, 2013 8:41 PM

February 16th, 2013 2:24pm

Also in the same boat with these errors, DPM 2012 SP1 UR2 + Windows 2012, 10 node cluster using CSV.

We are currently migrating from a 2008 R2 cluster to 2012, so this is quite scary. We've already had to fix 2 VMs which couldn't start.

Edited by -DeNMaN- Friday, May 03, 2013 1:02 AM

April 23rd, 2013 6:19am

Hi Paul,

have a look at this article. The hotfix was released today and it seems to solve the problems. I've already installed the patch via CAU and have not received any errors since.

http://support.microsoft.com/kb/2838669

Let's hope MS has finally got it now.

I'll update you if I receive any errors.

Edited by Hummeldum Wednesday, May 15, 2013 10:12 AM Forget to paste Link ;)

May 15th, 2013 1:12pm

I was encountering 2 of the issues described in KB2838669.

Before this KB, I was getting Failover Clustering timeout errors once a week when my DPM started its snapshots.
Yesterday I installed this KB on 1 of my nodes, and things went wrong: I encountered Failover Clustering errors 8 times in only 5 hours, starting from the beginning of my DPM snapshots. Worst of all, all the virtual machines hosted on this node crashed (which was not the case when I had failover clustering errors before).

Weirdest thing ? All my DPM snapshots were successful anyway !!

So result of the KB ? I shouldn't have installed it :/

I'm running my nodes on Win Srv 2012, and my DPM server is running DPM 2012 SP1. The only hotfix I had installed before on my Hyper-V hosts is KB2813630.

Edited by tena6ous Thursday, May 16, 2013 7:28 AM

May 16th, 2013 10:25am

I'm experiencing a memory leak that I think is related to this thread, but I would like some feedback on what others are experiencing. I have a 2 node Hyper-V 2012 cluster (full install) and I'm using DPM 2012 SP1 to back it up. On the node that owns the CSV, there is an increase in memory that seems to coincide with my backups for time and amount of data transferred. For large backups like Exchange, this fills up the server's memory and will crash the cluster if left alone. The memory does not become available after the backups complete. If I change the owner node on the CSV, the memory clears up immediately and I can even move the CSV back without issue.

There may also be a small memory leak that is not related to the backup times, but dissipates when I change the CSV owner.

I've installed all available updates on the two host servers (including those released yesterday) as well as hotfixes KB2813630-v2 and KB2838669. I've also disabled ODX and serialized the backups.

I'm not seeing related errors in Failover Cluster Manager, but I'm watching the servers like a hawk and changing the CSV owner node as needed to clear up the memory leak.

My storage device is an EqualLogic PS6100X with the latest HIT Kit (4.5) installed.

Is this what others are experiencing? Any thoughts?

This thread has been very helpful and I've been following it very closely for the past week or so! Thank you all for your input! ^_^

We have almost the exact same setup and problem. Windows 2012 cluster, 7 nodes running primarily SQL VMs. Dell Blade servers and EqualLogic PS6110XV with HIT Kit 4.5. We have installed all the hotfixes including KB2838669, and disabled ODX as well. A backup job triggers the memory leak, but not all the time. Using RAMMap we can see the VMs show up and never release the memory. We will sometimes max out 256GB of memory in hours. When I move the CSV to another node the problem follows the CSV. My only fix is to reboot the node having the problem, then move the CSV back. We have put in 80 hours with MS so far on this.

We are using Veeam instead of DPM.

This morning in Veeam I disabled the Dell EqualLogic VSS HW provider and am now only using MS CSV Shadow Copy. I am going to see if that helps.

Edited by awinstead Friday, May 17, 2013 2:35 PM

May 17th, 2013 5:34pm

Hello all,

I have also lots of issues with CSVs and also with DPM.

At first I thought the CSV hung because of the DPM agent, so I removed the agent and applied all existing patches. Now the CSV is stable (the FC LUN zoning was wrong: only half of the hosts were able to contact the LUN directly and the others were redirecting over the cluster network, but nothing pointed that out, not even Test-Cluster, which showed a fully green test for the cluster disks). I re-installed the agent and the issue came back with VM backups, but with no more paused CSVs.

I opened a call with Microsoft support, who asked me to apply these patches using the LDR branch (QFE):

http://support.microsoft.com/kb/2838669/EN-US

http://support.microsoft.com/kb/2795944/EN-US

http://support.microsoft.com/kb/2837407/EN-US (?).

For installing the LDR: http://social.technet.microsoft.com/wiki/contents/articles/3323.how-to-forcibly-install-the-ldr-branch-from-a-particular-hotfix-package.aspx

Didn't have time to apply the LDR branch yet (should have been done with CAU hotfix plugin, but actually, the file version is from GDR and not LDR).

Edit: BTW, this is not the subject, but do you also get VMM service crashed when configuring VMM continuous protection in DPM ?
(Set-DPMGlobalProperty -KnownVMMServers vmmserver01.sogeti.local + DPM-VMM Helper Service configuration)

Guillaume

Edited by Guigui38 Friday, May 24, 2013 10:47 AM

May 24th, 2013 1:39pm

It's a bit early to say, but my testing seems to show that my memory problems may be tied to dynamic volumes. I had major memory leaks every night when my system state backups kicked off (agent installed within VM guest) that corresponded to the amount of data being backed up. I created fixed size volumes on my EqualLogic SAN and moved the biggest offenders over; so far I've not encountered this memory leak again.

I do see other, slower memory leaks throughout the day on different VMs. When I move my two dynamic volumes from one host to the other, the memory frees up immediately. I do not seem to have this issue with VMs on the fixed volumes.

After reading Stefan's post, I decided to read up a bit on TRIM. That's when I got the idea that the problem could be a sort of conflict between TRIM and dynamic volumes. I can't say for sure, but things are starting to look up for me. If I can stabilize everything using fixed volumes, I might even be bold enough to try re-enabling ODX and non-serialized backups.

Here's hoping my luck's changed!

Proposed as answer by JeanLouis Wednesday, May 29, 2013 5:20 PM
Unproposed as answer by JeanLouis Wednesday, May 29, 2013 5:21 PM

May 29th, 2013 4:43am

Dell MD3620i using iSCSI here

Thinking of reinstalling the OS on the host computers without Dell MPIO drivers myself.

Edited by brock_paul Thursday, May 30, 2013 5:17 PM

May 30th, 2013 8:07pm

I have got an Equallogic and two clusters running stable for about 2 weeks.

I still get some 5120 and 5217 events. After the latest hotfixes the number of errors has dropped to maybe 5-10 per week. I haven't had the time to dig further into it, but I suspect my virtual file server cluster is causing the problems: the VMs were migrated from a 2008 R2 Hyper-V cluster to the 2012 Hyper-V cluster and have been causing trouble since day one. Maybe I'll just dump and recreate the virtual cluster when I have the time.

My setup is running without the hardware provider at the moment. I am only using the DSM, PowerShell Module and SMP from the EQL HIT. DPM is set up to do 5 parallel backups.

July 11th, 2013 8:19am

Hello all,

The latest hotfix in the "backup VM on CSV" saga is http://support.microsoft.com/kb/2870270/en-us - Update that improves cloud service provider resiliency in Windows Server 2012. It supersedes KB2848344 and any previously released hotfix (KB2838669, KB2813630, KB2799728, etc.) on this issue.

I suggest also http://support.microsoft.com/kb/2869923/en-us - Physical Disk resource move during the backup of a Cluster Shared Volume (CSV) may cause resource outage, strictly related to the same topic.

For anyone who is experiencing 5120 and 5217, have a look at this post: http://social.technet.microsoft.com/Forums/windowsserver/en-US/223eb499-53cd-4590-980a-4078d0b52bd3/statusclustercsvautopauseerror-not-fixed-with-kb2848344 - as you can see, the MSFT guy says:

Seeing an Event 5120 with an error code of STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR may be expected and can be safely ignored in most situations. It basically means that clustering knew of a software snapshot, but the software snapshot was deleted. So now clustering is resynchronizing its state on the view of the snapshots.

So in general, you should only be worried if you see lots of 5120s with an error code of STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR. That is a sign that clustering is in need of constantly resyncing its state for the snapshots.
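To put a rough number on "lots", something like the following Python sketch could count matching 5120 events per day and flag the busy days. The tuple layout, the function name `noisy_days` and the default threshold of 3 are all assumptions of mine for illustration; the Microsoft post quoted above does not give a figure.

```python
from collections import Counter
from datetime import datetime

PAUSE_ERROR = "STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR"


def noisy_days(events, threshold=3):
    """Return {date: count} for days with more than `threshold` matching events.

    `events` is an iterable of (timestamp, event_id, message) tuples, however
    you choose to export them. The threshold is an arbitrary, hypothetical value.
    """
    per_day = Counter(
        timestamp.date()
        for timestamp, event_id, message in events
        if event_id == 5120 and PAUSE_ERROR in message
    )
    return {day: count for day, count in per_day.items() if count > threshold}


# Made-up sample data: four matching events on one day, one on the next.
message = "... because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. ..."
events = [(datetime(2013, 7, 1, hour, 0), 5120, message) for hour in (1, 2, 3, 4)]
events.append((datetime(2013, 7, 2, 1, 0), 5120, message))
events.append((datetime(2013, 7, 2, 2, 0), 1146, "unrelated cluster event"))

print(noisy_days(events))
```

A day that shows up here repeatedly is probably worth correlating with the backup schedule before opening a support case.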

Proposed as answer by arndawg Thursday, August 08, 2013 12:25 PM

July 16th, 2013 11:31am

Solved it for me at least. I do sometimes get the 5120 and 5217 events, but apparently that is to be expected.

August 8th, 2013 3:25pm

As much as I would like to say that the hardware provider has solved my problems, it has not. I have been keeping up to date with the hotfixes and patches, have a hardware provider in place, have separated the iSCSI traffic onto its own subnet, and lots more. So far my cluster is stable until the DPM backups fire off. When that happens, if I have VMs spread across more than one host, the VMs slow way down and the cluster eventually becomes so unstable that I have been forced to unplug the hosts. I am at a loss as to what the problem really is, but it seems related to backend storage connectivity. I have engaged Microsoft with this problem in the past and they keep saying that everything looks great. It might be time to engage them again or simply rebuild my hosts.

August 19th, 2013 7:19pm

We continue to have problems as well. Currently Dell and Microsoft have a case open. They have multiple Dell customers with the same problem.

Dell EqualLogic VSS HW Provider issue:

- Microsoft requested we perform additional tests with MPIO disabled as they suspected it was playing a role

- Testing was completed and results provided

- The Microsoft Engineer plans to start analyzing the results today

- Next steps will be based off of the analysis but we suspect they will supply us with additional instructions for enabling deeper logging to help debug this issue

August 24th, 2013 12:00am

Hey all,

is there any update on this issue?

I am experiencing the same issue.

Two Cisco UCS chassis with Dell EqualLogic storage, 10 GbE iSCSI attached. Servers running W2K12 with current updates. VMM has UR3 installed. Suggested hotfixes related to Hyper-V and Failover Clustering installed. Dell HIT Kit 4.6 installed. DPM also running with current updates and hotfixes.

But I still see some VMs going into Saved State or crashing because the CSV goes offline. There are also VSS 8194 errors on the host, but according to my internet research they can safely be ignored (?).

DPM reports Backup was successful.

VMs are working once they are rebooted.

Since some of the affected machines are SQL Server 2012 and Sharepoint servers (2013) this issue is affecting production because I never know when exactly this issue occurs.

How to resolve this issue?

Thanks to all in advance!

September 11th, 2013 4:28pm

Awinstead,

Did you get any feedback from our friends at Microsoft? We are certainly still feeling this pain.

September 11th, 2013 7:40pm

We continue to have problems as well. Currently Dell and Microsoft have a case open. They have multiple Dell customers with the same problem.

Dell EqualLogic VSS HW Provider issue:

- Microsoft requested we perform additional tests with MPIO disabled as they suspected it was playing a role

- Testing was completed and results provided

- The Microsoft Engineer plans to start analyzing the results today

- Next steps will be based off of the analysis but we suspect they will supply us with additional instructions for enabling deeper logging to help debug this issue

Did you get any updates from MS/Dell ?

We are still having the same issues.

Thanks!

October 17th, 2013 5:00am

Anyone try Server 2012 R2 and DPM 2012 R2 yet to see if they may fix this problem? We have experienced all the issues in this thread with EqualLogic SAN and 2012 cluster. High level Dell and MS support tickets have not yielded any solution so far.

October 22nd, 2013 8:28pm

Hello,

we have been using a Windows Server 2012 Hyper-V failover cluster on a Dell PowerEdge R810 with Dell EQL PS4110X and PS4110X for two months. We had a similar connection timeout with the CSV group when backing up all VMs with BackupExec 2012. The five VMs went offline.

October 25th, 2013 2:39pm

We just had a cluster explosion over the weekend that looks suspiciously similar. Our cluster hosts are still 2012 R1, but I upgraded our DPM environment to 2012 R2 prior to this happening.

Since installing all 17 hotfixes, or however many there are by now, things have been somewhat better, but still having issues.

HP EVA4400, software VSS providers, 4 node cluster fairly lightly loaded.

November 5th, 2013 10:53pm

I have 2 clusters, a 10-node 2012 SP1 cluster on an EVA 4400 and a 6-node 2012 R2 cluster on a 3PAR 7200, both being backed up by DPM 2012 R2, and I get the same errors and reboots on both clusters. So it's not fixed in 2012 R2.

I'll try the 3PAR hardware VSS providers when I get my hands on them.

November 11th, 2013 1:59am

We have KB2870270 installed on our 2012 cluster hosts, which we are trying to protect with DPM 2012 R2 using software VSS. We have left the ODX and TRIM settings at their defaults, as I believe KB2870270 is supposed to fix the issues related to them.

I added a few VMs to the backup and the initial replica creation went fine, but strangely, after a few hours, even when there are no backup jobs running, we start getting 5120 ('STATUS_IO_TIMEOUT(c00000b5)'), 1230, 1146 and 5142 ('ERROR_TIMEOUT(1460)'), at which point the node loses the CSV.

November 26th, 2013 12:52am

I have a Dell PS4000XV and 2x R710s in a failover cluster, and I use DPM 2012 R2 to back up the VMs serially on the PS4000 from the cluster nodes. I recently migrated the cluster from 2008 R2 to 2012 R2 by copying the cluster roles and then detaching/attaching the iSCSI volumes. Once the migration to 2012 R2 was complete I tried to back up a migrated VM with DPM and found that at the later stage of the VM backup the system hung while it read/wrote what seemed to be the entire VM's worth of data, and then things went back to normal. I tested this about 5-6 more times using both nodes and the same issue occurred consistently. Only once did a VM time out and crash due to the hang, but during every backup the clients timed out connecting to the services on the VMs whose storage was on the node being backed up.

As a test I have created a new CSV on the PS4000, moved some of the VM storage to it and the backups are completing without issue. I am going to recreate the remaining CSVs and move the VMs.

November 26th, 2013 6:23pm

Hi, I've been reading this thread with interest. I have a 10-node 2012 R2 Hyper-V 3.0 cluster (fully patched)

With 2012 R2 Storage spaces backend, SMB3.0

With Fibre attached san Storage.

DPM2012 R2 , fully patched.

and I still occasionally get these errors on the Storage Spaces cluster.

Paused State because of '(c0130021)' all I/O will be temporarily be queued.

This seems crazy, as I am running all the latest agents and versions of code. I will look to raise a case with Premier Support and report back any findings.

regards

Mark

December 10th, 2013 4:53pm

This problem seems to have flared up again for me over the last few weeks. I can't put my finger on what has changed; it did seem to resolve itself for a while after the May 2013 hotfix.

Anyway - there is another hotfix which might be relevant to some people although it seems to tackle a specific issue where the guest VM crashes at backup time if it has many snapshots.

http://support.microsoft.com/kb/2908415/en-us

Edited by TimBoothby Thursday, December 12, 2013 1:32 PM

December 12th, 2013 4:11pm

There is yet another hotfix for CSV backup issues which got sneaked out over Xmas - http://support.microsoft.com/kb/2878635

This article introduces an update that improves the resiliency of the cloud service provider in Windows Server 2012. This update is dated December 2013.

This update replaces update 2870270, which is used to improve resiliency. Also, this update includes update 2869923 and update 2908415. Additionally, the update resolves several issues that occur in the following scenario:

You have a Hyper-V failover cluster.
The Hyper-V resources are saved in .vhd files on Cluster Shared Volumes File System (CSVFS) volumes.
You use a backup solution. For example, you use System Center Data Protection Manager (DPM) in the Hyper-V environment.
You try to perform a backup, and a snapshot is taken of the CSVFS volume.
The current active node encounters an error, and the cluster fails over to another node.
DPM may start a consistency check on the volume unexpectedly.

I still seem to be having problems so will give this a try.
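Since it's getting hard to keep track of which hotfix replaces which, here is a small Python sketch of the chain as posters in this thread have described it. The mapping is only what has been reported here (treating "includes" as "replaced by"), not an official Microsoft supersedence list, and the names `REPLACED_BY`, `latest` and `current_equivalents` are my own.

```python
# Replacement chain as reported in this thread (an assumption, not an official list):
# KB2870270 is said to supersede KB2848344, KB2838669 and KB2813630;
# KB2878635 is said to replace KB2870270 and include KB2869923 and KB2908415.
REPLACED_BY = {
    "KB2848344": "KB2870270",
    "KB2838669": "KB2870270",
    "KB2813630": "KB2870270",
    "KB2870270": "KB2878635",
    "KB2869923": "KB2878635",
    "KB2908415": "KB2878635",
}


def latest(kb):
    """Follow the chain until a KB with no known replacement is reached."""
    seen = set()
    while kb in REPLACED_BY and kb not in seen:
        seen.add(kb)  # guard against accidental loops in the table
        kb = REPLACED_BY[kb]
    return kb


def current_equivalents(installed):
    """Collapse a list of installed KBs to the newest known replacements."""
    return sorted({latest(kb) for kb in installed})


print(current_equivalents(["KB2813630", "KB2838669", "KB2869923"]))
```

A KB that isn't in the table is simply returned unchanged, so the table can be extended as newer rollups come out.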

January 3rd, 2014 6:44pm

Be aware that backing up a replica is NOT supported:

http://blogs.technet.com/b/dpm/archive/2012/08/27/important-note-on-dpm-2012-and-the-windows-server-2012-hyper-v-replica-role.aspx

The important thing to note about this is that while the DPM agent can be installed on both servers with no issues and you can backup the Primary DPM server as usual with no problems, on the Hyper-V Replica server you can enumerate the virtual machines and may even be able to back them up successfully, however backing up or restoring the Hyper-V replica is not supported.

The patches mentioned in these articles typically solve the issue:

http://support.microsoft.com/kb/2784261/EN-US

http://support.microsoft.com/kb/2878635

http://support.microsoft.com/kb/2813630

Hope this helps

January 27th, 2014 3:46am

This topic is archived. No further replies will be accepted.