SCVMM 2012 SP1 Virtual Switch on Cluster Node Disappears

I have been running into an interesting but alarming issue with virtual switches. I recently built a 4-node Server 2012 Hyper-V failover cluster. I have built numerous 2008 R2 and 2012 Hyper-V clusters before, so I am pretty familiar with the process. I rebuilt my VMM server due to software problems, and on this new cluster I configured NIC teaming on 2 of the virtual switches.

I have 4 clusters being managed by this VMM, and this newly built cluster keeps losing all of its VMM virtual switch configuration, meaning if I go to Properties on the problem host and click on Virtual Switches, it's blank. Refreshing the host cluster flags all virtual machines in this cluster as "Unsupported Configuration," making the VMs unmanageable via VMM.

Information (26844)
Virtual switch (Virtual Switch Name) is not highly available because the switch is not available in host (One of the Hyper-V Hosts).

Recommended Action

All virtual servers on that host still have connectivity, and all virtual switch configs look normal when viewed with Hyper-V Manager or Failover Cluster Manager.

The workaround is to evacuate the host using Failover Cluster Manager and reboot the host, then refresh the host, then refresh the VMs (a rough sketch of these steps is below). I cannot really consider this a 'workaround,' as I cannot be rebooting my Hyper-V hosts every week and constantly migrating virtual servers just for this reason. (DPM backups have a fit with CSVs if DPM tries to back up a VM on the same CSV as another VM that is being migrated.)
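For anyone who wants to script that cycle rather than click through it, here is a minimal sketch, assuming a node named HV01 and a VMM server named VMMSERVER (both names are illustrative), with the FailoverClusters module and the VMM console cmdlets available:

# Drain the roles off the node and reboot it (run from another cluster node)
Suspend-ClusterNode -Name "HV01" -Drain -Wait
Restart-Computer -ComputerName "HV01" -Force -Wait
Resume-ClusterNode -Name "HV01"

# Then force VMM to re-read the host and its VMs
Get-SCVMMServer -ComputerName "VMMSERVER" | Out-Null
$node = Get-SCVMHost -ComputerName "HV01"
Read-SCVMHost -VMHost $node
Get-SCVirtualMachine -VMHost $node | ForEach-Object { Read-SCVirtualMachine -VM $_ }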

I have been wrestling with this problem for a few weeks now. The cluster has been wiped and completely rebuilt... still the same problems. Has ANYONE else out there seen this issue? Can ANYONE suggest a way to go about further troubleshooting it?


August 26th, 2014 2:09pm

Were the NIC teams configured and deployed via VMM? 
August 26th, 2014 5:02pm

Yes, configured from VMM when the virtual switches were created.
August 26th, 2014 8:13pm

Do all of the network adapters have the correct Logical Networks assigned when you view the host's Hardware properties in SCVMM?

August 26th, 2014 9:51pm

Yes. 

All virtual switches and virtual network adapters do have correct logical network connections. 

August 27th, 2014 1:54pm

Well, those are the obvious candidates.  It would be interesting to see the output of Get-SCVirtualNetwork.  Something like (substituting your cluster name for MyCluster):
Get-SCVirtualNetwork -VMHostCluster (Get-SCVMHostCluster -Name "MyCluster") | Format-Table Name, VMHost, LogicalNetworks, LogicalSwitchComplianceStatus

August 27th, 2014 6:25pm


Hi,

Just want to confirm the current situation.

Please feel free to let us know if you need further assistance.

Regards.

September 9th, 2014 4:48am

We are having this exact same problem as well.  We have two 4-node clusters running on Windows Server 2012 w/ Hyper-V, managed by Virtual Machine Manager 2012 SP1.  On Sunday night, one Hyper-V host in ClusterA lost its virtual switch configuration (within VMM only) and we had to evacuate the host via Failover Cluster Manager.  We originally tried to put the host in Maintenance Mode through VMM, which went through fine, except none of the VMs actually migrated off the node with the issue.  After evacuating the node manually using FCM and verifying all the roles were drained, we rebooted the node, stopped Maintenance Mode within VMM, and refreshed the node in VMM.  Now VMM is showing all of the correct information.  On Monday night, the same thing happened in ClusterB, with a Hyper-V host losing its virtual switch within VMM.  We took the same steps to troubleshoot as the OP and everything was fine.

In both scenarios, I checked the Virtual Switch in Hyper-V Manager and everything looked to still be correctly configured.  When we run the PowerShell script in this article: http://blogs.technet.com/b/scvmm/archive/2013/09/10/working-through-an-unsupported-cluster-configuration-scenario-in-virtual-machine-manager.aspx - everything checks out fine.  FYI - we have our networking configuration pushed out through SCVMM as well with 1 Logical Switch w/ 3 Virtual Network Adapters (Management, Live Migration Traffic, and Cluster Traffic) being deployed to each Hyper-V host.  Does anyone know why this could be occurring?

Any help would be appreciated.  Thanks.

September 9th, 2014 6:57pm

Sorry, I have been tasked with different things and was not able to reply to the request to run the PowerShell scripts to query the virtual switch to physical switch connections. I have no idea why this thread has been marked resolved or answered when it has not been. I had the issue again yesterday: one of my 4-node clusters losing ALL of its virtual switch configuration in VMM. Failover Cluster Manager said everything was fine, Hyper-V Manager said everything was fine... I evacuated the host using FCM, rebooted the host, and everything returned to normal. 5 minutes later, another host in the cluster exhibited the same problem. Except this time, I accidentally put the host in maintenance mode in VMM before starting to evacuate it... finger slipped. I was expecting it to fail, because with all the virtual switches missing from the source host, live migrations should fail... but boom: green check mark. The server is in maintenance mode. Except no virtual servers moved, and the server is NOT actually in maintenance mode; all the virtual servers are still humming away. The only thing I could do was take maintenance mode off in VMM (which thought it had succeeded in putting the server in maintenance mode), then vacate the host and reboot it.
September 9th, 2014 7:16pm

Thanks for the update cheesewhip.  We were having the same issue every single night up until last Thursday.  On Thursday, we decided to P2V our VMM 2012 SP1 box, as it was running on a physical standalone server.  We virtualized it and moved it into a cluster that it is managing.  Since then, we noticed the 30-minute cluster refreshes no longer seem to be taking place.  Also, we haven't had a single problem with a Hyper-V cluster node losing its virtual switch configuration within VMM since then either.  It is quite strange how this is all ending up, as we seem to discover a new VMM problem every couple of days.
September 15th, 2014 3:03pm

We have seen the same issue occur as well. Perhaps we are just special. In our case we have a three-node cluster... not real fun, as you cannot use VMM for migrations or anything when this occurs, because it determines the network is not highly available.
September 19th, 2014 5:41pm

SteveLith,

You are aware that you can still live migrate within the cluster using failover cluster manager, correct?

September 19th, 2014 5:58pm

Thanks cheesewhip... I'm aware of that, but it sorta defeats the purpose of balancing via SCVMM :(

September 23rd, 2014 8:24pm

Hi Folks,

Have you found any fix? I'm in the same situation: VMM seems to lose all the switch info, however the cluster is fine and the VMs are up and running. I'm on SCVMM 2012 SP1 with Update Rollup 7.

Please let me know if you find a solution. So far I'm only rebooting the hosts to fix these issues, and after 15-20 days or so it happens again. It has been 3 months now that I'm getting the same issue, and I'm not sure how to proceed. My company is not ready for 2012 R2, VHDX, etc. due to other compatibility issues.

Thanks

Mumtaz

Free Windows Admin Tool Kit Click here and download it now
October 1st, 2014 5:23pm


Has anyone found a fix for this? It has been happening in my environment for months. I have both an SCVMM 2012 SP1 and an R2 VMM lab, and it happens on both versions.
October 9th, 2014 2:53am

I have opened a case with MS regarding this issue.

Their support team indicated that they have encountered this issue before, and most of those cases were fixed with the application of KB2792123 and KB2842230. I have installed KB2842230 on all of my 2012 Hyper-V hosts, but KB2792123 would not install, saying it was not applicable. They explained the problem as an unhealthy WMI state on the Hyper-V hosts, not the VMM application itself.

It has been 3 days since the hotfix was applied, and so far our virtual switches in VMM are intact. I would say it's premature to state that this hotfix has resolved the issue; we are still at the wait-and-see stage.

I will post here in a while on whether the hotfix has resolved the issue or not. Even though I cannot guarantee that this hotfix will work in your individual case, if you are as eager as I was to try something when this was broken, I would go ahead and apply the hotfix and see if it resolves the issue.
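If you want to verify which of the two hotfixes actually landed on each node, a quick check like this should work (Get-HotFix queries Win32_QuickFixEngineering; the host name is illustrative):

# Check whether the two KBs discussed above are installed on a host
Get-HotFix -Id KB2842230, KB2792123 -ComputerName "HV01" -ErrorAction SilentlyContinue |
    Format-Table Source, HotFixID, InstalledOn

An empty result for a KB means it is not installed (or, as with KB2792123 here, was not applicable).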

October 9th, 2014 6:03pm

Awesome, I am putting them on now as well; same results as you, one applied and the other did not.  Hopefully we have success.

Thanks a lot, keep me posted if you don't mind.

October 10th, 2014 1:58am

All,

We opened a support case with MS last week regarding this issue as well.  We applied the same WMI hotfixes that cheesewhip mentioned.  You can find information/links to these here:

KB2842230 "Out of memory" error on a computer that has a customized MaxMemoryPerShellMB quota set and has WMF 3.0 installed - http://blogs.technet.com/b/yongrhee/archive/2014/02/16/list-of-winrm-related-hotfixes-for-post-rtm-for-windows-8-rtm-and-windows-server-2012-rtm.aspx

KB2792123 WMI does not work correctly after you run the OOBE in Windows RT, Windows 8 and Windows Server 2012  - http://blogs.technet.com/b/yongrhee/archive/2014/02/16/list-of-windows-management-instrumentation-wmi-related-hotfixes-post-rtm-for-windows-8-rtm-and-windows-server-2012-rtm.aspx

It looks like KB2792123 is only applicable to OOBE installations, which was not the case in our environment.  We applied KB2842230 on Thursday 10/2 (8 days ago) and we have not had any more issues since applying this hotfix.

We applied the hotfix to all of our Hyper-V hosts managed by VMM and our VMM Management Server(s) as well.

October 10th, 2014 2:16pm

It's been 7-8 days since the hotfix was applied. So far, so good.
October 14th, 2014 1:57pm

Thanks. Did you have to reboot the host after applying the hotfix? Because it may take 20-25 days or so to recur. If you have not rebooted the hosts after applying the hotfixes, then it sounds promising.

Thanks

Mumtaz 

October 14th, 2014 2:00pm

Yes, a reboot was necessary. But can you not live migrate and vacate a host, install the hotfix, reboot, and repeat?
October 14th, 2014 3:14pm

In my case I can live migrate the VMs from Failover Cluster Manager as normal. I will apply the hotfix and see if it fixes it. Please also update us on what the MS Support people conclude.

Thanks

Mumtaz 

October 14th, 2014 3:16pm

Bad news. The problem surfaced again this morning. I was starting to think that my issue was resolved... back to Microsoft support.
October 20th, 2014 6:42pm

Thanks for the update cheesewhip. I'll be really curious to know what Microsoft has to say on this. Not sure if A.J. DiLorenzo still has a case open.

Please keep us posted with next update . 

Thanks

Mumtaz 

October 21st, 2014 10:35am

Hello,

The problem resurfaced again for us as well yesterday.  We still have an open case with Microsoft Support and we are going to be updating this case today.  I'll let you all know what we find out.  Stay posted.

October 22nd, 2014 12:25pm

Been away for a week, just checked mine and I have the same issue back again.
October 24th, 2014 5:10am

Hey Folks 

Any positive response from Microsoft yet? One more thing I have noticed: when I refresh the cluster, I get the below error in the event log of the hosts.

The server {73E709EA-5D93-4B2E-BBB0-99B7938DA9E4} did not register with DCOM within the required timeout.

The number of entries is equal to the number of virtual networks on my hosts.

There are some fixes revolving around permissions in the registry. I have tried giving Everyone permissions on this registry key, but it does not help.
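In case it helps others correlate the event, a hedged way to see which component that CLSID belongs to is to read its registration out of the registry (this only looks up the friendly name, it changes nothing):

# Resolve the CLSID from the DCOM timeout event to a friendly name
$clsid = '{73E709EA-5D93-4B2E-BBB0-99B7938DA9E4}'
(Get-ItemProperty "HKLM:\SOFTWARE\Classes\CLSID\$clsid").'(default)'
# The matching AppID entry (if present) shows the hosting process/identity
Get-ItemProperty "HKLM:\SOFTWARE\Classes\AppID\$clsid" -ErrorAction SilentlyContinue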

Thanks

Mumtaz 



October 28th, 2014 12:51pm

Hi gents,

I did some testing and found the following workaround, without any downtime: restart "Windows Management Instrumentation".

This is an issue with VMM talking to Hyper-V, and not with Hyper-V itself. In my opinion, one of the current "imperfections" of VMM/Hyper-V is the overall WMI communication between VMM and the VM host. As with the performance-counters issue after a reboot of a VM host (https://social.technet.microsoft.com/Forums/systemcenter/en-US/46cc0478-e99a-499c-aa48-9e9a84bf2687/sc-vmm-2012-not-showing-performance-counters), restarting the SCVMMAgent service isn't enough; just restart the whole "Windows Management Instrumentation" service. After this, VMM is able to see the Logical Switch again.

Check the dependencies first of course, but SCVMMAgent and VMMS can be restarted without any impact on your running VMs.
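A minimal sketch of that restart, assuming the default service names (Winmgmt for WMI, SCVMMAgent for the VMM agent) and that you have already reviewed the dependencies for your own environment:

# See which running services depend on WMI before bouncing it
$deps = Get-Service -Name Winmgmt -DependentServices | Where-Object Status -eq 'Running'
Restart-Service -Name Winmgmt -Force    # -Force stops the running dependents too
$deps | Start-Service                   # bring the dependents back up
Restart-Service -Name SCVMMAgent        # VMM agent on the host

After this, a host refresh in VMM (or Read-SCVMHost) should pick the switches back up.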

My environment is SCVMM 2012 R2, managing multiple Hyper-V clusters (2012 & 2012 R2).

Of course, this is a workaround, and I'm looking forward to a stable solution for this (and all the other small 'growing pains' of VMM), so please let us know...


October 31st, 2014 9:01am

We have the same issue with VMM 2012 SP1 and 2012 hosts: 2 logical switches disappear from hosts even though the failover cluster validation report is healthy. We have tried the following to resolve it:

a. Added the SH-LK-VMM computer account to the local admins group on the Hyper-V hosts (GPO)
b. Increased the WinRM timeout and set WinRM to run in its own instance of SVCHOST (support.microsoft.com/kb/2875120); see the command sketch after this list
c. Installed Patch 7 for VMM 2012 SP1
d. Updated the Ops Manager management pack for VMM to SP1 RU2 (social.technet.microsoft.com/Forums/systemcenter/en-US/9384bafb-f72d-4447-85a2-81d20a2ab48d/very-strange-scvm-2012-sp1-issue?forum=virtualmachingmgrhyperv)
e. We also had an issue where VMM lost its connection to the hosts due to proxy issues; we ran the following command to resolve it: netsh winhttp reset proxy
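For reference, the WinRM change in step (b) is typically done with something like the following; the timeout value here is illustrative, and KB2875120 has the authoritative steps:

# Give the WinRM service its own svchost instance
sc.exe config WinRM type= own
# Raise the WinRM timeout (30 minutes here, in milliseconds)
winrm set winrm/config '@{MaxTimeoutms="1800000"}'
Restart-Service -Name WinRM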

We thought the above had fixed it, but 2 months later and the issue is back.

Thanks to Ben for the workaround; we may have to schedule this weekly or something.

We would love a fix from MS for this..

November 5th, 2014 10:52am

Make sure you live migrate all the VMs before restarting the WMI service on the hosts. However, I hope MS will have some update soon. Let's wait.
November 5th, 2014 11:10am

Make sure you live migrate all the VMs before restarting the WMI service on the hosts. However, I hope MS will have some update soon. Let's wait.

This is environment specific. I investigated (and tested) all related services for my environment and was OK restarting WMI without moving the VMs. While restarting WMI the VMs carried on running fine, though management and HA failover would have been out of action.

Cheers

Rich

November 6th, 2014 9:32am

Apologies for the late response. I was busy with a different project which consumed a lot of my time.

Tested restarting the WMI service on the host. It does seem to work for us. A fantastic workaround that even MS tech support couldn't come up with...

Microsoft did come back with a response... Their response was that the issue has been fixed in VMM 2012 R2 UR4, which does not help me, because not all of our System Center infrastructure has been updated to R2. VMM is the last or second-to-last component in the order System Center needs to be updated.

I asked if there are any plans to fix this issue for 2012 SP1... no response.

November 7th, 2014 3:16pm

My issue has been with SCVMM 2012 R2 UR3.

I have upgraded to UR4.  Will advise if it works.  At least you will know then that it has been corrected in R2 if you plan on upgrading at some point.

November 12th, 2014 3:26am

We have the exact same issue (logical vSwitch configuration disappearing in SCVMM 2012 SP1).

For those who have already logged an MS support case: have they said anything yet about a possible hotfix for the SCVMM 2012 SP1 version?

The customer having this issue does not have the necessary licenses to upgrade to SCVMM 2012 R2, so a fix for SCVMM 2012 SP1 would be great.

November 12th, 2014 1:47pm

Any response from Microsoft on a fix for VMM 2012 SP1?

Like many others we can't upgrade to R2 at the moment...

November 17th, 2014 11:30am

Just for everyone's info: after 16 days on the upgrade to SCVMM 2012 R2 UR4, I have the same issue as before. This is not fixed.
November 27th, 2014 3:42am

Here is another update, sorta... So I have seen the switches suddenly disappearing in SCVMM 2012 R2, but I had a new twist last weekend.

I have two hosts that had their vNIC (virtual adapter) adapters disappear, which apparently is worse than the switch disappearing, as it orphaned the network interfaces in Hyper-V. The vEthernet interfaces still showed up in Network Adapters on the hosts but were marked as "Network Adapter Disconnected". Apparently orphaned.

This crashed all VMs on the host, as the virtual adapters that disappeared were iSCSI!!! So one of the hosts lost all connectivity to the SAN. Not good. The other host lost just ONE of its two 10Gig adapters, so it stayed up using the one connection, but the other is now orphaned as well, with a disconnected status.

I have an open case with MS, but so far they cannot explain why this occurred.  The only fix is to recreate the vNIC and reconfigure the interfaces in Hyper-V (a sketch of listing and recreating the host vNICs is below), then use this http://cloudtidings.com/2013/11/20/removing-the-ghost-hyper-v-vnic-adapter-when-using-converged-networks-after-in-place-upgrade-to-w2012r2/ to clean up the orphaned interfaces in Hyper-V.
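For the recreate step, something along these lines should do it on the host itself; the switch and adapter names are just examples for a converged setup like the one described:

# List the host (ManagementOS) vNICs Hyper-V still knows about
Get-VMNetworkAdapter -ManagementOS | Format-Table Name, SwitchName, Status

# Recreate a lost iSCSI vNIC on the converged switch (names illustrative)
Add-VMNetworkAdapter -ManagementOS -SwitchName "ConvergedSwitch" -Name "iSCSI-A"

You would still need to re-apply the IP configuration (and any VLAN via Set-VMNetworkAdapterVlan) afterwards.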

This is all NOT good.  MS had best get this VMM stuff sorted, or I'll be shopping for VMware soon. Not acceptable, IMO.

My $.02

December 10th, 2014 5:09pm

Hi All,

I had this problem too while setting up a cluster based on SCVMM R2 and Windows Server 2012 R2.

But after installing the Windows patches and rebooting the server, the problem was resolved.

So I recommend you install all detected updates on your nodes.

Thanks

Vijay Dalimkar

February 1st, 2015 4:35pm

Hi all,

Just an update on my situation. I also have a call open with MS; I am on SCVMM 2012 R2 and Windows 2012. I updated to SCVMM 2012 R2 UR4 with no joy; they then suggested UR5, which has now been released. I installed it on 18 Feb 2015, and today, 2 Mar 2015, I have the same unsupported-cluster issue.

So still no fix.

March 2nd, 2015 12:36am

Can confirm the same experience as Drakie.

Updated to UR5. To be honest, the issue appeared for the first time after the UR5 update, on one node of a 5-node cluster.

March 9th, 2015 12:58pm

New findings:

When a running VM placed on the host has no or minimal network connectivity, live migration through Failover Cluster Manager is still possible. Once I moved all the VMs away from the host, JUST a reboot solved the problem.

After the reboot, all virtual switches were available and configured.

Please, someone troubleshoot this when the situation appears again in your environment.

BTW: Get-SCVirtualNetwork returns nothing during the issue, while Get-NetAdapter returns everything all right.
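That mismatch is the telltale sign: VMM's view is empty while the host's own view is intact. A hedged side-by-side check, with "HV01" standing in for the affected node:

# VMM's view of the host's virtual networks (empty while the issue is active)
Get-SCVirtualNetwork -VMHost (Get-SCVMHost -ComputerName "HV01")

# Hyper-V's own view of the same switches (still intact)
Get-VMSwitch -ComputerName "HV01"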

March 11th, 2015 8:49am

I am not sure how these are new findings. I have said before that vacating the nodes that VMM flags as having problems and rebooting them TEMPORARILY fixes the issue.

Also, restarting the WMI service fixes this issue TEMPORARILY.

Anyways, here is an update.

All Hyper-V cluster nodes have been updated to the latest hotfixes (we do this on a regular basis). I also researched and found about 7-8 Hyper-V cluster related host hotfixes and installed them (this is our ongoing battle against DPM). No improvement.

Microsoft support has not been able to even identify the problem let alone fix it.

Strangely, the switches sometimes disappear after DPM Hyper-V backups. With all the trouble DPM has caused on our Hyper-V failover clusters, I cannot rule it out as a contributor to this problem (STAY TUNED FOR AN UPDATE ON THIS).

I have said before, as have a couple of others who have replied to this thread, that somehow WMI breaks on the hosts, making them unable to relay the virtual switch info/updates to VMM. The virtual switches look fine in Failover Cluster Manager and Hyper-V Manager. Even with the virtual switches wrecked in VMM, live migration works just fine, and for a short while the migrated servers are no longer marked "unsupported cluster configuration" and VMM actions can be taken against them.

I am still running VMM 2012 SP1, as we still need to bring SCOM and DPM up to R2 before VMM can be updated... but I am losing faith in R2, as people are describing this happening in R2 as well...

March 11th, 2015 8:05pm

Strangely, the switches sometimes disappear after DPM Hyper-V backups. With all the trouble DPM has caused on our Hyper-V failover clusters, I cannot rule it out as a contributor to this problem (STAY TUNED FOR AN UPDATE ON THIS).

I am experiencing the same issues, but we are not using DPM on these servers. None of these VMs are currently backed up via VSS, DPM, or checkpoints in our environment.

Environment:

Host: Windows Server 2012 DC

SCVMM 2012 SP1 3.1.6018.0 (I think we're a few hotfixes behind)

March 12th, 2015 12:21pm

I wasn't saying that DPM was THE cause of the problem; it's possibly just coincidental. I have backups going every night, and I do not have this problem of the virtual switches disappearing every morning when I come in.

I would, however, like to give a warning about DPM 2012 R2... well, about upgrading to R2 from SP1.

We were having issues with DPM leaving failed backup VSS images on our CSVs (Cluster Shared Volumes). I am having to resort to using software VSS because for some reason I cannot get our hardware VSS provider to play nice with DPM. One extended weekend, when things were not being monitored closely, the CSV usage crept up to over 95%, which resulted in the cluster losing the CSV... and 77 of our production servers unexpectedly rebooted. Long story short, we needed some sort of "professional" troubleshooting and analysis from Microsoft. We opened a case, and one of the troubleshooting steps they suggested was to upgrade to DPM 2012 R2, as we were running DPM 2012 SP1.

I upgraded the agents on a 4-node cluster. The cluster nodes needed rebooting, so I vacated a node, rebooted it, and was getting ready to do the second one... and the rebooted node stopped responding. I rebooted it again and it was fine for about 5 minutes, so in those 5 minutes I vacated and rebooted the second node. I was left with 2 non-functional Hyper-V nodes: no Computer Management, unable to log onto the servers even at the console. I spent the next few hours rebuilding ALL the cluster nodes from scratch.

I did notice, right before the DPM upgrade, that VMM was saying that 2 of the nodes in the cluster had no virtual switch, and all the VMs were marked with 'unsupported configuration.' The Microsoft DPM team blames the WMI breakage between VMM and the hosts as the cause of this disaster.

So... I would urge everyone to fix their virtual switch problem on the hosts (well, in VMM) before doing any kind of System Center work, DPM or whatever, or really any kind of upgrade: reboot the hosts or restart the WMI service prior to the process.

Oh yes, as a footnote warning: restarting the WMI service DID fail us once or twice so far, so now we are opting to vacate and reboot whatever host is faced with this issue. It's kind of annoying; I run a lot of virtual servers...


March 12th, 2015 7:54pm

We are still getting this issue. I am starting to wonder if it's hardware/drivers causing it; is there a common factor here?

We are using HP Gen8 blades (BL460c) and DL380s; the blades use the Emulex FlexFabric adapters. Is anyone else using the same hardware?



March 14th, 2015 7:44pm


For 2012 SP1, I am having to rebuild the Hyper-V hosts from the OS up on all of my clusters that have this issue: ejecting a node, reloading the OS, configuring it, then reinserting it into the cluster. So far, that seems to have fixed the issue. My reasoning was that even if you update the VMM server to the latest patch level or install some hotfix, the agents on the hosts do not change... and since I could not reinstall the agent (short of reinstalling all the nodes in the cluster) while all the virtual switch configs were unusable, I decided to go ahead and reload all the nodes.

Went to MS Ignite a couple weeks back... told MS about this problem... left my contact info... no one called me back. My case with MS... has seen NO movement. 

May 27th, 2015 10:10am

Crap way for you to have to fix it, good work MS.
May 27th, 2015 7:15pm

This topic is archived. No further replies will be accepted.
