SCVMM 2012 SP1 Virtual Switch on Cluster Node Disappears

I have been running into an interesting but alarming issue with virtual switches. I recently built a 4-node Server 2012 Hyper-V failover cluster. I have built numerous 2008 R2 and 2012 Hyper-V clusters before, so I am pretty familiar with the process. I had to rebuild my VMM server due to software problems, and on this new cluster I have configured NIC teaming on 2 of the virtual switches.

I have 4 clusters being managed by this VMM, and this newly built cluster keeps losing all of its VMM virtual switch configuration: if I go to Properties on the problem host and click on Virtual Switches, it's blank. Refreshing the host cluster flags all virtual machines in this cluster as "Unsupported Configuration," making the VMs unmanageable via VMM.

Information (26844)
Virtual switch (Virtual Switch Name) is not highly available because the switch is not available in host (One of the Hyper-V Hosts).

Recommended Action

All virtual servers on that host still have connectivity, and all virtual switch configs look normal in Hyper-V Manager and Failover Cluster Manager.

The workaround is to evacuate the host using Failover Cluster Manager, reboot the host, refresh the host, and then refresh the VMs. I cannot consider this a real 'workaround', as I cannot be rebooting my Hyper-V hosts every week and constantly migrating virtual servers just for this reason. (DPM backups have a FIT with CSVs if DPM tries to back up a VM on the same CSV as another VM that's being migrated.)

I have been wrestling with this problem for a few weeks now. The cluster has been slicked and completely rebuilt, and the problem is still there. Has ANYONE else out there seen this issue? Can ANYONE suggest a way to troubleshoot it further?


  • Edited by cheesewhip Tuesday, August 26, 2014 2:16 PM correction
August 26th, 2014 2:09pm

Well those are the obvious candidates.  It would be interesting to see the output of Get-SCVirtualNetwork.  Something like:
Get-SCVirtualNetwork -VMHostCluster %ClusterName% | ft Name,VMHost,LogicalNetworks,LogicalSwitchComplianceStatus

August 27th, 2014 6:25pm

Hi Folks 

Have you found any fix? I'm in the same situation. VMM seems to lose all the switch info, but the cluster is fine and the VMs are up and running. I'm on SCVMM 2012 SP1 with Rollup 7.

Please let me know if you find a solution. So far I'm only rebooting the hosts to fix this, and after 15 to 20 days it happens again. I have been getting the same issue for 3 months now and I'm not sure how to proceed. My company is not ready for 2012 R2 yet.

Thanks

Mumtaz


  • Edited by mumtazkhan Monday, December 15, 2014 1:36 PM removed private info
October 1st, 2014 5:25pm

Has anyone found a fix for this? It has been happening on my systems for months. I have both an SCVMM 2012 SP1 and an R2 VMM lab, and it happens on both versions.
  • Edited by Drakie Thursday, October 09, 2014 2:53 AM
October 9th, 2014 2:53am

Hey Folks 

Any positive response from Microsoft yet? One more thing I have noticed: when I refresh the cluster, I get the error below in the event log of the hosts.

The server {73E709EA-5D93-4B2E-BBB0-99B7938DA9E4} did not register with DCOM within the required timeout. 

The number of entries equals the number of virtual networks on my hosts.

There are some fixes revolving around permissions in the registry. I have tried giving Everyone permissions on this registry key, but it does not help.
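To count those DCOM entries per host without clicking through Event Viewer, something like the following might help. It is only a sketch: the function name Get-DcomTimeoutEvent is made up for illustration, and it assumes the "did not register with DCOM within the required timeout" message is logged in the System log as event ID 10010, which should be checked against the actual entries on your hosts.

```powershell
# Sketch only: Get-DcomTimeoutEvent is a hypothetical helper name.
# Assumption: the DCOM timeout error is event ID 10010 in the System log.
function Get-DcomTimeoutEvent {
    param(
        [Parameter(Mandatory = $true)]
        [string]$ComputerName,

        # How far back to look; defaults to the last 24 hours
        [datetime]$Since = (Get-Date).AddDays(-1)
    )

    $filter = @{ LogName = 'System'; Id = 10010; StartTime = $Since }

    # Get-WinEvent reports an error when nothing matches, so that case is silenced
    Get-WinEvent -ComputerName $ComputerName -FilterHashtable $filter -ErrorAction SilentlyContinue |
        Select-Object TimeCreated, Id, Message
}
```

For example, "Get-DcomTimeoutEvent -ComputerName Host1 | Group-Object Message | Select-Object Count, Name" would show how many entries there are per server GUID, which can then be compared with the number of virtual networks on that host.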

Thanks

Mumtaz 



  • Edited by mumtazkhan Wednesday, October 29, 2014 4:56 PM Additions
October 28th, 2014 12:51pm

Hi gents,

I did some testing and found the following workaround that requires no downtime: restart "Windows Management Instrumentation".

This is an issue with VMM talking to Hyper-V, not with Hyper-V itself. In my opinion, one of the current "imperfections" of VMM/Hyper-V is the overall WMI communication between VMM and the VM host. As with the performance counter issue after a reboot of a VM host (https://social.technet.microsoft.com/Forums/systemcenter/en-US/46cc0478-e99a-499c-aa48-9e9a84bf2687/sc-vmm-2012-not-showing-performance-counters), restarting the SCVMMAgent service isn't enough. Just restart the whole "Windows Management Instrumentation" service. After this, VMM is able to see the Logical Switch again.

Check the dependencies first, of course, but SCVMMAgent and VMMS can be restarted without any impact on your running VMs.

My environment is SCVMM2012R2, managing multiple Hyper-V clusters (2012 & 2012R2).

Of course, this is a workaround, and I'm looking forward to a stable solution for this (and all the other small 'growing pains' of VMM), so please let us know...
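For anyone who wants to script the "check the dependencies, then restart WMI" step, here is a rough sketch. The function name Restart-WmiOnHost and the host name in the usage line are made up for illustration. It assumes Windows PowerShell 3.0 to 5.1, where Get-Service still accepts -ComputerName, and an account with rights to control services on the host. It notes which dependent services are running before the restart and starts any of them that did not come back.

```powershell
# Sketch only: Restart-WmiOnHost is a hypothetical helper, not part of VMM or Hyper-V.
function Restart-WmiOnHost {
    [CmdletBinding(SupportsShouldProcess = $true)]
    param(
        [Parameter(Mandatory = $true)]
        [string]$ComputerName
    )

    $wmi = Get-Service -Name Winmgmt -ComputerName $ComputerName

    # Remember which dependent services are running now, so they can be started again
    $running = @($wmi.DependentServices | Where-Object { $_.Status -eq 'Running' })
    Write-Verbose ("Running dependents: " + (($running | ForEach-Object { $_.Name }) -join ', '))

    if ($PSCmdlet.ShouldProcess($ComputerName, 'Restart Winmgmt and its dependent services')) {
        # A single restart call, rather than a separate stop followed by a start
        Restart-Service -InputObject $wmi -Force

        foreach ($svc in $running) {
            $svc.Refresh()
            if ($svc.Status -ne 'Running') {
                Start-Service -InputObject $svc -ErrorAction Continue
            }
        }
    }

    # Report the final state of the dependents that were running before
    $running | ForEach-Object { $_.Refresh(); $_ } | Select-Object Name, Status
}
```

Running "Restart-WmiOnHost -ComputerName Host1 -WhatIf -Verbose" first lists the running dependents without touching anything. Drop -WhatIf to actually restart, then refresh the host in VMM.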


October 31st, 2014 9:01am

Just for everyone's info: 16 days after the upgrade to SCVMM 2012 R2 UR4, I have the same issue as before. This is not fixed.
  • Edited by Drakie Thursday, November 27, 2014 3:43 AM
November 27th, 2014 3:42am

I wasn't saying that DPM was THE cause of the problem. It was possibly just coincidental. I have backups going every night, and I do not find the virtual switches gone every morning when I come in.

I would, however, like to give a warning about DPM 2012 R2, or rather about upgrading to R2 from SP1.

We were having issues with DPM leaving failed backup VSS images on our CSVs (cluster shared volumes). I am having to resort to using software VSS because for some reason I cannot get our hardware VSS provider to play nice with DPM. One extended weekend, when things were not being monitored closely, the CSV space usage crept up to over 95%, which resulted in the cluster losing the CSV, and 77 of our production servers unexpectedly rebooted. Long story short, we needed some sort of "professional" troubleshooting and analysis from Microsoft. We opened a case, and one of the troubleshooting steps they suggested was to upgrade to DPM 2012 R2, as we were running DPM 2012 SP1.

I upgraded the agents on a 4-node cluster. The cluster nodes needed rebooting. So I vacated a node, rebooted it, and was getting ready to do the second one when the rebooted node stopped responding. I rebooted it again, and it was fine for about 5 minutes. In those 5 minutes I vacated and rebooted the second node. I was left with 2 non-functional Hyper-V nodes. No Computer Management. Unable to log on to the servers, even on the console. I spent the next few hours rebuilding ALL the cluster nodes from scratch.

I did notice right before the DPM upgrade that VMM was saying that 2 of the nodes in the cluster had no virtual switch, and all the VMs were marked with 'Unsupported Configuration.' The Microsoft DPM team blames the WMI breakage between VMM and the hosts as the cause of this disaster.

So I would urge everyone to fix the virtual switch problem on the hosts (well, in VMM), by rebooting the hosts or restarting the WMI service, before doing any kind of System Center upgrade, DPM or otherwise, or really any kind of upgrade at all.

Oh yes, as a footnote warning: restarting the WMI service DID fail us once or twice so far, so now we are opting to vacate and reboot whatever host is faced with this issue. It's kind of annoying. I run a lot of virtual servers...


  • Edited by cheesewhip Thursday, March 12, 2015 7:55 PM left out word
March 12th, 2015 7:54pm

We are still getting this issue. I am starting to wonder if hardware or drivers are causing it. Is there a common factor here?

We are using HP Gen8 blades (BL460c) and DL380s. The blades use Emulex FlexFabric adapters. Is anyone else using the same hardware?



  • Edited by RichS82 Saturday, March 14, 2015 11:40 PM
March 14th, 2015 11:40pm

RichS82,

I am using a Dell M1000e with Dell M620 blades and Dell EqualLogic storage, so unless your hardware is using the same drivers as mine, it is not hardware.

Dell at first thought it was the network connection to the storage and DPM, but I am no longer doing any backups until the system is fixed. (It is a greenfield system that we have not started to use yet.)

I have two air-gapped systems and both have the issue. We had the original network config on one system and Dell updated the config on the other, and I still get the same issue on both, so it is not the network either.

Microsoft were here on Friday and could not find anything wrong with WMI in the logs. They also have another client in Canberra with the same issue now. In fact, I know that one of the MS installers is also having the same issue on another client's system, so hopefully we get a solution soon.

March 15th, 2015 11:10pm

Hey everyone. I just stumbled across this post today and noticed it is still being updated as recently as this week. At my place of employment we have been experiencing this problem as well. I recently took over from the previous sysadmin and was not left much info about the past, so I don't know how long this has been going on. I can tell you I have seen it for several weeks, and I am always able to fix it by restarting the cluster nodes. As many others know, this is extremely time consuming and aggravating due to the number of virtual machines we have.

Compared to others we are pretty small, with a single cluster of two hosts. VMM 2012 SP1 is running on a physical 2012 R2 box. DPM SP1 is also backing up our VMs and cluster configuration. Today I tried the "restart the WMI service" workaround mentioned in the comments, but it failed. I believe it failed because the Hyper-V Virtual Machine Management service was still in use. On the topic of services, I did notice something that I am having trouble finding a clear answer about, and hopefully someone here can help me.

My question is: Does the System Center Virtual Machine Manager service running on the SCVMM machine and the System Center Virtual Machine Manager Agent service on the hosts need to be running under the same account? The SCVMM service is running under a domain account, which is a local admin, and the SCVMM agent service is running under the local system account on both hosts. Thanks!

March 17th, 2015 3:12pm

Hi

The SCVMM service account used on the SCVMM server must indeed be a local admin on the SCVMM server.

SCVMM pushes agents to the Hyper-V hosts, makes this SCVMM service account a local admin on the Hyper-V server, and runs its SCVMM agent under Local System.
The agent runs under Local System by design, so there is no need to change that.

Make sure the SCVMM service account is a local admin on your Hyper-V hosts and that no GPOs overwrite the local Administrators group on the Hyper-V hosts, for example.
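A quick way to verify this on several hosts at once could look like the sketch below. The function names and the account name in the usage line are made up for illustration. It assumes PowerShell remoting (WinRM) is enabled on the hosts, and it only detects direct membership, so an account that is admin through a nested domain group will show up as False.

```powershell
# Sketch only: both function names are hypothetical helpers.

# Returns $true when the account appears as its own line in 'net localgroup' output
function Test-AccountInMemberList {
    param(
        [string[]]$Lines,
        [string]$Account
    )
    # -eq on strings is case-insensitive in PowerShell; Trim() drops padding
    [bool]($Lines | Where-Object { $_ -and ($_.Trim() -eq $Account) })
}

function Test-VmmAccountIsLocalAdmin {
    param(
        [Parameter(Mandatory = $true)]
        [string[]]$ComputerName,

        [Parameter(Mandatory = $true)]
        [string]$Account
    )

    foreach ($computer in $ComputerName) {
        # 'net localgroup Administrators' prints one member per line
        $lines = Invoke-Command -ComputerName $computer -ScriptBlock { net localgroup Administrators }

        [pscustomobject]@{
            ComputerName = $computer
            Account      = $Account
            IsMember     = Test-AccountInMemberList -Lines $lines -Account $Account
        }
    }
}
```

Usage would be along the lines of "Test-VmmAccountIsLocalAdmin -ComputerName Host1, Host2 -Account 'CONTOSO\svc-vmm'". If a host that used to report True starts reporting False, a GPO rewriting the local Administrators group would be the first thing to look at.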

Regarding the vSwitch disappearing on hosts: I have seen it on Hyper-V 2012 hosts in combination with SCVMM 2012 SP1. Restarting the WMI service and all services depending on it on the Hyper-V hosts solved the issue.
On Hyper-V 2012 R2 in combination with SCVMM 2012 R2 and the latest Update Rollup (UR5 at this moment), I haven't seen this issue yet, to be honest.

Regards
Stijn

March 17th, 2015 3:21pm

Stijn,

Thank you for your quick response. In regards to your suggestions:

I have verified that the SCVMM domain service account is in the local Administrators group on the SCVMM server.

I have verified that the SCVMM domain service account is also in the local Administrators group on both Hyper-V hosts. Also, there are no GPOs overwriting the local Administrators group on those servers.

I tried restarting WMI and all dependent services on the hosts, but two of the dependent services (Hyper-V Virtual Machine Management and IP Helper) failed to restart because they were reported as busy. The other services did restart, although some had to be restarted manually, and then I proceeded to refresh the cluster in SCVMM. It again came back with the "Unsupported Cluster Configuration" status.

Do all the VMs on the hosts need to be migrated off using Failover Cluster Manager before restarting these services? If I have to do that I might as well just restart the hosts which is what I was trying to avoid. Thanks!

March 17th, 2015 3:56pm

The VMs do not need to be vacated before you restart the services. However, there may be a reason why the Hyper-V service is not restarting, e.g. a backup, a live migration, or an error of some sort. Be extremely careful restarting WMI, because there is a remote possibility your host may crash. This is why we are opting to vacate the hosts and reboot them.

You could try to stop and restart the Hyper-V service again...

Also, do not stop the WMI service and then start it. Always go with 'restart service.'

March 17th, 2015 4:06pm

Hi

Sometimes the VMMS.exe process (= Hyper-V Virtual Machine Management service) fails to restart.

As Cheesewhip already suggested, it is safer to restart the WMI service when all VMs have been migrated to other hosts. Restarting the VMMS.exe process can be done without downtime for running VMs (although starting offline VMs will not work while VMMS.exe is not running). But when it does not restart, I haven't yet found a way to force it to start, and most of the time only a reboot fixes a 'no-start' of the VMMS.exe process.

When you're trying to restart the WMI service, how heavily loaded is the Hyper-V host in terms of RAM? Is it running at its maximum or are there still resources left?
The Hyper-V management OS reserves RAM automatically to make sure the management partition can run smoothly, but I've seen some issues in the past when the Hyper-V host runs at its maximum RAM (most of these issues were on older Hyper-V versions, though).

Regards
Stijn

March 17th, 2015 4:16pm

Hi Folks 

Here is a quick script that I use on my 2 node cluster . Perhaps someone could improve it to implement live migration function (using failover cluster module) 

Import-Module virtualmachinemanager
Get-SCVMMServer -ComputerName vmmserver | Out-Null

# Only act when at least one VM reports the unsupported state
if (Get-SCVirtualMachine | Where-Object { $_.StatusString -eq "Unsupported Cluster Configuration" })
{
    Write-Host "YES"

    foreach ($HostName in "Host1", "Host2")
    {
        # Restart WinRM and WMI; -Force lets the restart proceed even though other services depend on WMI
        Get-Service winrm -ComputerName $HostName | Restart-Service -Force
        Get-Service winmgmt -ComputerName $HostName | Restart-Service -Force
        Start-Sleep -Seconds 10

        # Make sure the services that depend on WMI are running again
        foreach ($ServiceName in "vmms", "UALSVC", "SCVMMAgent", "iphlpsvc", "CcmExec", "winmgmt")
        {
            Get-Service $ServiceName -ComputerName $HostName | Start-Service
        }
        Start-Sleep -Seconds 20
    }

    # Give the hosts time to settle, then refresh the host, the cluster and the VMs in VMM
    Start-Sleep -Seconds 1000
    #Refresh-VMHost Host1
    Refresh-VMHost Host2
    Start-Sleep -Seconds 250

    Get-SCVMHostCluster | Read-SCVMHostCluster
    Start-Sleep -Seconds 60
    Get-SCVirtualMachine | Read-SCVirtualMachine
}

March 17th, 2015 4:44pm

Each host's total RAM utilization is only about 60%. 
March 17th, 2015 8:23pm

Hi, I have had no issues after restarting the WMI service and related services. However, I have noticed that in VMM I have had to select one of the VMs that show "Unsupported Cluster Configuration" and do a repair, then refresh all VMs. This works each time.
March 17th, 2015 9:58pm

I have tried the repair option before. When I select it, it gives me a list of 3 repair methods in which the Retry and Undo options are grayed out and therefore unavailable. The only option left is "Ignore". When I select it, the repair refreshes the VM but does not fix the issue.

I have noticed that all of the virtual switch information under Properties of each of my hosts is missing. This seems to be the underlying problem that results in the "Unsupported Cluster Configuration" message. If I restart the hosts, the virtual switch settings show back up. I cannot figure out why these settings just decide to disappear occasionally.

March 18th, 2015 2:27pm

Repair?! The repair option will do nothing in this case. So far, only restarting the WMI service (and with it the 5-6 dependent services) or vacating and rebooting the server will fix this issue.

Your second paragraph is why I started this thread.

March 18th, 2015 2:31pm

Yes I am aware that the repair option will not work in this case. I was replying to Drakie's response about running a repair just so everyone was clear that the repair option will not work in this case. Back to square one. 
March 18th, 2015 2:40pm

Right, Right. Sorry. I missed what Drakie said about doing a repair before refreshing.. ?!

Yeah, repair does not work. You have to refresh each virtual machine on the hosts yourself if you don't want to wait around for VMM to do its automatic refreshes after the host reboots or its WMI service restarts.

March 18th, 2015 2:45pm

Just so everyone is clear about what I meant by a repair.

The process I have had to do is:

Restart WMI on all my hosts

Refresh each host in VMM

Repair one of the VM's selecting ignore

Refresh all VM's on a host.

The latest patch that MS support has asked me to put onto my system is KB2790831, still waiting to see if it has fixed the issue.

March 22nd, 2015 10:28pm

Drakie,

The patch you mentioned (KB2790831) will NOT fix this issue. We installed this patch on all of our Hyper-V hosts (running Windows Server 2012 with the Hyper-V role) back in November of 2014 and we are still experiencing this issue. Just an FYI. We have also had a case open with Microsoft since September of 2014 with no resolution yet.

April 8th, 2015 3:19pm

Anthony,

You are correct it did not fix the issue.  I now have to give them more logs and info on this issue.  Good thing is now we have a MS consultant that is having the same issue on a system that MS are installing so perhaps we may get some traction.

April 8th, 2015 11:04pm

I have updated the script. It now restarts the services, refreshes the hosts, repairs a VM, then refreshes all VMs.

########################################################################################################
#
#  WMI Restart
#
#  This script must be run on the VMM server. It restarts the WMI service and its dependent services.
#
#  It then refreshes each host, repairs one of the Unsupported Cluster Configuration guests, then refreshes all VMs
#

#  Date:  14/04/2015
#
#  Author:  Andrew M Drake (Thales Australia)
#
########################################################################################################

Function Restart-Services ($ClusterNode, $WINServices, $Services)
    #
    #  Restarts the required WMI service and dependant services on each host
    #
    {
   
    foreach ($Service in $Services)       
        {
        get-service -name $Service -ComputerName $ClusterNode | Stop-Service -Force -WarningAction SilentlyContinue
        }
   
    sleep -Seconds 10

    foreach ($WINService in $WINServices)   
        {
        get-service -name $WINService -ComputerName $ClusterNode | Restart-Service -Force -WarningAction SilentlyContinue
        }
   
    sleep -Seconds 30

    foreach ($Service in $Services)   
        {
        get-service -name $Service -ComputerName $ClusterNode | Start-Service -WarningAction SilentlyContinue
        }
   
    sleep -Seconds 120   
    }


Import-Module virtualmachinemanager
$Clustername = (Get-SCVMHostCluster).ClusterName
$ClusterNodes = (get-scvmhostgroup).Hosts.name | Sort-Object
$VMMServer = $env:COMPUTERNAME
$WINServices = "Winmgmt"#,"WinRM"
$Services = "VmHostAgent","vmms","UALSVC","CcmExec","iphlpsvc","SCVMMAgent"#,"Winmgmt","WinRM"
#
#  Gets a list of all the VM's that are in the Unsupported Cluster Configuration State
#

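# Wrap the result in @() so that a single match is still an array and [0] indexing is safe
$UnsupportedVM = @(Get-SCVirtualMachine | Where-Object { $_.StatusString -eq "Unsupported Cluster Configuration" })

If ($UnsupportedVM.Count -eq 0)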
    {
    Write-Host "Cluster is in a Supported State"
    Exit
    }


If ($UnsupportedVM[0].StatusString -eq  "Unsupported Cluster Configuration")
    {
    foreach ($ClusterNode in $ClusterNodes)
        {
        Write-Host "Running on Server: " $ClusterNode
        #
        # Restarts the services and Refreshes the host
        #
        Restart-Services $ClusterNode $WINServices $Services
               
        Refresh-VMHost $ClusterNode -WarningAction SilentlyContinue
        }

    #
    #  Dismisses the last task for the first VM in the unsupported cluster configuration list, i.e. does a repair ignoring the failed state
    #
    Write-Host "Starting repair on VM: "  $Unsupportedvm[0]

    Repair-SCVirtualMachine -VM $Unsupportedvm[0] -Dismiss -WarningAction SilentlyContinue

    Write-Host "Refreshing all VM's on Cluster: "  $Clustername

    foreach ($VM in $UnsupportedVM)
        #
        #  Refreshes all VM's in the unsupported cluster configuration list
        #
        {
        Refresh-VM -VM $VM -WarningAction SilentlyContinue
        }

    }

Hope this helps anyone. 

April 14th, 2015 12:09am

This topic is archived. No further replies will be accepted.
