Server 2012 Hyper-V Host Not Responding

This issue has occurred on two different hosts in the last two weeks. We haven't been able to determine exactly what is causing it, so I am hoping to get some help.

Environment: a 5-node Server 2012 Hyper-V cluster, monitored with both SCOM and VMM (2012 SP1). In VMM we noticed that the host in question was showing as Not Responding. All the VMs were fine, and VMM reported that the WinRM service was not OK. The service was running on the host, but we could not RDP to the host or log in at the console, and we could not connect to the cluster with Cluster Manager because in both instances the affected host was the cluster owner. Our "solution" has been to shut down the VMs on the host and then forcibly reset it; after that everything is fine. However, I am hoping to get to the bottom of what is going on.

This is exactly the same issue as described HERE, but no solution was given there. Here is a chronology of events that I gathered from the event logs.

From the logs you can tell that the issues started on 7/7/2013 at 5:38 PM.

 

From Application Event Log:

Error 26001, Microsoft.SystemCenter.VirtualMachineManager.2012.Report.VPortUsageCollection

 

Got null results from Select Connection from Msvm_SyntheticEthernetPortSettingData where InstanceId='Microsoft:3E323714-F9A3-4384-A2D7-3466B3FED595\\6D14D559-7676-4D9B-82A7-F5F1199601FA'.

(The instance IDs differ between occurrences.)

 

This error occurred roughly every 30 minutes until 1:49 AM on 7/9. I rebooted the server at around 11:45 PM that night.
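As a rough sanity check on how many Error 26001 entries to expect in the Application log, the sketch below counts the occurrences a strict 30-minute interval would produce between the first and last entries. It assumes the first entry was logged at 7/7/2013 5:38 PM (the start time stated above) and that the interval was exactly 30 minutes; the post only says "roughly", so the real count in the log may differ. The helper name `expected_occurrences` is hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: the first Error 26001 was logged at 7/7/2013 5:38 PM (the start time
# given in the post) and entries then recurred at exactly 30-minute intervals.
FIRST = datetime(2013, 7, 7, 17, 38)
LAST = datetime(2013, 7, 9, 1, 49)
INTERVAL = timedelta(minutes=30)


def expected_occurrences(first, last, interval):
    """Count timestamps first, first + interval, ... that do not exceed last."""
    return (last - first) // interval + 1


span = LAST - FIRST
count = expected_occurrences(FIRST, LAST, INTERVAL)
print(f"span: {span}, expected entries: {count}")
```

A span of 32 hours 11 minutes gives 65 entries at a strict interval, so a count far from that in the exported log would suggest the collection job was not firing on a regular schedule.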

 

From System Event Log:

Event 1, VDS Basic Provider

Unexpected failure. Error code: 48F@01000003

There are a lot of these errors, and they occur even during periods when the WMI Performance Adapter messages are appearing normally.

 

7/7 5:26 PM

Event 7036, Service Control Manager

The WMI Performance Adapter service entered the stopped state.

 

7/7 6:06 PM

Event 7036, Service Control Manager

The WMI Performance Adapter service entered the running state.

 

7/9/2013 8:27 AM

Event 7000, Service Control Manager

The Device Setup Manager service failed to start due to the following error. The service did not respond to the start or control request in a timely fashion.

There are a lot of these errors.

 

7/9/2013 8:35 AM

Event 7011, Service Control Manager

A timeout (30000 milliseconds) was reached while waiting for a transaction response from the SCVMMAgent service.

 

7/9/2013 8:46 AM

Event 7001, Service Control Manager

The System Center Virtual Machine Manager Agent service depends on the Windows Remote Management (WS-Management) service which failed to start because of the following error:

The service has not been started.

 

Once the WMI Performance Adapter service stopped its regular stop/start cycle, warnings like these start appearing in the Operations Manager log:

7/7/2013 5:30 PM

Event 21402, Health Service Modules

  • Module was unable to connect to namespace 'ROOT\MSCLUSTER'
  • Module was unable to connect to namespace 'ROOT\CIMV2'
  • Summary: 1 rule(s)/monitor(s) failed and got unloaded, 1 of them reached the failure limit that prevents automatic reload. Management group "MY_DOMAIN". This is summary only event, please see other events with descriptions of unloaded rule(s)/monitor(s).
  • Forced to terminate the following process started at 5:27:36 PM because it ran past the configured timeout 180 seconds.
    Command executed: "C:\Windows\system32\cscript.exe" /nologo "ConsecutiveSamplesTwoThresholds.vbs" A_SERVER_NAME
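The start time and timeout in the last warning can be checked against the event's own 5:30 PM timestamp. The sketch below adds the 180-second timeout to the quoted start time; it assumes the Health Service killed the script as soon as the timeout elapsed, which is only approximately true.

```python
from datetime import datetime, timedelta

# Start time and timeout quoted in the Health Service Modules warning (7/7/2013).
script_started = datetime(2013, 7, 7, 17, 27, 36)
timeout = timedelta(seconds=180)

# Assumption: the script was terminated right when the timeout elapsed.
terminated_at = script_started + timeout
print(terminated_at.strftime("%I:%M:%S %p"))
```

The result, 5:30:36 PM, lines up with the 5:30 PM warnings. That is consistent with the WMI-based monitoring script already hanging at about 5:27 PM, one minute after the WMI Performance Adapter service stopped at 5:26 PM.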

 

These were followed by errors like this:

7/7/2013 5:37 PM

Event 22402, Health Service Modules

 

Forced to terminate the following PowerShell script because it ran past the configured timeout 30 seconds.

 Script Name:        PowerShellScript

One or more workflows were affected by this. 

Workflow name: Microsoft.Windows.HyperV.2012.VMReplicationHealth33412.Monitor

Instance name: A_SERVER_NAME

Instance ID: {EA9D5CBC-577D-C262-CBFC

 

Forced to terminate the following PowerShell script because it ran past the configured timeout 30 seconds.

 Script Name:        PowerShellScript

One or more workflows were affected by this. 

Workflow name: Microsoft.Windows.HyperV.2012.VMReplicationHealth33414.Monitor

Instance name: A_SERVER_NAME

Instance ID: {EA9D5CBC-577D-C262-CBFC-4F5037B38E50}

Management group: MY_MANAGEMENT_GROUP

The WinRM service was running on the host (I was able to check it remotely using PowerShell). I can't say for sure about the SCVMM Agent service; I don't remember.
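The events above come from three different logs, so the order in which they occurred is easier to see once they are merged. The sketch below sorts the timestamps quoted in this post into a single timeline and prints each event's offset from the earliest one. Only timestamps quoted in the post are used; placing the first Error 26001 at 5:38 PM is an assumption based on the start time stated above, and the reboot is left out because its date is ambiguous. The `LogEvent` class and the `merged_timeline` helper are hypothetical names for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogEvent:
    when: datetime
    log: str
    event_id: int
    summary: str


# Events as quoted in the post, grouped by log in the order they were gathered.
# Assumption: the first Error 26001 is placed at 5:38 PM, the start time stated in the post.
EVENTS = [
    LogEvent(datetime(2013, 7, 7, 17, 38), "Application", 26001, "VMM VPortUsageCollection null results (first)"),
    LogEvent(datetime(2013, 7, 9, 1, 49), "Application", 26001, "VMM VPortUsageCollection null results (last)"),
    LogEvent(datetime(2013, 7, 7, 17, 26), "System", 7036, "WMI Performance Adapter entered the stopped state"),
    LogEvent(datetime(2013, 7, 7, 18, 6), "System", 7036, "WMI Performance Adapter entered the running state"),
    LogEvent(datetime(2013, 7, 9, 8, 27), "System", 7000, "Device Setup Manager failed to start (timeout)"),
    LogEvent(datetime(2013, 7, 9, 8, 35), "System", 7011, "30000 ms timeout waiting for SCVMMAgent"),
    LogEvent(datetime(2013, 7, 9, 8, 46), "System", 7001, "SCVMM Agent dependency WinRM failed to start"),
    LogEvent(datetime(2013, 7, 7, 17, 30), "Operations Manager", 21402, "Cannot connect to ROOT\\MSCLUSTER / ROOT\\CIMV2"),
    LogEvent(datetime(2013, 7, 7, 17, 37), "Operations Manager", 22402, "PowerShell script ran past 30 s timeout"),
]


def merged_timeline(events):
    """Return the events from all logs sorted by timestamp, oldest first."""
    return sorted(events, key=lambda e: e.when)


timeline = merged_timeline(EVENTS)
start = timeline[0].when
for e in timeline:
    # Whole minutes elapsed since the earliest quoted event.
    minutes = int((e.when - start).total_seconds() // 60)
    print(f"{e.when:%m/%d/%Y %I:%M %p}  +{minutes:>5} min  {e.log:<18} {e.event_id:>5}  {e.summary}")
```

Sorted this way, the WMI Performance Adapter stop (5:26 PM) and the Operations Manager namespace warnings (5:30 PM) come before the 5:38 PM VMM errors, which may be a useful ordering to check against the full exported logs.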

July 10th, 2013 6:00pm

Hi,

Thank you for your question.

I am trying to involve someone familiar with this topic to look further into this issue. There might be some delay. We appreciate your patience.

July 12th, 2013 5:22am

