All Hyper-V VMs are using 100% CPU

All

We have a Windows Server 2012 Hyper-V host with two VMs: SBS 2011 and Windows 7.

Since this Tuesday the CPU usage has been a constant 100% on both VMs. Because backups were failing due to the high CPU usage, we rebooted the SBS 2011 server, but even after a reboot/shutdown SBS stays at a constant 100% usage.

We disabled all non-Microsoft services, Exchange, SQL and other memory-hungry services, but the CPU usage is still at 100%. The only time it goes down is when we stop all Microsoft and non-Microsoft services except the bare minimum that SBS needs to run properly.

Even then, the moment we open an MMC console or an application on the server, the CPU goes to 100%, stays there for a long time, and then goes down again.

But with all SBS services running it never drops, and stays at 100% even after a couple of hours.

Hyper-V Integration Services are up to date.

Any advice is greatly appreciated.

Thanks

Dhanushka



  • Edited by PCS-Support Friday, August 16, 2013 7:51 AM
August 16th, 2013 7:50am

Hi,

When CPU usage is high, you can use Resource Monitor to find out which process is causing it and then investigate that process.

Using Resource Monitor to Troubleshoot Windows Performance Issues Part 1

http://blogs.technet.com/b/askperf/archive/2012/02/01/using-resource-monitor-to-troubleshoot-windows-performance-issues-part-1.aspx

Hope this helps.

August 17th, 2013 3:37am

Hello Alex

Thanks for your reply.

The issue is that it's not just one process or group of processes/services. For example, initially it was the Symantec Mail Security related processes/services, so we disabled those, but after a reboot a new set of processes/services was using the CPU cycles. If we disable those, other processes or services take over. Sometimes even Task Manager uses 30-40% of the CPU. It's never a single process or service; it's always at least 3-4 services collectively using all available CPU cycles.

We managed to bring the CPU usage down to 50-60 percent by disabling all non-essentials (SQL instances, Windows Search, Windows Internal Database, etc.) on the SBS server VM as well as on the Hyper-V server, but as mentioned earlier, whenever a new MMC console or application is opened, or even when the Start button is clicked on the SBS server, CPU usage goes to 100%.
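
One way to document this "no single culprit" pattern would be to log per-process CPU figures at intervals (for example, copied out of Resource Monitor) and compare the total with the largest single contributor in each sample. The following is only a rough sketch of that idea: the sample data, the process names and the `summarize` and `top_process_counts` helpers are hypothetical, not readings from the affected server.

```python
from collections import Counter


def summarize(samples):
    """For each sample (a dict of process name -> CPU %), return a tuple of
    (total CPU %, name of the top process, CPU % of the top process)."""
    rows = []
    for sample in samples:
        total = sum(sample.values())
        top_name, top_cpu = max(sample.items(), key=lambda kv: kv[1])
        rows.append((total, top_name, top_cpu))
    return rows


def top_process_counts(samples):
    """Count how often each process was the single biggest consumer."""
    return Counter(name for _, name, _ in summarize(samples))


# Hypothetical samples: the total stays pinned while the top consumer rotates.
samples = [
    {"mailsecurity.exe": 40, "sqlservr.exe": 35, "store.exe": 25},
    {"sqlservr.exe": 45, "store.exe": 30, "taskmgr.exe": 25},
    {"store.exe": 50, "taskmgr.exe": 30, "svchost.exe": 20},
]

for total, name, cpu in summarize(samples):
    print(f"total={total}%  top={name} ({cpu}%)")
```

If the total stays pinned while the top process keeps changing, that would point away from any one application and towards something underneath the guest OS.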

Dhanushka

August 17th, 2013 10:21pm

Hi,

Based on my experience, it may be caused by a virus, contention for common system file resources, or bad blocks on the system disk. Could you post detailed system logs from when the issue occurs?

T

August 19th, 2013 10:14am

Hi Alex

There is no particular time frame: if we enable all services on the SBS VM and the Hyper-V host, it keeps using CPU indefinitely.

There is nothing unusual in any of the event logs.

At this stage it could be related to common system file resource contention, given the nature of the issue: it's impacting all VMs, and the CPU load goes down when some services are disabled.

Any thoughts on how to get more details, since there is not much help from the Event Viewer?

Thanks again

Dhanushka

August 20th, 2013 6:21pm

How much memory does the host have? VMs have?

How is it assigned?  Are you using dynamic memory?

What's the disk subsystem like, specs, etc?

CPU specs?

August 20th, 2013 9:26pm

The host has 32GB of RAM: 16GB is assigned to the SBS VM and 4GB to the Windows 7 VM.

Memory is not allocated dynamically.

The SBS VM has fixed VHD drives (C: is attached to IDE, 200GB in size with 100GB free; D: is attached to a SCSI controller, 400GB in size with 200GB free). The Windows 7 VM has one differencing VHDX drive.

The CPU on the host is an Intel Xeon CPU E5-2430 0 @ 2.20GHz.
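
As a quick sanity check on these figures, the static assignments leave 12GB for the host itself. Below is a minimal sketch of that arithmetic using only the numbers quoted above; the variable names are illustrative and it ignores any hypervisor overhead.

```python
host_ram_gb = 32
vm_ram_gb = {"SBS 2011": 16, "Windows 7": 4}  # static (non-dynamic) assignments

assigned_gb = sum(vm_ram_gb.values())
host_headroom_gb = host_ram_gb - assigned_gb

print(f"Assigned to VMs: {assigned_gb}GB, left for the host: {host_headroom_gb}GB")
```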

August 21st, 2013 7:06pm

The problem you have is one of two things.

Firstly, I smell a virus on your server/VMs. I would power off the existing VMs, then quickly create a fresh Windows 7 VM on Hyper-V, power it on and monitor the CPU utilization.

Secondly, I also think it could be a hardware problem. Check that your physical server is not running too hot and that the internal fan is still functioning (this might be the problem).

Regards

MassonTech


August 21st, 2013 7:30pm

Dell Server Administrator has not detected any hardware issues since the very first day this issue occurred.

Also, I don't think this is an infection, because since Friday afternoon everything appears to be back to normal: no CPU spikes or constant CPU usage. We re-enabled all the disabled services on the SBS server and rebooted the other VM, and there have been no issues so far.

We are yet to reboot the SBS VM and the Hyper-V host itself, but we want to keep SBS running as it is for the next few days and check the state.

Dhanushka

August 26th, 2013 7:35pm

We are having almost exactly the same problem, including using a Dell server. The server is a Dell T320 with an H710 caching RAID controller and 15k SAS drives.

My vote is for a hardware problem. We have the 'luxury' of having had this same box used as a 2008R2 Hyper-V host with a SQL server VM which had slowness issues but nothing that we could put our finger on. We tested and tested some more with no discernible hardware problems. 

Finally, we were under so much pressure to do something that we swapped roles with a T310 which had just been running SBS 2011 on physical hardware, putting Server 2012 with the Hyper-V role on the T320 along with SBS and the Terminal Server as VMs. We reformatted the T310 and installed SQL Server on it. Now the server that had SBS on it before is running SQL with no issues, while the T320, which had issues with SQL, is now having the issues described above with SBS. Even the RDS server is using 70% of CPU at rest.

It doesn't matter which service is stopped; the others expand to fill the void. It is like removing a balloon from a box of balloons! You still have a full box...

It feels like something might be wrong with the processor or the Hypervisor bit (ring 0) implementation. I am at the point of calling Dell and exercising the Lemon Law. We all know how easy that will be.

I just wish there was some way to troubleshoot a problem like this.

August 30th, 2013 2:02pm

Many thanks for sharing your experience. I can understand your frustration; we went through the same.

Ours is also a Dell, a PowerEdge T420, so that is a common denominator in this issue.

The server in question belongs to one of our clients and we did not have a similarly specced server, so we could not do what you did, but it's really intriguing that even after a format and installation of a completely different OS the same issues remain. I too believe it's hardware related, but I guess it could be a combination of hardware and Windows Server 2012.

We were about to call MS for support but the issue suddenly vanished, and SBS and the other VM have been running fine for a couple of weeks now.

As you mentioned, I wish there were a better way to troubleshoot issues like this, especially on servers in production.

September 5th, 2013 1:39pm

I am experiencing the exact same issue.  In my case the host and guests are 2008R2.  All was running fine until last week, when users began to mention slow performance in some of the virtual machines.  I have four virtual machines (all 2008R2 except for one XP machine, but that was not running during this testing).  At first it was just one machine (an Exchange 2010 server).  I suspected a virus or outside attack, but firewall blocking and scans have ruled that out.

I opened a ticket with MS, but specific to the one virtual machine. They don't have an answer yet, but asked me several times if it was limited to the one machine -- at the time it seemed to be, but now a second virtual machine is experiencing the same.  As Dhanushka said, the high CPU processes seem irrelevant -- you kill or disable one and another, seemingly legitimate process takes its place.

I restored from a backup to VMware Workstation and the machine was fine (10% utilization).  I restored the same backup to my Hyper-V host and again saw 100% utilization.  I am confident this is something in Hyper-V, possibly driver or update related.  If anyone has any more information, these are production machines and any help is appreciated.

Sean

September 19th, 2013 3:20pm

Hi Sean

Sorry to hear that you are experiencing the same nightmare as myself.

Is the physical server a Dell? If so, we may have a common denominator. Also, is the firmware on the Dell server up to date? If you can get support from the hardware vendor, check whether a firmware bug on the RAID/SAS/SCSI controller etc. could be causing the problem.

I do not have any instructions or advice, because so far we don't know how it occurred or how it got resolved. We managed to bring the system to a workable state by disabling all but the absolute minimum of critical system services on both the host and the guests. I don't know if that's a possibility in your situation, or if you have already tried that without success.

But you are in a better position than I was, since you have an open ticket with MS. Do let us know how things proceed and what actions are taken to resolve the issue.

Best of Luck.

Dhanushka

 

September 20th, 2013 8:07am

Hello Dhanushka,

Yes, in fact I am running on a Dell PowerEdge R520.  Three of the four virtual machines have been running for almost one full year, however.  The problem only occurred fairly recently.  I need to check -- it is possible that I recently made some updates -- I will check into that.  This weekend will be my big troubleshooting weekend - I'll need to get it stabilized somehow by then.

Sean

September 20th, 2013 11:58am

I believe I have solved my problem.  Yesterday I stopped my virtual clients and focused on the host - a Dell PowerEdge R520 with a PERC H310 mini RAID controller and dual Broadcom NICs.  There were no significant errors in the Windows logs nor any in the Dell OpenManage Administrator. Although not terribly out of date, I chose to upgrade the BIOS, NIC Firmware, NIC Drivers, PERC Firmware and PERC Drivers.  In addition, although up to date, I reinstalled the Chipset drivers.

After two restarts I fired up one of my virtual clients -- a 2008R2 Server running SQL Server 2008R2.  It was running at about 60% CPU before the issue, but was sluggish.  Idle CPU usage is now averaging about 2%.

I then started my Exchange server (2008R2/2010) which had been pegging at 100% with all services enabled.  It is now averaging about 3% idle.  Lastly, I started my 2008R2 Remote Desktop Server which had also been running at 90 to 100% and was extremely slow. Also using about 2% now.

So, after 24 hours, it looks like we are back to normal.  Unfortunately, I could not afford the downtime to check the fixes one by one. My guess is it was either the RAID (PERC) firmware or drivers or maybe the NIC firmware/drivers.  I also don't know why it started... possibly a Microsoft update that was incompatible with a Dell driver?  My suggestion is to ensure you have the latest firmware *and* drivers for your Hyper-V host.  You might also try reinstalling the Integration Services on the client, but in my case it seemed to be host-based.
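
For anyone wanting to track which host components are behind, a small comparison like the one below could help. It is only a sketch: the component names and version numbers are made up for illustration, and `parse_version` and `outdated` are hypothetical helpers; real values would have to be read from the vendor's tools and download pages.

```python
def parse_version(text):
    """Turn a dotted version string such as '2.1.3' into a tuple of ints,
    so that '20.10.1' correctly sorts after '20.9.0'."""
    return tuple(int(part) for part in text.split("."))


def outdated(installed, latest):
    """Return the names of components whose installed version is older
    than the latest known version, sorted alphabetically."""
    return sorted(
        name
        for name, version in installed.items()
        if name in latest and parse_version(version) < parse_version(latest[name])
    )


# Hypothetical inventory, not taken from any real server.
installed = {
    "BIOS": "1.4.8",
    "NIC firmware": "7.4.8",
    "RAID firmware": "20.10.1",
    "RAID driver": "5.2.116",
}
latest = {
    "BIOS": "1.6.0",
    "NIC firmware": "7.6.15",
    "RAID firmware": "20.12.0",
    "RAID driver": "5.2.116",
}

print(outdated(installed, latest))
```

Comparing the parts as integers rather than as plain strings matters here, because a string comparison would rank "20.9.0" above "20.10.1".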

Sean

  • Proposed as answer by MrSeanK Sunday, September 22, 2013 5:03 PM
September 22nd, 2013 5:02pm

Hi Sean

Great to hear, and thanks for sharing your experience.

I too believe that an incompatibility between Windows and the Dell firmware may have caused this issue in the first place, but having said that, how my issue got resolved without any firmware update is a complete mystery.

Regards

Dhanushka

September 23rd, 2013 5:51pm

I had very similar issues and can confirm that this fixed it for me on a Dell T420. I updated everything except the RAID (BIOS, chipset, network, firmware/drivers).
April 2nd, 2014 4:18am

The fix for me was to change the System Profile setting in Dell OpenManage Server Administrator. It is set to Performance Per Watt (DAPC) by default; change it to Performance and reboot the server.  Performance issue fixed!!
January 21st, 2015 4:05pm

Walter, are you referring to disabling C-states in the BIOS using OMSA?

Regards,

Dhanushka

February 4th, 2015 7:35pm

I've had this issue ongoing for a while on my T620. We did the iDRAC update, BIOS update... worked for 3 weeks or so and bam, high CPU on the virtual machines.

Last night, Dell had me go into the BIOS and change the Performance Per Watt (DAPC) to just PERFORMANCE and the tech was almost certain this will totally resolve it. Will let you know.

May 10th, 2015 10:30pm

This topic is archived. No further replies will be accepted.
