Windows 2012 R2 sometimes hangs at splash screen after reboot (Network Steve Forum)

Hello

Sometimes, Windows 2012 R2 servers hang at the splash screen (spinning dots) and never boot. They are virtual machines running on ESXi 5.5. To resolve the issue, we just have to reset the VM; Windows then boots normally.

All of our servers are affected. No memory dump is generated and nothing looks wrong in Event Viewer. Any ideas?

September 18th, 2014 10:23am

Hi,

Where did you check the events: on the physical machine or in the guest? It would be better to check the events on the physical machine, and to make sure that all drivers are installed properly on the physical machine.

Zia

September 18th, 2014 1:12pm

Hi kompakt,

First, please keep your server up to date.

If the issue persists (it hangs sometimes), I would suggest contacting VMware:

https://communities.vmware.com/welcome

http://partnerweb.vmware.com/GOSIG/Windows_Server_2012.html

Best Regards

Elton Ji

September 22nd, 2014 6:32am

Hello,

exact same situation over here.

Fully up to date vSphere 5.5 infrastructure. Only Windows 2012 R2 VMs affected. 2012 R2 running on physical hardware never showed this behavior. This happens after a reboot initiated after installing patches once a month (using LANdesk patch manager).

Regards,

Andreas

October 7th, 2014 9:29am

We are currently experiencing the same issue with our Server 2012 R2 VMs. The VMs get stuck at the Windows splash screen, and the only way to fix them right now is to power them off completely and power them back on. We are running vSphere 5.5 with Update 1, and all of our VMs are running the latest guest tools.

October 21st, 2014 6:01pm

Unfortunately I have the issue too. We use System Center 2012 to deploy the patches. We are also on VMware 5.5.

October 29th, 2014 6:16pm

I just created a case with Microsoft this afternoon and I am waiting to hear back from them. We also use SC 2012 for patch deployment. In most of our cases this happens after a patch deployment; however, so far I haven't been able to reproduce the issue with a plain reboot of the VM. The most recent event was when a DBA was installing additional SQL roles on a Windows Server 2012 VM and a reboot was required to finish the install. The VM had to be powered off and powered back on again.

I have also created a case with VMware and so far, they have turned up nothing.

October 29th, 2014 6:26pm

We have the exact same issue, on VMware ESXi 5.5 Update 1 and Update 2 servers. The Windows VMs are patched by WSUS with 4 different patching times/groups. We have over 160 VMs, of which 25+ are now Windows 2012 R2 servers. All VMs patch correctly, but a RANDOM PORTION of the Windows 2012 R2 servers fail to complete their boot after patching (6 in the last patching cycle). They hang at the spinning circle of dots on the boot screen.

I have so far not been able to track this one down. They seem to be hanging very early in the boot cycle - early enough that the volumes (drives) are not marked as dirty when you 'reset' the VM (i.e. Data Protection Manager does not need to run a consistency check over the volumes at next boot-up).

I 'suspected' this has to do with heavy I/O on the underlying datastores, but have not been able to prove it. I moved a Windows 2012 R2 VM to a separate LUN and then generated lots of I/O against that LUN whilst I rebooted the VM. The datastore latency went up to 160ms+ but the VM still rebooted just fine... It doesn't rule out latency, but I just can't prove it...

Another option I have considered, but haven't tried yet, is to replace the virtual LSI controller with VMware ParaVirtual. It's not my standard, but if it is a bug in the LSI driver it would get around it. The ParaVirtual driver comes with caveats for MS clustered VMs.

Will be watching this thread with interest... There is definitely something wrong here. And I know I will be battling more and more Windows 2012 R2 that fail their late night patching cycles as the weeks go by. :-( So much for 'automated' patching...

November 1st, 2014 3:37am

I have a Class C ticket open with both Microsoft and VMware. VMware is still looking into all of the logs that I sent them. I hate to say it, but the Microsoft engineer isn't much help. He states that there is very little he can do since the problem isn't consistent. I have been busy fighting other things at work and haven't had the time to argue with him about gathering more data. That is interesting about the VMware paravirtual SCSI controller and the datastore. I will look into that on Monday. Maybe there is a correlation with where the VMs are located across the datastores. I checked the VMs having this issue, thinking that maybe they were using the wrong hardware profile assignment, but that isn't the case. So far I haven't been able to reproduce the issue at all.

November 1st, 2014 2:09pm

I have also opened a Premier support call with Microsoft, and they recommended I turn on boot logging and capture the memory through VMware snapshots when the failures happen. They will then analyze it. Hopefully next patch cycle we will have some more failures and they can find something.

November 5th, 2014 3:45pm

We have the same issue - I have been researching this since April. I thought it might be related to the Automatic updates Microsoft saw fit to turn on - however I have been unable to find any common thread among the configurations of the settings. Thanks for all the postings - will be watching with interest.

November 5th, 2014 9:25pm

GJMFL could you tell us a little about your environment, is it Vmware 5.5 or something else? Maybe we can find the common thread.

November 5th, 2014 9:29pm

VMWare ESXi 5.5 Update 1 - Server 2012 R2. I am looking at everything - other users who never logged off - automatic update settings (we use a WSUS server) - although it happens sometimes when I go out to Microsoft update site. I have been trying to test various scenarios to find something.

November 6th, 2014 1:07pm

I have been on PTO this weekend where I work so I haven't been checking my email until this morning. VMware finally has gotten back to me. There is a bug with ESXi and their engineers are working on a fix. VMware suggests making one of the changes below. If anyone implements any of these changes, please let me know if it does or doesn't work.

Starting with Windows 8 / Windows 2012 Server, during its boot process the operating system will reset the TSC (TimeStampCounter, which increments by 1 for each passed cycle) on CPU0. It does not reset the TSC of the other vCPUs and the resulting discrepancy between two vCPUs' TSC can result in the OS not booting past the Windows splash screen, and a full power off and on will fix it.

Our engineering team here are currently working on a code change to accommodate this.
There is a workaround suggested from engineering to add a line of code to the vmx (configuration) file of the VM to prevent this from reoccurring.
This will basically tell the vmx file that the TSC for all vCPUs should be reset to zero on a soft reset of the machine, and not just CPU0.

Please note that this has not been tested extensively by engineering, and should be run at your own risk as it is just a workaround which has not been fully QE tested.

This can be done a few ways:

First method: Manually editing the VM's vmx file one VM at a time.
1. Power off the VM
2. Add the following line to the vmx file:
monitor_control.enable_softResetClearTSC = "TRUE"
3. Reload the VM
4. Power on the VM again.
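
For anyone who would rather script step 2 than edit the file by hand, here is a minimal sketch. It assumes a POSIX shell with grep, cp and tail available. The function name add_tsc_workaround and the .bak backup copy are my own illustration, not something VMware supplied. The VM still has to be powered off before the edit and reloaded afterwards, as in the steps above.

```shell
#!/bin/sh
# Sketch only: add the workaround line to one .vmx file if it is not there yet.
# add_tsc_workaround and the .bak copy are illustrative names, not from VMware.

add_tsc_workaround() {
    vmx="$1"
    if [ ! -f "$vmx" ]; then
        echo "no such file: $vmx" >&2
        return 1
    fi

    # Skip files that already carry the key (matched in any letter case).
    if grep -qi '^[[:space:]]*monitor_control\.enable_softResetClearTSC[[:space:]]*=' "$vmx"; then
        echo "already set: $vmx"
        return 0
    fi

    # Keep a copy of the original next to it before touching anything.
    cp "$vmx" "$vmx.bak" || return 1

    # Make sure the new entry starts on its own line.
    if [ -n "$(tail -c 1 "$vmx")" ]; then
        printf '\n' >> "$vmx"
    fi
    printf '%s\n' 'monitor_control.enable_softResetClearTSC = "TRUE"' >> "$vmx"
    echo "added: $vmx"
}

# Usage (the path is a placeholder):
#   add_tsc_workaround /vmfs/volumes/path/to/virtualmachine.vmx
```

Running it twice against the same file leaves a single copy of the line, so it should be safe to repeat.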

Second method: Doing this to every VM on a host at one time.
1. SSH to the ESX host
2. Run the following command:
echo 'monitor_control.enable_softResetClearTSC = "TRUE"' >> /etc/vmware/config
3. Run the following command to do a suspend-resume in order to apply the setting so that affected guests won't hang during the next reboot:
vim-cmd vmsvc/getallvms | sed -n 's/^\([0-9]\+\).* windows8.*Guest.*$/\1/p' | while read vmid; do state=$(vim-cmd vmsvc/power.getstate ${vmid} | sed -n 's/^.*\(Powered on\).*$/\1/p'); if [ "$state" ]; then vim-cmd vmsvc/power.suspendResume ${vmid} && sleep 5; fi; done;

Last method: Using PowerCLI to do this to every VM in the environment.
Open PowerCLI, connect to vCenter server and run the following command:

    # Apply the workaround to every Windows 8 / Server 2012 guest known to vCenter.
    ForEach ($vm in (Get-VM)) {
        $vmv = Get-VM $vm | Get-View
        $name = $vmv.Name
        $guestid = $vmv.Summary.Config.GuestId
        $state = $vmv.Summary.Runtime.PowerState
        # Build a config spec that carries only the extra .vmx option.
        $vmx = New-Object VMware.Vim.VirtualMachineConfigSpec
        $vmx.extraConfig += New-Object VMware.Vim.OptionValue
        $vmx.extraConfig[0].key = "monitor_control.enable_softResetClearTSC"
        $vmx.extraConfig[0].value = "TRUE"
        if ($guestid -like "windows8*Guest") {
            ($vmv).ReconfigVM_Task($vmx)
            if ($state -eq "poweredOn") {
                # Migrate the running VM to the host it is already on.
                $vmv.MigrateVM_Task($null, $vmv.Runtime.Host, 'highPriority', $null)
            }
        }
    }

Note:
If you are using Solaris VMs in the environment, do not run this against those Solaris VMs as they could potentially hang with that setting in the vmx.
Also, when the script is running, do not do a vmotion, suspend, clone, or snapshot operation at the same time - this is very important, as it could cause the script to fail.

From looking at the logs, it seems like you are not running Solaris as an OS anyway, at least on these 2 hosts:
rhayden@scripts-prod-3 HostLogs29thOct $ find esx*/vmfs/volumes/ -maxdepth 3 -name "*.vmx" -exec grep 'guestOS' {} \; | awk '{print $NF }' | sort | uniq -c
      5 "longhorn"
      1 "longhorn-64"
      1 "rhel6-64"
      1 "sles11-64"
     54 "windows7srv-64"
     19 "windows8srv-64"
      1 "winnetenterprise-64"
      7 "winnetstandard"
      1 "winNetStandard

If after applying these settings to the VMs this does not work after the next patching / updating (you are still seeing the issue), what we would need to do at that point is get the suspended state file for the VM to send to engineering, as we cannot reproduce this issue in-house.

If this occurs, this is how to gather the information we would need to send to engineering:
(Do not reboot the VM's until this is done)

1. SSH to the host and run the following command:
vm-support --listvms

2. Now run this command:

vm-support --performance --manifests="HungVM:Coredump_VM HungVM:Suspend_VM" --groups="Fault Hardware Logs Network Storage System Userworld VirtualMachines" --vm="</vmfs/volumes/path/to/virtualmachine.vmx>"

(Change the path of the VM in the command above to the actual path).
That will put a tgz file in /var/tmp. The file name is displayed when complete. Copy this file off the host manually.

Edited by Chris Bonsted Friday, November 07, 2014 10:06 AM
Proposed as answer by MJMorris Thursday, May 14, 2015 1:33 PM

November 7th, 2014 10:05am

Chris,

Do you have any update from VMWare on this? We have a bunch of servers experiencing this problem.

November 11th, 2014 4:32am

Nathaniel, Other than the 3 workarounds that they suggested I make, no. I have inquired about when a real fix will be created and I haven't heard back as of yet.

November 11th, 2014 9:14am

Chris,

Have you implemented any of the workarounds, or are you waiting for an update? Please keep us posted on VMware's response.

November 12th, 2014 7:29pm

I plan on making these changes to 8-12 VM's tomorrow and wait to see what happens. We will be patching our QA environment over the weekend.

As of right now, there is no time frame for if or when VMware will create a patch for this.

November 12th, 2014 11:04pm

Sorry for the delay in posting my results.

I have updated ten 2012 R2 VMs with the changes in the .vmx file. None of them experienced any issues upon rebooting when they were patched. I am going to expand my sample in our QA environment to 20 - 30 VMs; however, I am going to say that making those modifications did help.

There isn't a public KB article from VMware about this issue other than this one:

http://kb.vmware.com/kb/2082042

There is no ETA on a patch from VMware. I hope that this information helps.

Edited by Chris Bonsted Wednesday, November 26, 2014 3:01 PM

November 26th, 2014 2:54pm

We had this issue several months ago and Microsoft pointed us to a clock time mismatch. They suggested we go to our hypervisor vendor. VMware did a minor investigation and, of course, found nothing. We are now back to the same issue, as described in your post. Finding your post here got VMware to release to us the internal document stating what you found. It is still internal only to VMware and Microsoft.

We have applied the PowerCLI script to all of our servers and it does modify the VMX without any issue. The problem is you still need to do a reset or power off/on via the virtual power buttons in VMware. OS reboots do not work. So we are in the middle of scheduling outages for our 400+ 2012 servers.

Thanks again for the post. I will post what VMware gave me on the symptoms for this issue to happen.

Symptoms:
Under the following conditions:
The guest operating system of the virtual machine is Windows 8, Windows Server 2012, or later.
The virtual machine runs on ESXi 5.5 or later with virtual machine hardware version 10 (vmx-10).
The virtual machine has not experienced a full power cycle (powered off / powered on) for more than two months.
The virtual machine is configured with more than one vCPU.
You might see the following symptoms:
After rebooting, Windows 8 or 2012 Server virtual machines might hang during the Microsoft Windows boot splash screen

After resetting or power cycling the virtual machine, it will boot successfully.
The virtual machine might resume booting after multiple hours or days
A memory dump analysis might reveal thread blocking on a timer expiry hours or days in the future
The blocking thread might be stuck in KeDelayExecutionThread() during PciStallForPowerChange()
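
Three of the four conditions listed above can be read straight from a VM's .vmx file, so a rough check might be sketched as below. The assumptions here are mine: the function names vmx_get and matches_hang_conditions are hypothetical, the .vmx keys are taken to be guestOS, virtualHW.version and numvcpus, and a missing numvcpus line is treated as one vCPU. Only guest IDs starting with windows8 are matched, so later guests would need their own pattern. The time since the last full power cycle is not stored in the .vmx, so that condition is not checked.

```shell
#!/bin/sh
# Sketch only: does a .vmx file match the static conditions from the list above?
# vmx_get and matches_hang_conditions are hypothetical helper names.

# Print the quoted value of a key. $1 = file, $2 = key written as a regex.
vmx_get() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*\"\(.*\)\"[[:space:]]*$/\1/p" "$1" | head -n 1
}

matches_hang_conditions() {
    vmx="$1"
    guest=$(vmx_get "$vmx" 'guestOS')
    hw=$(vmx_get "$vmx" 'virtualHW\.version')
    cpus=$(vmx_get "$vmx" 'numvcpus')
    cpus=${cpus:-1}   # assumption: no numvcpus entry means a single vCPU

    # Guest must be Windows 8 / Server 2012 (IDs such as windows8srv-64).
    case "$guest" in
        windows8*) ;;
        *) return 1 ;;
    esac

    # Virtual hardware version 10 and more than one vCPU.
    [ "$hw" = "10" ] || return 1
    [ "$cpus" -gt 1 ] 2>/dev/null || return 1
    return 0
}

# Usage (the path is a placeholder):
#   matches_hang_conditions /vmfs/volumes/path/to/virtualmachine.vmx && echo "candidate"
```

A match only means the VM is a candidate. Whether it actually hangs still depends on how long it has gone without a full power cycle.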

Cause:
Starting with Windows 8 / Windows 2012 Server, during the boot process the operating

December 1st, 2014 6:17pm

Thanks for all the info in this thread. I have the same problem using ESXi 5.1 and 2012 R2 servers. Has anyone experienced this problem using 5.1?

December 16th, 2014 12:46am

Has anyone received any updates from VMware on this? We are experiencing the same issues after Windows updates. Any issues reported with the proposed workarounds?

Thanks,

Derek

January 5th, 2015 4:47pm

Most likely this is low on VMware's radar. I have not heard when a fix will be issued. We have implemented this workaround in our QA/Dev VMs (about 100 of them) and we have not had any issues since the .vmx modifications were made.

January 6th, 2015 12:21am

I discussed this with VMware Support yesterday. Here's an official KB article, hot off the press:

http://kb.vmware.com/kb/2092807

It has a few details not yet discussed on this thread, so definitely check it out if you're affected by the problem.

Joe.

January 21st, 2015 1:30pm

I've also been told that the fix/workaround is proposed to be included with ESXi 5.5 Update 3 and ESXi 6.0 Update 1.

January 27th, 2015 2:52pm

Does anyone see this working in their environment? We have a few VMs that still hung on reboot with this applied. When comparing the Advanced configuration properties we noticed the script set the parameter as "monitor_control.enable_softResetClearTSC = TRUE" while other parameters show their values as "true". Not sure if the "TRUE" vs "true" makes a difference.

March 3rd, 2015 9:36pm

Machines in our environment are also still hanging with the "monitor_control.enable_softResetClearTSC = TRUE" parameter set. Maybe it requires a server reboot before the setting starts to apply? In that case the upcoming patching will show whether that is true.
I also think that there is no difference between "TRUE" and "true".
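
To rule out the letter case of the value as a factor, a small check like the one below can report whether a .vmx file carries the setting with any spelling of true. This is only a sketch: tsc_workaround_enabled is a hypothetical name, and it inspects the file on disk, which says nothing about whether the running VM has picked the setting up.

```shell
#!/bin/sh
# Sketch only: succeed if the .vmx file sets the workaround key to true.
# The match ignores letter case, so TRUE, True and true all count.
# tsc_workaround_enabled is a hypothetical helper name.

tsc_workaround_enabled() {
    grep -Eiq '^[[:space:]]*monitor_control\.enable_softResetClearTSC[[:space:]]*=[[:space:]]*"?true"?[[:space:]]*$' "$1"
}

# Usage (the path is a placeholder):
#   if tsc_workaround_enabled /vmfs/volumes/path/to/virtualmachine.vmx; then
#       echo "setting present"
#   else
#       echo "setting missing or not true"
#   fi
```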

Edited by Andrej Trusevic Wednesday, March 04, 2015 1:28 PM
Proposed as answer by MiliusXP Friday, March 13, 2015 1:36 PM
Unproposed as answer by MiliusXP Friday, March 13, 2015 1:36 PM

March 4th, 2015 1:23pm

I've applied the workaround, restarted hosts and still have the issue.

Let's wait and see what 5.5 U3 brings. There's no chance I'm touching ESXi 6 until U1 comes out, and when it does, hopefully the fix will be in there too.

There is every chance it won't be, though. When I previously looked into this issue (maybe 3 or 4 months ago now), I was led to believe it was a Microsoft fault rather than a VMware one. The argument for this was a good one, and I have yet to see Microsoft admit to anything.

March 9th, 2015 9:44pm

It requires a complete shutdown and restart.

Parameters are not case sensitive.

March 13th, 2015 1:37pm

Has anyone heard anything more on this issue? I have applied the setting change and it does not make a difference...

I have to reboot multiple times to get my VMs to come up.

March 17th, 2015 1:47pm

Hi all,

Can someone please summarize this?

Doesn't VMware KB 2092807 have the resolution? Does it not solve this bug?
If I use the PS script, do I still need to restart my VMs?

(The script seems to do a "localhost" vMotion, which should create a new vmx file?)

March 18th, 2015 3:41pm

Has anyone opened a case with Microsoft on this issue? Is anyone seeing this in Hyper-V, Stand Alone, Xen? VMware has reported this to be a Microsoft issue and are unable to find any problems on the vmware side on our system.

March 18th, 2015 4:39pm

I can confirm that the solution VMware provided in KB 2092807 does not solve the bug. The required parameter was set on all Windows 2012 and 2012 R2 machines in our environment. All servers were rebooted after that, but during this month's patching some of them still hung.

March 20th, 2015 7:29am

@andriktr
Did you use the ps script?

March 20th, 2015 7:54am

Yes, the script was used to set the parameter.
To set the parameter without the script you have to turn off the VM. It's not possible to manually edit the VM config and set this parameter while the VM is turned on. Using the script you can set the parameter without turning off the VM.

Edited by Andrej Trusevic Friday, March 20, 2015 10:30 AM

March 20th, 2015 10:23am

Yep, but I don't want to run the script on my production VMs if it doesn't solve anything.....

March 20th, 2015 12:05pm

Can now confirm that I also experience the same problem even though I applied the "fix" ......

March 20th, 2015 1:43pm

Can also confirm that we are seeing this issue, have been for months and finally know why. Going to be opening cases regarding this issue. Same behavior, after being online for about a month 2012 R2 servers will get hung during automatic patch reboot.

March 20th, 2015 5:44pm

I also created a ticket for VMWare support. Let's wait for the answer. :)

March 31st, 2015 7:27am

I have already received feedback from VMware support. They said that the fix described in KB 2092807 should work; if it does not, we need to collect logs from the host where the VM is in the hung state and provide them. They also provided another workaround: downgrade the VM hardware to version 9.

The good news is that they also confirmed this problem will be fixed in 5.5 U3, which will be released between the 2nd and 3rd quarter.

April 1st, 2015 1:41pm

Hello everybody,

kind of late to the party. We are running 2012 R2 on a physical machine using the Hyper-V role. This is a no-HA lab machine. We are having the same issues as described here, just with Hyper-V. Again, the HOST system is the Hyper-V server, guests are a mix of XP to Server 2012 R2.

I have no idea how I could apply any of the fixes described here to a physical machine.

Is there any news from Microsoft on this issue?

Regards,
Michael

April 2nd, 2015 12:54pm

For those who say the "fix" using the PowerShell script did not work: did you read in the KB that "The virtual machine(s) need to be shutdown and powered on for the changes to take affect."?

Was that done, or did you simply reboot the VMs (which would not fix it)? Some say they rebooted the ESX host (which is not the fix either).

May 1st, 2015 12:46pm

What about reloading the VMX settings while the VM is running (Reference: http://kb.vmware.com/kb/1026043), and then restarting normally? Has anyone tried that? It seems to work for other settings that normally don't take effect without a full shutdown and poweron.

Also, we've seen this same exact behavior since we installed Patch 4 for McAfee VirusScan Enterprise 8.8. It's a known issue with Patch 4 (Reference: https://kc.mcafee.com/corporate/index?page=content&id=KB78495 - issue 1020874). Patch 5 is supposed to be released to the general public next week.

Edited by Random Anonymous Name Thursday, May 14, 2015 5:27 PM

May 14th, 2015 5:23pm

I don't think this is the whole problem.

It seems this only happens after installing an update (maybe one particular update); a normal reboot is just fine.

May 20th, 2015 9:44pm

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2092807&src=vmw_so_vex_mbrad_895

It looks like the issue happens only if the VM has not been powered off for more than two months.

This only applies to virtual machine hardware version 10 as Windows resets the TSC on all CPUs on virtual machines with older hardware versions (which do not support hypervisor.cpuid.v2).

May 21st, 2015 1:51am

Pulling my hair out with this patch cycle and 2012 R2/5.5... Found this thread. Sorry that we are all having this problem, but it's good to see I'm not the only one and not going crazy. I found a workaround for small environments: if you shut down the server and start it from VMware, there is no problem. I find that better than "crashing" it every time it won't boot, which makes me a little nervous. I guess I'll just do this until U3 comes out.

Hope this helps someone out....have a good weekend.

June 19th, 2015 6:44pm

The ESXi VMs use an LSI controller. Isn't this hotfix, which addresses hangs with LSI controllers, needed?

https://support.microsoft.com/en-us/kb/2966870#/en-us/kb/2966870

July 29th, 2015 4:03pm

Just got bitten by this. Also a VM on 5.5. I'm thinking of changing to the PVSCSI driver to mitigate, as we don't use this with any clusters. My fix was to RESET this VM in vCenter, and it came back. A bit nerve-racking, as this is a file server with a lot of data.

August 15th, 2015 12:04pm

This topic is archived. No further replies will be accepted.