Weird Domain Controller lockup issue
I have two domain controllers in one site on my network. One is a physical server, the other virtual (VMware ESXi 5). We're seeing some strange lockup issues on the virtual server. When the problem happens, RPC communication to the virtualized DC (call it DC2 for convenience) is problematic. You cannot make a remote Server Manager connection to the computer, and users who happen to try to authenticate with DC2 cannot do so. Techs are usually the first to notice this: they can't open AD Users and Computers or something like that. The only indication on DC2 that something has gone wrong is a repeated 1007 Group Policy error in the System log: "Processing of Group Policy failed. Windows could not determine the site associated for this computer". This will happen every 7 minutes, as the DC repeatedly tries to process its GPs, fails, and waits for 5 minutes to try again. I cannot find any indication of anything that might have led to this problem, whether in the App/Sec/Sys logs or in the other Apps and Services logs like DNS, File Rep, etc. It's not a connectivity issue, or at least not fully. The server can still be pinged, and we can still log on to the Remote Administrator console. It's not a firewall issue; Windows Firewall is disabled on DC2 and nothing is changing with any router ACLs. You reboot the server and all is well with the computer again, for a seemingly random amount of time. We've documented this happening at various times, anywhere from 3am to 10pm. There's not a set amount of time between instances; sometimes it will go fine for a week or more, other times it will happen twice in the same day. The problem never resolves itself on its own: a couple of times this has happened over a weekend when no one was around to notice until Monday morning, and it turned out it had been going on since Friday night.
Since we have another DC in the site, it's not always immediately noticeable that this has happened, unless we keep close watch on DC2. Aside from the other DC in the same site, we have several other sites that all have their own DCs. All of the DCs have the same Group Policies applying to them, and none of the others has ever exhibited this issue. We recently migrated this server from VMware 4 to VMware 5, and the problem followed. The DC is not syncing time with the VMware host; it gets its time from the other DC. There aren't any DNS errors, except while the problem is happening, when you get a generic "DNS Server unable to open Active Directory" 4000 event. The server is Server 2008 R2, same as all the other DCs. We have only the one domain in a single forest, both at Server 2008 R2 level. Replication seems to work fine, at least when the server is not exhibiting this error. DCDiag runs done when things are normal do not report any serious issues. The DC isn't doing anything exotic like certserv or KMS or anything like that. We just want it to be a fat, dumb, and happy DC. Any suggestions/thoughts on things to look at would be appreciated. We're getting to our wits' end with this issue.
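Since the lockup can go unnoticed for a whole weekend, a small scheduled probe might catch it sooner. The following is only a minimal sketch: the port list, the timeout, and the function names `port_open` and `probe_dc` are assumptions for illustration, not something from our environment. Note also that a successful TCP connect only shows the port is accepting connections; it does not prove the RPC or LDAP service behind it is actually answering, so this would complement, not replace, watching for the 1007 and 4000 events.

```python
import socket

# Ports assumed to matter for a domain controller (illustrative list only).
DC_PORTS = {
    "DNS": 53,
    "RPC endpoint mapper": 135,
    "LDAP": 389,
    "SMB": 445,
}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def probe_dc(host, ports=DC_PORTS, timeout=3.0):
    """Return a dict mapping each service name to its reachability (bool)."""
    return {name: port_open(host, port, timeout) for name, port in ports.items()}
```

Calling `probe_dc("dc2.example.local")` (a placeholder host name) from a scheduled task and alerting on any `False` value would be one way to use it.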
June 26th, 2012 10:44am

If you are running the E1000 network driver, try replacing it with the vmxnet driver.
June 26th, 2012 11:24am

Agree with C. Pfeiffer. It's better to use VMXNET 3 as the virtual NIC on DCs; in my personal experience VMXNET 3 is much faster than E1000 and safe to use. Which NIC for Windows 2008? E1000 or VMXNET 3? http://communities.vmware.com/thread/212090 Choosing a network adapter for your virtual machine http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1001805 Also, uninstall and re-install VMware Tools on the problematic DC and see if that makes any difference. Press any key... What the ... Where's any key ? This posting is provided "AS IS" with no warranties or guarantees and confers no rights.
June 26th, 2012 12:08pm

Hello, to start, I would like to see an unedited ipconfig /all from the site DC/DNS servers, so we can verify some settings. Since the error message points to a site problem, please ensure that both DCs are listed in the correct site within AD Sites and Services and that all subnets are assigned to the correct site. For time sync with VMware machines, please follow http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1189 to rule out time sync with the host machine. Best regards Meinolf Weber MVP, MCP, MCTS Microsoft MVP - Directory Services My Blog: http://msmvps.com/blogs/mweber/ Disclaimer: This posting is provided AS IS with no warranties or guarantees and confers no rights.
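The subnet-to-site check can also be done offline once the addresses from ipconfig /all and the subnet list from AD Sites and Services are written down. Below is a rough sketch, assuming the subnets are given as CIDR strings and that the most specific (longest-prefix) matching subnet decides the site; the function name `find_site` and the sample subnets and site names are made up for illustration.

```python
import ipaddress


def find_site(ip, subnet_to_site):
    """Return the site of the most specific subnet containing ip, or None.

    subnet_to_site maps CIDR strings (e.g. "10.1.0.0/16") to site names.
    """
    addr = ipaddress.ip_address(ip)
    best_net = None
    best_site = None
    for cidr, site in subnet_to_site.items():
        net = ipaddress.ip_network(cidr)
        # Keep the match with the longest prefix (the most specific subnet).
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_site = net, site
    return best_site


# Hypothetical subnet definitions, as they might be copied from AD Sites and Services.
example_subnets = {
    "10.1.0.0/16": "HQ",
    "10.1.5.0/24": "Branch",
}
```

With these sample values, `find_site("10.1.5.20", example_subnets)` gives `"Branch"`, while an address outside every listed subnet gives `None`, which is the case worth investigating for a DC that cannot determine its site.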
June 26th, 2012 5:25pm

This topic is archived. No further replies will be accepted.
