Windows Server 2008 R2 - 0x000000F4 BSOD - DFS
We currently run a two-node DFS cluster. The currently accessed server (the only one listed in the namespace) crashes consistently every 72 hours or so. If we swap the node in the namespace for the other node, the now-active node crashes with the same error.
The stop error is consistent on both nodes: 0x000000F4 (0x0000000000000003, Random, Random, 0xFFFFF800019D2510). No cause file is listed.
Both servers are virtual machines in a VMware environment: ESXi 5.0u1 with the latest VMware Tools.
I had thought that maybe KB2521220 might fix it, but the stop error is different (though the behavior is the same).
Any ideas? Thanks in advance!
July 18th, 2012 8:25am
First, I would analyse the crash dump if you can. Most of all, make sure the firmware on all your gear is up to date: the server that hosts the hypervisor and, if you use one, your storage array controller (SAN, DAS, NAS, etc.).
MCP | MCTS 70-236: Exchange Server 2007, Configuring
Want to follow me ? | Blog:
http://www.jabea.net | http://blogs.technet.com/b/wikininjas/
July 18th, 2012 8:52am
Check this...
You receive various Stop error messages in Windows 7 or in Windows Server 2008 R2 when you try to resume a computer that has a large SATA hard disk
http://support.microsoft.com/kb/977178
I suspect the issue is related to the ESX virtual SCSI adapter type. Which adapter is set on those VMs?
I do not represent the organisation I work for, all the opinions expressed here are my own.
This posting is provided "AS IS" with no warranties or guarantees and confers no rights.
- .... .- -. -.- ... --..-- ... .- -. - --- ... ....
July 18th, 2012 8:53am
I suspect the issue is related to the ESX virtual SCSI adapter type. Which adapter is set on those VMs?
We use the LSI Logic SAS driver. It IDs itself as "LSI Adapter, SAS 3000 series, 8-port with 1068" in Device Manager.
July 18th, 2012 9:05am
I would also check your ESXi logs (in /var/log/) to determine whether a hardware issue is affecting your VM.
July 18th, 2012 9:17am
All right, LSI Logic should not cause issues. If BusLogic had been used, that might have caused some issues.
July 18th, 2012 9:17am
Also as Yagmoth555 indicated, underlying storage might also cause such stop errors. If you are using SAN for storage, check the storage box firmware level and see if that needs an update.
July 18th, 2012 9:21am
Also as Yagmoth555 indicated, underlying storage might also cause such stop errors. If you are using SAN for storage, check the storage box firmware level and see if that needs an update.
Our SANs (EMC Clariion) are maintained at the latest firmware level by EMC.
I'm going to install those hotfixes on the idle node and see if that makes a difference.
July 18th, 2012 9:37am
I'm going to install those hotfixes on the idle node and see if that makes a difference.
Before applying the hotfix, make sure to take a system state backup, just in case.
July 18th, 2012 9:40am
From: EMC CLARiiON Integration with VMware ESX Server
(sorry for the image, the pdf does not allow me to copy/paste it)
Also check on your SAN whether you have an I/O chart for that volume.
July 18th, 2012 10:13am
From: EMC CLARiiON Integration with VMware ESX Server
(sorry for the image, the pdf does not allow me to copy/paste it)
Also check on your SAN whether you have an I/O chart for that volume.
Thanks for that post, Yagmoth. We actually do dedicate an entire LUN to each node in DFS. These LUNs are not shared to anything else. I'm starting performance logging now.
July 18th, 2012 1:56pm
Hello,
Try the following:
- Uninstall all unused programs
- Run chkdsk /r /f and sfc /scannow
- Perform a clean boot: http://support.microsoft.com/kb/929135
- Temporarily disable all security software you have
If this does not help, use Microsoft SkyDrive to upload the dump files (C:\Windows\Minidump). Once done, post a link here.
You can also contact Microsoft CSS for assistance.
This posting is provided "AS IS" with no warranties or guarantees, and confers no rights.
Microsoft Student Partner 2010 / 2011
Microsoft Certified Professional
Microsoft Certified Systems Administrator: Security
Microsoft Certified Systems Engineer: Security
Microsoft Certified Technology Specialist: Windows Server 2008 Active Directory, Configuration
Microsoft Certified Technology Specialist: Windows Server 2008 Network Infrastructure, Configuration
Microsoft Certified Technology Specialist: Windows Server 2008 Applications Infrastructure, Configuration
Microsoft Certified Technology Specialist: Windows 7, Configuring
Microsoft Certified Technology Specialist: Designing and Providing Volume Licensing Solutions to Large Organizations
Microsoft Certified IT Professional: Enterprise Administrator
Microsoft Certified IT Professional: Server Administrator
Microsoft Certified Trainer
July 18th, 2012 3:09pm
If none of the above suggestions help, please open a ticket with Microsoft. You can contact Microsoft Customer Support Service (CSS) for assistance so that this problem can be resolved efficiently.
http://support.microsoft.com/contactus/
July 19th, 2012 2:01am
Here's an interesting observation. In Performance Monitor, my active DFS node has a "Cache Bytes" value of 1,201,000,000 (1.2GB) and rising. Could this be a factor?
On the passive node (no sessions), I saw that the "Cache Bytes" value was at 750,000,000 (750MB) and rising. When I disabled DFS replication, the count stopped, but did not decrease. I rebooted the passive node, and now the value is stable at 100,000,000 (100MB).
According to this document, I should never go over 300MB. I also have a dump log that I can share - it seems to indicate a driver/HDD failure.
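For anyone who wants to put a number on the "and rising" part of the Cache Bytes readings above, here is a minimal Python sketch. It assumes you have exported a few (seconds, bytes) samples of the counter from Performance Monitor yourself. The function names and the sample values are hypothetical. The 300MB limit is simply the figure quoted in the post, not a value I have verified.

```python
def growth_rate(samples):
    """Least-squares slope, in bytes per second, of (seconds, bytes) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_b = sum(b for _, b in samples) / n
    denom = sum((t - mean_t) ** 2 for t, _ in samples)
    if denom == 0:
        raise ValueError("samples must cover more than one point in time")
    return sum((t - mean_t) * (b - mean_b) for t, b in samples) / denom


def seconds_until(samples, limit_bytes):
    """Estimate seconds after the last sample until limit_bytes is reached.

    Returns 0.0 if the last sample is already at or above the limit,
    and None if the counter is flat or shrinking.
    """
    rate = growth_rate(samples)
    _, last_bytes = samples[-1]
    if last_bytes >= limit_bytes:
        return 0.0
    if rate <= 0:
        return None
    return (limit_bytes - last_bytes) / rate


if __name__ == "__main__":
    # Hypothetical readings taken one hour apart (not data from this thread).
    readings = [(0, 100_000_000), (3600, 150_000_000), (7200, 200_000_000)]
    limit = 300_000_000  # the 300MB figure mentioned in the post
    print("bytes/second:", growth_rate(readings))
    print("seconds until limit:", seconds_until(readings, limit))
```

A steady positive slope on the active node and a flat one after a reboot would match what was observed above, but this only describes the trend and says nothing about the cause.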
-----------------------
Microsoft (R) Windows Debugger Version 6.2.8400.0 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.
Loading Dump File [c:\users\rs01775\desktop\SVRFS02.DMP]
Kernel Summary Dump File: Only kernel address space is available
Symbol search path is: srv*c:\symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows 7 Kernel Version 7601 (Service Pack 1) UP Free x64
Product: Server, suite: TerminalServer SingleUserTS
Built by: 7601.17835.amd64fre.win7sp1_gdr.120503-2030
Machine Name:
Kernel base = 0xfffff800`01608000 PsLoadedModuleList = 0xfffff800`0184c670
Debug session time: Thu Jul 19 08:52:33.702 2012 (UTC - 4:00)
System Uptime: 1 days 23:01:33.728
Loading Kernel Symbols
...............................................................
...............................................Page 10020b not present in the dump file. Type ".hh dbgerr004" for details
.................
.............
Loading User Symbols
PEB is paged out (Peb.Ldr = 000007ff`fffda018). Type ".hh dbgerr001" for details
Loading unloaded module list
............
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
Use !analyze -v to get detailed debugging information.
BugCheck F4, {3, fffffa8004999b30, fffffa8004999e10, fffff80001986510}
Page 10020b not present in the dump file. Type ".hh dbgerr004" for details
Probably caused by : csrss.exe
Followup: MachineOwner
---------
kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
CRITICAL_OBJECT_TERMINATION (f4)
A process or thread crucial to system operation has unexpectedly exited or been
terminated.
Several processes and threads are necessary for the operation of the
system; when they are terminated (for any reason), the system can no
longer function.
Arguments:
Arg1: 0000000000000003, Process
Arg2: fffffa8004999b30, Terminating object
Arg3: fffffa8004999e10, Process image file name
Arg4: fffff80001986510, Explanatory message (ascii)
Debugging Details:
------------------
Page 10020b not present in the dump file. Type ".hh dbgerr004" for details
PROCESS_OBJECT: fffffa8004999b30
IMAGE_NAME: csrss.exe
DEBUG_FLR_IMAGE_TIMESTAMP: 0
MODULE_NAME: csrss
FAULTING_MODULE: 0000000000000000
PROCESS_NAME: csrss.exe
EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.
BUGCHECK_STR: 0xF4_c0000005
DEFAULT_BUCKET_ID: WIN7_DRIVER_FAULT
CURRENT_IRQL: 0
STACK_TEXT:
fffff880`0294eb08 fffff800`01a0e892 : 00000000`000000f4 00000000`00000003 fffffa80`04999b30 fffffa80`04999e10 : nt!KeBugCheckEx
fffff880`0294eb10 fffff800`019bae8b : ffffffff`ffffffff fffffa80`0499ab50 fffffa80`04999b30 fffffa80`04999b30 : nt!PspCatchCriticalBreak+0x92
fffff880`0294eb50 fffff800`01939f74 : ffffffff`ffffffff 00000000`00000001 fffffa80`04999b30 00000000`00000008 : nt! ?? ::NNGAKEGL::`string'+0x176d6
fffff880`0294eba0 fffff800`01686453 : fffffa80`04999b30 fffff880`c0000005 fffffa80`0499ab50 fffffa80`28820060 : nt!NtTerminateProcess+0xf4
fffff880`0294ec20 00000000`76e215da : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13
00000000`00ccdb48 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x76e215da
STACK_COMMAND: kb
FOLLOWUP_NAME: MachineOwner
FAILURE_BUCKET_ID: X64_0xF4_c0000005_IMAGE_csrss.exe
BUCKET_ID: X64_0xF4_c0000005_IMAGE_csrss.exe
Followup: MachineOwner
---------
kd> !process fffffa8004999b30 3
PROCESS fffffa8004999b30
SessionId: 1 Cid: 0174 Peb: 7fffffda000 ParentCid: 016c
DirBase: 12e745000 ObjectTable: fffff8a001866170 HandleCount: 96.
Image: csrss.exe
VadRoot fffffa800428ab60 Vads 66 Clone 0 Private 274. Modified 282. Locked 0.
DeviceMap fffff8a0000088c0
Token fffff8a001801850
ElapsedTime 1 Day 23:01:22.042
UserTime 00:00:00.000
KernelTime 00:00:00.046
QuotaPoolUsage[PagedPool] 101944
QuotaPoolUsage[NonPagedPool] 9208
Working Set Sizes (now,min,max) (697, 50, 345) (2788KB, 200KB, 1380KB)
PeakWorkingSetSize 900
VirtualSize 46 Mb
PeakVirtualSize 47 Mb
PageFaultCount 1720
MemoryPriority BACKGROUND
BasePriority 13
CommitCharge 409
THREAD fffffa8004959b50 Cid 0174.0188 Teb: 000007fffffdc000 Win32Thread: fffff900c01c6360 WAIT: (WrLpcReply) UserMode Non-Alertable
fffffa8004959f18 Semaphore Limit 0x1
THREAD fffffa800499cb50 Cid 0174.018c Teb: 000007fffffd8000 Win32Thread: fffff900c01cac20 WAIT: (UserRequest) UserMode Alertable
fffffa8004941a10 SynchronizationEvent
fffffa8004955ab0 SynchronizationEvent
fffffa8004951920 SynchronizationEvent
fffffa800499d900 SynchronizationEvent
THREAD fffffa800499eb50 Cid 0174.0190 Teb: 000007fffffd6000 Win32Thread: fffff900c00ab010 WAIT: (WrLpcReceive) UserMode Non-Alertable
fffffa800499ef18 Semaphore Limit 0x1
THREAD fffffa800499fb50 Cid 0174.0194 Teb: 000007fffffd4000 Win32Thread: 0000000000000000 WAIT: (WrLpcReceive) UserMode Non-Alertable
fffffa800499ff18 Semaphore Limit 0x1
THREAD fffffa800499ab50 Cid 0174.01a0 Teb: 000007fffffde000 Win32Thread: fffff900c01f2c20 RUNNING on processor 0
THREAD fffffa80049bfb50 Cid 0174.01b8 Teb: 000007fffffae000 Win32Thread: fffff900c01855b0 WAIT: (WrUserRequest) KernelMode Alertable
fffffa80041d5e90 SynchronizationEvent
fffffa800496bdc0 NotificationTimer
fffffa80041e0ef0 SynchronizationTimer
fffffa8004961d80 SynchronizationEvent
THREAD fffffa80049c8b50 Cid 0174.01c0 Teb: 000007fffffac000 Win32Thread: fffff900c015f010 WAIT: (WrUserRequest) UserMode Non-Alertable
fffffa80049bfad0 SynchronizationEvent
fffffa8004976e80 SynchronizationEvent
THREAD fffffa802c8cf060 Cid 0174.1e04 Teb: 000007fffffaa000 Win32Thread: 0000000000000000 WAIT: (WrLpcReceive) UserMode Non-Alertable
fffffa802c8cf428 Semaphore Limit 0x1
kd> lmvm csrss
start end module name
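If you have several of these dumps to compare across both nodes, the "BugCheck" summary line in the output above can be decoded with a short script. This is only a sketch in Python. The function names are my own. The meaning of Arg1 = 3 (process) comes from the Arguments section of the dump above, and the value 6 (thread) is taken from Microsoft's bug check 0xF4 reference.

```python
import re

# Arg1 values for bug check 0xF4. 0x3 is shown as "Process" in the dump above;
# 0x6 ("thread") is taken from Microsoft's bug check reference.
F4_OBJECT_TYPES = {0x3: "process", 0x6: "thread"}

# Matches lines such as: BugCheck F4, {3, fffffa80..., fffffa80..., fffff800...}
BUGCHECK_RE = re.compile(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}")


def parse_bugcheck(line):
    """Return (code, [arg1, ..., argN]) as integers from a WinDbg BugCheck line."""
    match = BUGCHECK_RE.search(line)
    if match is None:
        raise ValueError("not a BugCheck summary line: %r" % line)
    code = int(match.group(1), 16)
    # WinDbg sometimes prints 64-bit values with a backtick separator.
    args = [int(part.strip().replace("`", ""), 16)
            for part in match.group(2).split(",")]
    return code, args


def describe_f4(line):
    """Summarise a CRITICAL_OBJECT_TERMINATION (0xF4) BugCheck line."""
    code, args = parse_bugcheck(line)
    if code != 0xF4:
        raise ValueError("expected bug check 0xF4, got 0x%X" % code)
    kind = F4_OBJECT_TYPES.get(args[0], "unknown object type")
    return "CRITICAL_OBJECT_TERMINATION: %s object at 0x%016x" % (kind, args[1])


if __name__ == "__main__":
    line = "BugCheck F4, {3, fffffa8004999b30, fffffa8004999e10, fffff80001986510}"
    print(describe_f4(line))
```

Running this over the summary line from each crash makes it easy to confirm whether every dump reports a terminated process (Arg1 = 3) and whether the image name is csrss.exe each time, which the script itself does not check.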
July 20th, 2012 2:44pm
For bug check code 0x000000F4 (CRITICAL_OBJECT_TERMINATION), please refer to: http://msdn.microsoft.com/en-us/library/windows/hardware/ff560372(v=vs.85).
Regards, Ravikumar P
July 20th, 2012 8:30pm
It is not effective for us to debug the crash dump file here in the forum, so it is recommended that you contact Microsoft Customer Service and Support (CSS) by telephone so that a dedicated Support Professional can assist with your request. Thanks for your understanding.
To obtain the phone numbers for a specific technology request, please take a look at the web site listed below:
http://support.microsoft.com/default.aspx?scid=fh;EN-US;OfferProPhone#faq607
Hope the issue will be resolved soon.
http://www.arabitpro.com
July 21st, 2012 4:33am