HTTP Virtual Server fails in cluster configuration (Exchange 2003)
I have Exchange 2003 SP2 in a cluster configuration on two nodes (Windows 2003 Ent. SP2). There is a recurring problem with the HTTP Virtual Server resource on both nodes: it becomes unavailable for no apparent reason, and it happens at different times on each node. The error message in the logs is: "Cluster resource HTTP Virtual Server in Cluster group Exchange fails". In Diagnostic Logging most options are set to Medium. When I restore the IIS metabase on both nodes, the HTTP Virtual Server works again for a week or two. Help please.
January 9th, 2008 12:08pm

We are experiencing the same issue, although it seems to appear on only one node. I found an interesting article but have not tried its fix yet:

Mon 10.15.07: Exchange HTTP Virtual Server: The IsAlive check for this resource failed.

After upgrading our server to SP2 on our Exchange 2003 cluster, we started receiving this error message.

Event Type: Error
Event Source: MSExchangeCluster
Event Category: Services
Event ID: 1005
Date: 10/15/2007
Time: 9:53:16 AM
User: N/A
Computer: EXCHANGE01
Description:
Exchange HTTP Virtual Server: The IsAlive check for this resource failed. For more information, click http://www.microsoft.com/contentredirect.asp.
Data:
0000: 46 27 00 00 F'..

Here's what the problem was.

We are running Windows Server 2003 on a 32-bit server. 32-bit versions of Windows have a 4 GB virtual address space. By default, Windows splits it right down the middle: 2 GB is reserved for the OS and 2 GB for applications. Out of the 2 GB reserved for the OS, 256 MB is reserved for non-paged pool memory.

We are using the /3GB switch, which limits the OS to 1 GB and lets applications use 3 GB. But this reduces the non-paged pool memory reservation to 128 MB instead of 256 MB.

IIS uses non-paged pool memory for processing requests. On Windows Server 2003 and Windows Vista, IIS stops processing requests once the available non-paged pool memory goes below 20 MB. Event IDs 2019, 1005 and 1069 are evidence of that happening.

Since Exchange relies heavily on IIS, that explains why the Exchange HTTP Virtual Server resource (OWA) is going down. But what was eating up the non-paged pool memory?

The culprit was Broadcom's NetXtreme II network card driver. It was incompatible with the scalable networking features bundled with Windows Server 2003 SP2 (and the Windows Scalable Networking Pack) and causes a memory leak. The memory leak depletes the available non-paged memory pool with each transaction.
That is why we did not see any impact until the usage of OWA (Webmail, ActiveSync, Entourage clients) increased. Earlier we saw a 3-4 hour window between failures, and a much shorter window this morning as usage increased.

The initial solution is to disable TCP Chimney. This is done with the following command:

Netsh int ip set chimney DISABLED

We also disabled the EnableTCPA registry value under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters by setting it to zero on both nodes.

Note: I worked with Microsoft Premier Support to get this resolution. I did see it posted on several other sites, but they instructed me to follow these steps.

Note: To see if this is the problem you are having, use perfmon and monitor Pool Nonpaged Bytes. If it goes above 108 MB and you are using the /3GB switch, then IIS will shut itself down to save resources. We were above that when the problem was happening, and after disabling the chimney it dropped to 79 MB. IIS uses a 20% fudge factor, or in other words if only 20% of the pool is free, it will shut itself down so that other OS resources don't get hurt by IIS.

admin wrote:

Here is what the Microsoft Support rep sent when I asked for an explanation I could take to the CTO.

Here is a summary of the actions we took today to address the failure of the HTTP Resource on your Exchange 2003 clustered mailbox server.

The first indication of a problem with the HTTP Resource is found in the Cluster Logs. In this case the HTTP Resource began failing IsAlive checks with error 10054. That error indicates the HTTP Resource is no longer processing requests. At this point OWA users would not be able to connect to their mailboxes.

This problem is generally caused by an exhaustion of non-paged pool memory.
On a Windows 2003 server using the /3GB switch in the boot.ini file, there is only 128 MB of non-paged pool memory, which is shared by components such as device drivers and some applications. Non-paged pool memory, as the name implies, is never swapped to the page file; the memory should always be available for the components that share it.

When the total amount of non-paged pool memory in use passes the 108 MB mark, the HTTP Resource will continue to remain online, but refuses new connections. When all the memory has been used, the resource fails to come online at all.

Upon investigating this issue, we used the Performance Monitor tool to check the current value of the Memory counter "Pool Nonpaged Bytes". The value had maxed out at 128 MB, which meant no free pool memory was available for the HTTP Resource.

Recently there has been an increase in this issue as more customers apply Windows 2003 SP2 to their Exchange servers. Included in SP2 is the Microsoft Windows Server 2003 Scalable Networking Pack, which contains stateful and stateless offloads to accelerate the Windows network stack. The pack includes the TCP Chimney offload feature, which has been linked to driver issues for Broadcom network interface cards. In effect, with Chimney offloading enabled, the NIC driver does not release non-paged pool memory correctly and ends up consuming most if not all of the available memory as part of its caching.

We suspected this was the case in this issue since a.) Service Pack 2 had been recently applied and b.) the chimney offload feature was enabled on the Broadcom NIC driver.

By running the command "Netsh int ip set chimney DISABLED" from the command line we turned off the feature. This immediately caused the release of most of the excess non-paged pool memory, as indicated by the Performance Monitor tool, where the current value dropped from 128 MB to 76 MB.
After that we were able to bring the HTTP Resource online.

I asked that you continue to monitor the value for Pool Nonpaged Bytes, as this is an indicator of the ability of the HTTP Resource to function as expected. There are some additional "tweaks" that can be implemented to tune an Exchange server for optimum non-paged pool memory usage. These tweaks would require planned downtime for registry changes and cluster failover. If you would like to implement them, please let me know so I can supply the documentation required. As for now, the cluster should continue to operate as expected with the change we implemented.

In addition, you asked about the relevance of this issue to the symptoms described in article 936594. This article addresses communication issues that may occur after Windows SP2. In your case there may be communication issues that need investigation. However, the immediate issue addressed today involved the misbehaving NIC driver. The steps described in article 936594 would not have addressed this issue since an updated driver is not currently available.
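
For anyone who wants to script the perfmon check described above, here is a minimal Python sketch of the arithmetic only. It assumes the figures quoted in this thread (a 128 MB non-paged pool with /3GB, 256 MB without, and a 20 MB free-pool floor below which IIS stops taking requests). The function name `pool_status` and its labels are made up for illustration; you would feed it a "Pool Nonpaged Bytes" reading collected however you normally collect counters.

```python
MB = 1024 * 1024

# Figures quoted in the thread; treat them as assumptions on other systems.
POOL_LIMIT_3GB = 128 * MB      # non-paged pool ceiling when /3GB is set
POOL_LIMIT_DEFAULT = 256 * MB  # ceiling without /3GB
MIN_FREE = 20 * MB             # IIS stops processing requests below this free amount


def pool_status(nonpaged_bytes, uses_3gb=True):
    """Classify a 'Pool Nonpaged Bytes' reading (hypothetical helper)."""
    limit = POOL_LIMIT_3GB if uses_3gb else POOL_LIMIT_DEFAULT
    free = limit - nonpaged_bytes
    if free <= 0:
        return "exhausted"          # resource cannot come online at all
    if free < MIN_FREE:
        return "refusing requests"  # past the 108 MB mark on a /3GB server
    return "ok"


if __name__ == "__main__":
    # Sample readings in MB, including the 76 MB value seen after the fix.
    for reading_mb in (76, 108, 110, 128):
        print(reading_mb, "MB ->", pool_status(reading_mb * MB))
```

With the /3GB figures, a 76 MB reading is classified as "ok" and a 110 MB reading as "refusing requests", which matches the 108 MB mark (128 MB minus 20 MB) mentioned by the support rep.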
January 10th, 2008 1:20pm

This topic is archived. No further replies will be accepted.
