Exchange 2003 - cluster node regularly runs out of NPP memory and fails over...
Hello, We have an Exchange 2003 cluster made up of four active nodes and two passive nodes. We have about 1,000 mailboxes per active node, split over several databases. On one of the nodes, OWA/HTTP regularly (say twice a month) runs out of non-paged pool (NPP) memory and fails over to another node. NPP usage averages about 90 MB and peaks at about 120 MB before the failover. Since the mailboxes are spread fairly equally over all nodes, the usage should be similar, and we don't understand why only that node is affected. The server has the /3GB and /USERVA=3030 switches. Can anyone please suggest what to look for? Thanks, - Alan.
May 11th, 2011 4:56pm

We once had a server that was running out of non-paged pool memory. It turned out to be the Symantec product. We ran Poolmon.exe to see which processes were consuming the memory. http://support.microsoft.com/kb/177415
May 11th, 2011 9:57pm

Additionally, please refer to http://blogs.technet.com/b/dblanch/archive/2009/04/18/paged-and-non-paged-pool-issues-on-exchange-2000-2003.aspx Dhruv
May 11th, 2011 11:15pm

How are things going? Were you able to locate the problematic application with the poolmon.exe utility? If there is any progress or you have any questions, please feel free to post here to discuss. Regards, Novak Wu TechNet Subscriber Support in forum If you have any feedback on our support, please contact tngfb@microsoft.com Please remember to click Mark as Answer on the post that helps you, and to click Unmark as Answer if a marked post does not actually answer your question. This can be beneficial to other community members reading the thread.
May 13th, 2011 10:26am

Well, the thing is I already know the problem application is the OWA (HTTP) component of Exchange. So poolmon won't help.
May 13th, 2011 10:28am

IIS will not accept any more web connections (OWA) if the available non-paged pool memory falls below a certain threshold. This is exactly what happened to us. I found it through the error logs, which recorded the refused connections, under c:\windows\system32\logfiles\httperr. Try running poolmon and post a screenshot. http://blogs.msdn.com/b/david.wang/archive/2005/09/21/howto-diagnose-iis6-failing-to-accept-connections-due-to-connections-refused.aspx http://technet.microsoft.com/en-us/library/aa996269(EXCHG.80).aspx
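For anyone who wants to check those logs quickly, here is a minimal Python sketch that counts entries mentioning refused connections. It assumes the reason phrase contains the text Connections_Refused (possibly with a numeric prefix) and that comment lines start with "#". The function name and the sample lines are made up for illustration; they are not taken from a real log.

```python
def count_refused(lines, reason="Connections_Refused"):
    """Count log entries whose text mentions the given reason phrase.

    Header/comment lines (starting with '#') and blank lines are skipped.
    A substring match is used so a prefixed form such as
    '2_Connections_Refused' is still counted.
    """
    total = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if reason in line:
            total += 1
    return total


# Made-up sample entries, for illustration only.
sample_log = [
    "#Software: Microsoft HTTP API 1.0",
    "2011-05-13 10:02:11 - - - - - - - - - 1_Connections_Refused -",
    "2011-05-13 10:02:40 10.0.0.5 1234 10.0.0.9 80 HTTP/1.1 GET /exchange 503 1 ConnLimit DefaultAppPool",
    "2011-05-13 10:03:02 - - - - - - - - - 3_Connections_Refused -",
]

if __name__ == "__main__":
    print(count_refused(sample_log))
```

To run it against real files, read each log in the httperr folder with open() and pass the file object in place of sample_log. A count of zero would support the statement that connections are not being refused on that node.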
May 13th, 2011 4:07pm

We had a similar problem which turned out to be the SAN HBA drivers. Neill
May 13th, 2011 4:14pm

Thanks, but we don't get any refused connections. What happens is that it fails over to another node in the cluster, and that node happily starts to accept connections.
May 13th, 2011 4:14pm

I can be a little stubborn from time to time, but I do think you need to look at poolmon. Have you run the Exchange performance analyzer? http://blogs.technet.com/b/exchange/archive/2005/12/07/415733.aspx
What happens when kernel memory resources are exhausted? Symptoms of kernel memory exhaustion include:
- Slow performance
- Server crashes or cluster failovers
- Errors that report complete exhaustion of system page table entries (PTEs) or kernel pool memory
May 13th, 2011 4:34pm

Paul is correct. OWA may be using a lot more memory than it should, but that might be 'real' RAM rather than non-paged pool memory. The problem is that drivers (such as the HBA drivers mentioned above) won't show up in Task Manager as using any resources, which is why you have to dig deeper. By the way, what does the Best Practice Analyzer say about your system? Neill
May 14th, 2011 6:17pm

Running out of pool memory is often caused by drivers. Each driver consumes kernel memory, and when usage goes over a certain limit, strange things happen. Watch pool memory carefully and uninstall unnecessary drivers. Every KB counts if you're running on the edge. lasse at humandata dot se, http://anewmessagehasarrived.blogspot.com
May 15th, 2011 3:23pm

Has this been resolved? I am curious. Paul
May 23rd, 2011 9:36pm

Device drivers, filter drivers, excessive working set trimming: all of these can cause issues with NPP. Have you considered updating to a version of Exchange that runs on a 64-bit OS?
May 24th, 2011 12:42am

Running Poolmon on a production system isn't straightforward what with the registry edits. So we'll live with the occasional failovers and wait for our migration project. We have dual network cards for both the normal and heartbeat networks, as well as SAN drivers so all these will gobble NPP memory. Thanks anyway.
May 24th, 2011 12:03pm

Enabling Tag Mode: Before running PoolMon, you must enable pool tagging and then restart your computer. The pool tagging feature collects and calculates statistics about pool memory sorted by the tag value of the memory allocation. Note: It is not necessary to enable pool tagging in Windows Server 2003 as it is enabled by default.
You do not need to modify the registry for Windows 2003. All you have to do is run the tool from the folder you extracted the files to. It really doesn't do anything. I just tried it again on one of my 2003 servers. Paul
PoolMon keys:
P - Sorts tag list by Paged, Non-Paged, or mixed. Note that P cycles through each one.
B - Sorts tags by max byte usage.
M - Sorts tags by max byte allocation.
T - Sorts tags alphabetically by tag name.
E - Displays Paged, Non-paged total across bottom. Cycles through.
A - Sorts tags by allocation size.
F - Sorts tags by "frees".
S - Sorts tags by the differences of allocs and frees.
Q - Quit.
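To show what the B and S sort orders are getting at, here is a small Python sketch that ranks non-paged tags either by bytes in use or by the difference between allocs and frees. The TagStats record, the top_nonpaged function, the tag names, and all the numbers are hypothetical; this is not PoolMon output, only the same idea applied to hand-written sample data.

```python
from typing import NamedTuple


class TagStats(NamedTuple):
    tag: str         # pool tag name (hypothetical names below)
    pool: str        # "Nonp" or "Paged"
    allocs: int      # number of allocations
    frees: int       # number of frees
    bytes_used: int  # bytes currently held by this tag


def top_nonpaged(stats, key="bytes", limit=3):
    """Return the largest non-paged tags, by bytes in use or by allocs - frees."""
    nonpaged = [s for s in stats if s.pool == "Nonp"]
    if key == "diff":
        def rank(s):
            return s.allocs - s.frees
    else:
        def rank(s):
            return s.bytes_used
    return sorted(nonpaged, key=rank, reverse=True)[:limit]


# Made-up sample data, for illustration only.
sample = [
    TagStats("TagA", "Nonp", 1000, 990, 40_000),
    TagStats("TagB", "Nonp", 500, 100, 9_000_000),
    TagStats("TagC", "Paged", 9000, 10, 50_000_000),
    TagStats("TagD", "Nonp", 70000, 20000, 2_000_000),
]

if __name__ == "__main__":
    for s in top_nonpaged(sample, key="bytes"):
        print(s.tag, s.bytes_used)
```

A tag that sits at the top by bytes is the biggest consumer right now, while a tag whose allocs minus frees keeps growing between snapshots is the more likely leak candidate.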
May 24th, 2011 3:44pm

Any update on this ? lasse at humandata dot se, http://anewmessagehasarrived.blogspot.com
July 2nd, 2011 12:56pm

Hi, We've been experiencing this with a number of our customers. While the problem shows up in the HTTP virtual server, that is really a symptom of a more general lack of NPP memory: the virtual server will shut down when NPP is close to exhaustion, but without a poolmon it's difficult to prove that the virtual server is responsible for that exhaustion. We've had one customer who wasn't using OWA at all, but the virtual server shutting down was triggering a failover, as exres pinged it every 5 minutes or so.
If you're on Exchange 2003 with the /3GB switch, then your NPP memory is limited to 128 MB... nice. A poolmon is needed here to see what is using that NPP. You shouldn't need to do anything in the registry at all if you're on Win2k3 (we've never had to): http://support.microsoft.com/kb/177415
Without poolmon output everything else is guesswork, but a common problem I've seen is huge MmCm usage; this is a contiguous memory block grabbed by drivers on startup. Removing redundant network cards or storage devices can help reduce this. The ntdebugging blog has some good stuff on this: http://blogs.msdn.com/b/ntdebugging/archive/2009/10/27/mmcm-a-non-paged-pool-accounting-adventure.aspx
Depending on which antivirus product you use, that might be grabbing a big bunch of it too. We've had issues where the AV drivers grow over time (each release is bigger than the last, it seems) and eventually squeeze the available NPP down to the point where the system is unstable. Change your AV.
One thing you could try without doing a poolmon is to enable aggressive memory recycling, with the registry key in this article:
1. Start Registry Editor (Regedt32.exe).
2. Locate and then click the following key in the registry: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\Memory Management
3. On the Edit menu, click Add Value, and then add the following registry value:
   Value name: PoolUsageMaximum
   Data type: REG_DWORD
   Radix: Decimal
   Value data: 60
http://support.microsoft.com/kb/312362
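As a rough illustration of how little room that leaves, here is a short Python sketch of the headroom arithmetic. The 128 MB limit is the figure given above and the 90 MB average and 120 MB peak are the figures from the original question; the function name is made up.

```python
def npp_headroom(limit_mb, used_mb):
    """Return (free MB, free as a percentage of the limit)."""
    free_mb = limit_mb - used_mb
    return free_mb, 100.0 * free_mb / limit_mb


NPP_LIMIT_MB = 128  # limit quoted in this thread for /3GB systems

if __name__ == "__main__":
    for label, used in (("average", 90), ("peak", 120)):
        free_mb, free_pct = npp_headroom(NPP_LIMIT_MB, used)
        print(f"{label}: {free_mb} MB free ({free_pct:.1f}% of the limit)")
```

At the reported peak only 8 MB, about 6% of the limit, remains, so even a small extra allocation by a driver could be enough to push the node over.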
July 4th, 2011 12:48pm

Thanks Ishmael, we have redundant network cards for both the public and private/cluster LAN so I think that's the first thing to try changing. Those drivers take a lot of npp memory. I'll try the poolmon and look at the aggressive memory recycling key asap. Much appreciated!
July 4th, 2011 1:04pm

It is far easier to run poolmon than to switch out network card drivers! Run poolmon like I said months ago. It is all guesswork until you run the tool; without it, you are wasting your time. Also, if you find the driver, program, or what have you, you will most likely not need to modify your registry. "Hacking the reg" should be one of your last resorts.
July 5th, 2011 3:34am

This topic is archived. No further replies will be accepted.
