Event ID 9646 - Can't work it out (Network Steve Forum)

Event ID 9646 - Can't work it out

Hi all, I have a weird problem happening with my Exchange server. Sorry this is quite long, but I will try to share everything I know, as I have been troubleshooting it for a few days now.

I have an Exchange Server 2003 - SP1 with all the latest hotfixes from WSUS in my main office and 3 users in a regional office. These users are connected through a persistent VPN tunnel between two Juniper SSG firewalls. They have their own domain controller on their LAN, and the latency across the VPN tunnel averages 30ms with no packet loss.

For the three users in the regional office I am getting Event ID 9646 on the Exchange server:
------------
Mapi session "/o=DOMAIN/ou=First Administrative Group/cn=Recipients/cn=username" exceeded the maximum of 32 objects of type "session".
------------

There is nothing special about the regional office computers. They have Office 2003 with all the latest service packs and Office updates, Symantec NAV 9 and not much else, and they are configured to use Cached Exchange Mode. When I look in the mailstore under Logons I can see that they have reached the 32 session limit. The only way to clear these sessions, it seems, is to either restart the System Attendant or close their sessions with tcpview. For the time being I have had to assign them View Information Store Status permissions on the server as per http://support.microsoft.com/kb/842022

Originally I thought this was an MTU issue because I noticed there was some fragmented traffic across the VPN tunnel. I have since lowered the MTU, which eliminated the fragmented traffic but did not fix this issue. I can ping mailserver -f -l 1472 without fragmentation from workstations on the corporate LAN and the regional site.

I have run outlook.exe /rpcdiag on their machines and noticed no failed RPC activity. I have run dcdiag on the domain controller in the remote site and everything checks out OK. There are no clues to suggest network issues in the event log on the regional domain controller. I can nmap the Exchange server and see all the same ports as I can from the corporate LAN. I have run ExMon and it looks like the users experiencing the problem have an unusually high number of cached mode sessions.

Does anyone know what might cause this? I can't find anyone with this problem on Google except people blaming antivirus software that makes excessive MAPI connections, but I can't see anything obvious. I'm about to disable Cached Exchange Mode on one of their computers to see if that solves the problem, but long term I need it to work so they aren't constantly connected to the mail server over the VPN tunnel.

Any ideas would be greatly appreciated.
Evan
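
For readers wondering why the ping test uses 1472 bytes: this is the largest ICMP echo payload that fits in a standard 1500-byte Ethernet MTU once the 8-byte ICMP header and the 20-byte IPv4 header (no options assumed) are added. A minimal sketch of that arithmetic follows; the function names are made up for illustration and are not from the thread.

```python
# Header sizes for an IPv4 ICMP echo request (assumes no IP options).
IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def packet_size(payload_bytes: int) -> int:
    """Total IPv4 packet size for an ICMP echo carrying payload_bytes of data."""
    return payload_bytes + ICMP_HEADER_BYTES + IPV4_HEADER_BYTES


def max_unfragmented_payload(path_mtu: int) -> int:
    """Largest ping payload that fits in path_mtu without fragmentation."""
    return path_mtu - ICMP_HEADER_BYTES - IPV4_HEADER_BYTES


print(packet_size(1472))               # 1500
print(max_unfragmented_payload(1500))  # 1472
```

If the path MTU across a tunnel is smaller than 1500, the same subtraction gives the payload size to try with ping -f -l.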

January 22nd, 2007 3:49pm

In my experience, upgrading Exchange 2003 to SP2 can resolve this. If the problem is still there, I'd suggest you check your antivirus configuration following this KB article: http://support.microsoft.com/kb/245822/en-us

January 23rd, 2007 5:10am

Sorry for the confusion: Exchange is SP2, Windows Server 2003 is SP1. Basically everything is managed by WSUS and is up to the latest patch revisions. I don't think this has anything to do with antivirus. I reinstalled it on all three machines and it is configured the same as the other 250 or so users on the network.

January 23rd, 2007 5:45am

More progress: I got one of the users to bring their laptop back to head office and they do not have this problem here on the LAN. So at a guess this seems to have something to do with Cached Exchange Mode sessions not closing properly via the VPN. Is there some kind of testing I can do to troubleshoot this?

January 23rd, 2007 7:03am

You can download and use Microsoft Exchange Server User Monitor to see user sessions on the Exchange server. Every session has a timeout limit, so I don't think it's a case of sessions not closing cleanly...

January 23rd, 2007 7:59am

This turned out to be the Juniper firewall having 60 second timeouts on Exchange traffic. Juniper support said it is a known issue and had me change the timeouts. Still, it was a fun problem to troubleshoot, and I learned heaps about things I've never had to look at in Exchange before :)

January 25th, 2007 3:48pm

We are having this EXACT issue and have Junipers in the network. What timeout value did Juniper have you set it to?

January 30th, 2007 10:12pm

Here's what Juniper told me to do to fix it:
-----------------
1. Modify the service timeouts as below:

    set service MS-EXCHANGE-DATABASE timeout 200
    set service MS-EXCHANGE-DIRECTORY timeout 200
    set service MS-EXCHANGE-INFO-STORE timeout 200
    set service MS-EXCHANGE-MTA timeout 200
    set service MS-EXCHANGE-STORE timeout 200
    set service MS-EXCHANGE-SYSATD timeout 200
    set service MS-RPC-EPM timeout 200

2. Create a new trust to untrust policy at top and include service group ms-exchange and ms-rpc-epm as below (this assumes you do not already have a policy id 100):

    set policy id 100 top from trust to untrust any any ms-exchange permit
    set policy id 100
    set service MS-RPC-EPM
    exit
    save
-----------------
I didn't need to do step two as I already had rules in place; I only did step one on all my firewalls. Had no problems since.
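
If you have several firewalls to touch, the step-one commands can be generated rather than retyped. The following is only a convenience sketch: the service names and the value 200 are taken from the post above, while the helper name timeout_commands is hypothetical. It only prints text to paste into the firewall CLI and does not talk to any device.

```python
# Service names exactly as listed in the post above.
EXCHANGE_SERVICES = [
    "MS-EXCHANGE-DATABASE",
    "MS-EXCHANGE-DIRECTORY",
    "MS-EXCHANGE-INFO-STORE",
    "MS-EXCHANGE-MTA",
    "MS-EXCHANGE-STORE",
    "MS-EXCHANGE-SYSATD",
    "MS-RPC-EPM",
]


def timeout_commands(timeout, services=EXCHANGE_SERVICES):
    """Return one 'set service ... timeout ...' line per service (hypothetical helper)."""
    return [f"set service {name} timeout {timeout}" for name in services]


# Print the seven lines from step one, using the value quoted in the post.
print("\n".join(timeout_commands(200)))
```

Check the value against what your own Juniper support contact recommends before applying it; another poster in this thread was given a different number.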

January 30th, 2007 11:24pm

Awesome!! Thanks SOOOO much for the follow-up....I didn't know if you were going to see it! You ROCK!

January 30th, 2007 11:32pm

The support guy from Juniper also said today that ScreenOS 5.4.0r3 will be out soon and that it addresses this issue.

February 1st, 2007 1:08am

ScreenOS 5.4.0r3 is out, installed it and set all the timeouts back to default. No more problem.

February 7th, 2007 6:24am

We are seeing this as well and Juniper support asked us to run the same commands. It doesn't make sense, because only 2 of the 42 offices that have migrated from our NS204 to our new SSG520 cluster are having major Exchange issues; the remaining 40 offices are just fine. All of our remote 5GT devices are on the same ScreenOS. The 2 offices with problems are 5GT devices running 5.3.0r7.0. The SSG520 devices are running 5.4.0r3a.0. Since none of these devices are running 5.4.0r2 as referenced in the KB article, does it still apply? Please advise, thanks.

April 2nd, 2007 6:58pm

We migrated 42 offices from our NS204 to our new SSG520 cluster and are having Exchange connectivity issues with only 2 offices. We are wondering why it is just these 2 when the remaining 40 offices are just fine. All of our remote 5GT devices are on the same ScreenOS, 5.3.0r7.0. The 2 offices with problems are 5GT devices running 5.3.0r7.0. The SSG520 devices are running 5.4.0r3a.0.

Netscreen support asked us to follow this: http://kb.juniper.net/KB9230

1. Modify the service timeouts as below:

    set service MS-EXCHANGE-DATABASE timeout 60
    set service MS-EXCHANGE-DIRECTORY timeout 60
    set service MS-EXCHANGE-INFO-STORE timeout 60
    set service MS-EXCHANGE-MTA timeout 60
    set service MS-EXCHANGE-STORE timeout 60
    set service MS-EXCHANGE-SYSATD timeout 60
    set service MS-RPC-EPM timeout 60

2. Create a new trust to untrust policy at top and include service group ms-exchange and ms-rpc-epm as below (this assumes you do not already have a policy id 100):

    set policy id 100 top from trust to untrust any any ms-exchange permit
    set policy id 100
    set service MS-RPC-EPM
    exit
    save

Since none of these devices are running 5.4.0r2 as referenced in the KB article, we are not sure if it still applies. Waiting for a call back from Tier 2... Any advice?

April 2nd, 2007 9:23pm

Thanks, that worked for us too! Upgraded the remote site (5SSG) to the latest revision of FW (Central has a Netscreen 25). Then deleted all remote connections to the Exchange server from that site, and voilà, everything works (for the moment).
Argon0

June 11th, 2007 5:10pm

We solved this problem by upgrading the IOS on our Cisco router. It is now working fine, with no errors.

July 14th, 2007 1:41pm

It looks like I have found the root cause of the problem. MAPI sessions are based on *TCP* connectivity and stay alive as long as the corresponding TCP sessions are open. The problem happens with dead TCP sessions: the client side drops the old session and then opens a new one, but the server side never receives an acknowledgement that the old one was dropped, so it keeps handling both the old TCP session and the newly opened one. "netstat -ano" will show all the TCP sessions being handled. So the server also keeps both MAPI sessions, and it can only close a MAPI session once the corresponding TCP session is closed.

How long will Windows Server keep handling dead TCP sessions? FOREVER! You need to manually create registry values in order to turn on the "keep alive" check for TCP sessions. The values to create are "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveTime" and "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveInterval". See more: http://technet2.microsoft.com/windowsserver/en/library/db56b4d4-a351-40d5-b6b1-998e9f6f41c91033.mspx?mfr=true
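
To spot clients that are piling up connections, the output of "netstat -ano" can be tallied per remote address. The sketch below is one way to do that, assuming the usual Windows column order (Proto, Local Address, Foreign Address, State, PID). The sample text, addresses and PIDs are invented for illustration, and established_per_client is a hypothetical helper name.

```python
from collections import Counter

# Invented sample in the shape of Windows "netstat -ano" output.
SAMPLE = """
Active Connections

  Proto  Local Address     Foreign Address    State        PID
  TCP    10.0.0.5:1025     10.1.0.21:3456     ESTABLISHED  1832
  TCP    10.0.0.5:1025     10.1.0.21:3460     ESTABLISHED  1832
  TCP    10.0.0.5:1025     10.0.0.77:2210     ESTABLISHED  1832
  TCP    10.0.0.5:135      0.0.0.0:0          LISTENING    904
  UDP    10.0.0.5:53       *:*                             1200
"""


def established_per_client(netstat_text, pid=None):
    """Count ESTABLISHED TCP connections per remote IP, optionally for one PID."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Keep only TCP rows that have all five columns and are ESTABLISHED.
        if len(fields) < 5 or fields[0] != "TCP" or fields[3] != "ESTABLISHED":
            continue
        if pid is not None and fields[4] != str(pid):
            continue
        # Strip the port from "address:port" to get the remote IP.
        remote_ip = fields[2].rsplit(":", 1)[0]
        counts[remote_ip] += 1
    return counts


for ip, n in established_per_client(SAMPLE).most_common():
    print(ip, n)
```

On a real server you would feed it the captured netstat output and, if you know the process ID of the Exchange store process, pass it as pid to filter out unrelated connections.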

October 19th, 2007 12:26pm

Read http://msexchangeteam.com/archive/2007/07/18/446400.aspx for an explanation of the Scalable Networking Pack in Windows Server 2003. The short of it is that Server 2003 enables scalable networking by default, and there are many network cards which don't support it. This causes the number of sessions per user in Exchange 2003 AND 2007 to continue to grow instead of dropping the stale ones. I found that my mobile laptop users were affected: every time they moved to a new location and connected with a new IP, they would generate 3 to 4 more sessions on the Exchange server without dropping any of the old ones. After a few days of this they would get an error that the Exchange server was unavailable, and Event ID 9646 would be generated on the server side. I followed the instructions to disable the TCP chimney and the number of sessions IMMEDIATELY reduced for all affected users without even having to reboot the mail servers. The article has more instructions for other steps that may also need to be taken, but just disabling the TCP chimney worked on all my mail servers, both Exchange 2003 and Exchange 2007.
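
As a rough illustration of why this takes "a few days" to bite: if every move to a new location leaves 3 to 4 stale sessions behind and none are ever dropped, the 32-session limit from the event text is reached after about 8 to 11 moves. The sketch below just does that division; it assumes the user starts with no stale sessions, and moves_until_limit is a made-up name.

```python
import math

SESSION_LIMIT = 32  # from the Event ID 9646 message quoted in this thread


def moves_until_limit(sessions_per_move, limit=SESSION_LIMIT):
    """Location changes needed before leaked sessions reach the limit."""
    return math.ceil(limit / sessions_per_move)


for per_move in (3, 4):
    print(per_move, "sessions per move ->", moves_until_limit(per_move), "moves")
```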

October 29th, 2007 6:01pm

We have a similar issue. What Cisco router model and what version of IOS did you upgrade to?

February 6th, 2008 12:22am

The link from "technochic" was most useful: http://msexchangeteam.com/archive/2007/07/18/446400.aspx for an explanation of the Scalable Networking Pack in Windows Server 2003. There appears to be a new Windows update that turns off SNP by default, http://support.microsoft.com/?kbid=948496, which will hopefully resolve this problem.

March 13th, 2008 7:23pm

This topic is archived. No further replies will be accepted.