Event ID 9646 - Can't work it out
Hi all, I have a weird problem happening with my Exchange server. Sorry this is quite long, but I will try to share everything I know, as I have been troubleshooting it for a few days now.

I have an Exchange server 2003 - SP1 with all the latest hotfixes from WSUS in my main office and 3 users in a regional office. These users are connected through a persistent VPN tunnel between two Juniper SSG firewalls. They have their own domain controller on their LAN, and the latency across the VPN tunnel averages 30ms with no packet loss.

For the three users in the regional office I am getting Event ID 9646 on the Exchange server:
------------
Mapi session "/o=DOMAIN/ou=First Administrative Group/cn=Recipients/cn=username" exceeded the maximum of 32 objects of type "session".
------------
There is nothing special about the regional office computers. They have Office 2003 with all the latest service packs and Office updates, Symantec NAV 9 and not much else, and they are configured to use Cached Exchange Mode.

When I look in the mailstore under Logons I can see that they have reached the 32 session limit. The only way to clear these sessions, it seems, is to either restart the System Attendant or close their sessions with TCPView. For the time being I have had to assign them "View Information Store Status" permissions on the server as per http://support.microsoft.com/kb/842022
Originally I thought this was an MTU issue because I noticed there was some fragmented traffic across the VPN tunnel. I have since lowered the MTU, which eliminated the fragmented traffic but did not fix this issue. I can ping mailserver -f -l 1472 without fragmentation from workstations on the corporate LAN and the regional site. I have run outlook.exe /rpcdiag on their machines and noticed no failed RPC activity.
I have run dcdiag on the domain controller in the remote site and
everything checks out ok. There are no clues to suggest network issues
in the eventlog on the regional domain controller. I can nmap the
exchange server and see all the same ports as I can from the corporate
LAN. I have run ExMon and it looks like the users experiencing the problem have an unusually high number of cached mode sessions. Does anyone know what might cause this? I can't find anyone with this problem on Google except people blaming antivirus software that makes excessive MAPI connections, but I can't see anything obvious. I'm about to disable Cached Exchange Mode on one of their computers to see if that solves the problem, but long term I need it to work so they aren't constantly connected to the mail server over the VPN tunnel.

Any ideas would be greatly appreciated.

Evan
January 22nd, 2007 3:49pm
In my experience, upgrading Exchange 2003 to SP2 can resolve this.
If the problem is still there, I suggest you check your antivirus configuration by following this KB article: http://support.microsoft.com/kb/245822/en-us
January 23rd, 2007 5:10am
Sorry for the confusion: Exchange is SP2, Windows Server 2003 is SP1. Basically everything is managed by WSUS and is up to the latest patch revisions. I don't think this has anything to do with antivirus. I reinstalled it on all three machines and it is configured the same as for the other 250 or so users on the network.
January 23rd, 2007 5:45am
More progress: I got one of the users to bring their laptop back to head office, and they do not have this problem here on the LAN. So at a guess this seems to have something to do with Cached Exchange Mode sessions not closing properly via the VPN. Is there some kind of testing I can do to troubleshoot this?
January 23rd, 2007 7:03am
You can download and use Microsoft Exchange Server User Monitor to see user sessions on the Exchange server.
Every session has a timeout limit, so I don't think it's a case of sessions not closing cleanly...
January 23rd, 2007 7:59am
This turned out to be the Juniper firewall having 60 second timeouts on Exchange traffic. Juniper support said it is a known issue and had me change the timeouts.
Still, it was a fun problem to troubleshoot. I learned heaps about things I've never had to look at in Exchange before :)
January 25th, 2007 3:48pm
We are having this EXACT issue and have Junipers in the network. What timeout value did Juniper have you set it to?
January 30th, 2007 10:12pm
Here's what Juniper told me to do to fix it:
-----------------
1. Modify the service timeouts as below:
set service MS-EXCHANGE-DATABASE timeout 200
set service MS-EXCHANGE-DIRECTORY timeout 200
set service MS-EXCHANGE-INFO-STORE timeout 200
set service MS-EXCHANGE-MTA timeout 200
set service MS-EXCHANGE-STORE timeout 200
set service MS-EXCHANGE-SYSATD timeout 200
set service MS-RPC-EPM timeout 200
2. Create a new trust to untrust policy at top and include service group ms-exchange and
ms-rpc-epm as below (this assumes you do not already have a policy id 100):
set policy id 100 top from trust to untrust any any ms-exchange permit
set policy id 100
set service MS-RPC-EPM
exit
save
-----------------
I didn't need to do step two as I already had rules in place, so I only did step one on all my firewalls. Had no problems since.
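For anyone applying step one on several firewalls, the seven commands can be generated rather than retyped. This is only a rough sketch in Python: the service names and the value 200 are the ones quoted above, while the helper name `timeout_commands` is made up for illustration. It only builds text to paste into the ScreenOS CLI and does not connect to any device.

```python
# Service names exactly as quoted in the Juniper instructions above.
SERVICES = [
    "MS-EXCHANGE-DATABASE",
    "MS-EXCHANGE-DIRECTORY",
    "MS-EXCHANGE-INFO-STORE",
    "MS-EXCHANGE-MTA",
    "MS-EXCHANGE-STORE",
    "MS-EXCHANGE-SYSATD",
    "MS-RPC-EPM",
]


def timeout_commands(timeout, services=SERVICES):
    """Hypothetical helper: one 'set service <name> timeout <n>' line per service."""
    return [f"set service {name} timeout {timeout}" for name in services]


if __name__ == "__main__":
    # Print the commands so they can be pasted into the firewall CLI.
    print("\n".join(timeout_commands(200)))
```

Passing a different number to `timeout_commands` gives the same list with another timeout, which may help if support suggests a different value for your ScreenOS version.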
January 30th, 2007 11:24pm
Awesome!! Thanks SOOOO much for the follow-up....I didn't know if you were going to see it! You ROCK!
January 30th, 2007 11:32pm
The support guy from Juniper also said today that ScreenOS 5.4.0r3 will be out soon and that it addresses this issue.
February 1st, 2007 1:08am
ScreenOS 5.4.0r3 is out, installed it and set all the timeouts back to default. No more problem.
February 7th, 2007 6:24am
We are seeing this as well, and Juniper support asked us to run the same commands. It doesn't make sense, because only 2 of the 42 offices that have migrated from our NS204 to our new SSG520 cluster are having major Exchange issues. It is just these 2 offices having problems; the remaining 40 offices are just fine. All of our remote 5GT devices are on the same ScreenOS.
The 2 offices with problems are 5GT devices running 5.3.0r7.0. The SSG520 devices are running 5.4.0r3a.0. Since none of these devices are running 5.4.0r2 as referenced in the KB article, does it still apply?
Please advise, thanks.
April 2nd, 2007 6:58pm
We migrated 42 offices from our NS204 to our new SSG520 cluster and are having Exchange connectivity issues with only 2 offices. We are wondering why it is just these 2, while the remaining 40 offices are just fine. All of our remote 5GT devices are on the same ScreenOS, 5.3.0r7.0.
The 2 offices with problems are 5GT devices and running 5.3.0r7.0. The SSG520 devices are running 5.4.0r3a.0.
Netscreen support asked us to follow this: http://kb.juniper.net/KB9230
1. Modify the service timeouts as below:
set service MS-EXCHANGE-DATABASE timeout 60
set service MS-EXCHANGE-DIRECTORY timeout 60
set service MS-EXCHANGE-INFO-STORE timeout 60
set service MS-EXCHANGE-MTA timeout 60
set service MS-EXCHANGE-STORE timeout 60
set service MS-EXCHANGE-SYSATD timeout 60
set service MS-RPC-EPM timeout 60
2. Create a new trust to untrust policy at top and include service group ms-exchange and ms-rpc-epm as below (this assumes you do not already have a policy id 100):
set policy id 100 top from trust to untrust any any ms-exchange permit
set policy id 100
set service MS-RPC-EPM
exit
save
Since none of these devices are running 5.4.0r2 as referenced in the KB article, not sure if it still applies. Waiting for call back from Tier 2.....
Any advice?
April 2nd, 2007 9:23pm
Thanks, that worked for us too! Upgraded the remote site (5SSG) to the latest revision of FW (Central has a Netscreen 25). Then deleted all remote connections to the Exchange server from that site, and voilà, everything works (for the moment).
Argon0
June 11th, 2007 5:10pm
We solved this problem by upgrading the IOS on our Cisco router. Now it is working fine, with no errors.
July 14th, 2007 1:41pm
It looks like I have found the root cause of the problem. MAPI sessions are based on *TCP* connectivity and stay alive as long as the corresponding TCP sessions are open. The problem happens with dead TCP sessions: the client side drops the old session and then opens a new one, but the server side never receives an acknowledgement that the old one was dropped, so it keeps handling both the old TCP session and the newly opened one. "netstat -ano" will show all the TCP sessions being handled. So the server also keeps both MAPI sessions, and it can only close one when the corresponding TCP session is closed.
How long will Windows Server keep handling dead TCP sessions? FOREVER! You need to manually create registry values in order to turn on the "keep alive" check of TCP sessions. The values to create are "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveTime" and "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveInterval".
See more
http://technet2.microsoft.com/windowsserver/en/library/db56b4d4-a351-40d5-b6b1-998e9f6f41c91033.mspx?mfr=true
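The accumulation described above can be illustrated with a toy simulation in Python. This is not Exchange or Windows code: the class `ToyServer` and the function `reconnects_until_refused` are invented for illustration, and the model simply assumes that every old connection dies silently and that the server only forgets it if a keep-alive check is enabled. The limit of 32 is the one from the Event ID 9646 text.

```python
SESSION_LIMIT = 32  # per-user limit quoted in the Event ID 9646 message


class ToyServer:
    """Toy model: tracks one user's sessions; purges dead ones only with keep-alive."""

    def __init__(self, keepalive):
        self.keepalive = keepalive
        self.sessions = []  # True = peer still alive, False = peer silently gone

    def client_reconnects(self):
        # Assumption: every existing connection died without the server being told.
        self.sessions = [False for _ in self.sessions]
        if self.keepalive:
            # A keep-alive probe gets no reply, so dead sessions are dropped.
            self.sessions = [alive for alive in self.sessions if alive]
        if len(self.sessions) >= SESSION_LIMIT:
            return False  # too many sessions for this user, new one refused
        self.sessions.append(True)
        return True


def reconnects_until_refused(keepalive, attempts=100):
    """Return the 1-based attempt that was refused, or None if none was."""
    server = ToyServer(keepalive)
    for attempt in range(1, attempts + 1):
        if not server.client_reconnects():
            return attempt
    return None


if __name__ == "__main__":
    print("without keep-alive:", reconnects_until_refused(False))
    print("with keep-alive:", reconnects_until_refused(True))
```

In this model the user is locked out once 32 stale sessions have piled up when no keep-alive check runs, and never when dead sessions are purged. Real behaviour depends on timers and on how the connection is lost, which the sketch ignores.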
October 19th, 2007 12:26pm
Read http://msexchangeteam.com/archive/2007/07/18/446400.aspx for an explanation of the Scalable Networking Pack in Windows Server 2003. The short of it is that Server 2003 enables scalable networking by default, and there are many network cards which don't support it. This causes the number of sessions per user in Exchange 2003 AND 2007 to keep growing instead of dropping the stale ones.
I found that my mobile laptop users were affected: every time they moved to a new location and connected with a new IP, they would generate 3 to 4 more sessions on the Exchange server without dropping any of the old ones. After a few days of this they would get an error that the Exchange server was unavailable, and Event ID 9646 would be generated on the server side.
I followed the instructions to disable the TCP chimney and the number of sessions IMMEDIATELY dropped for all affected users, without even having to reboot the mail servers. The article has instructions for other steps that may also need to be taken, but just disabling the TCP chimney worked on all my mail servers, both Exchange 2003 and Exchange 2007.
October 29th, 2007 6:01pm
We have a similar issue. What Cisco router model and what version of IOS did you upgrade to?
February 6th, 2008 12:22am
The link from "technochic" was most useful: http://msexchangeteam.com/archive/2007/07/18/446400.aspx for an explanation of the Scalable Networking Pack in Windows Server 2003. There appears to be a new Windows update to turn off SNP by default, http://support.microsoft.com/?kbid=948496, that will hopefully resolve this problem.
March 13th, 2008 7:23pm