Exchange 2003 Mailbox Server - 6.6 million X-Link2States per day
Hello all,

We are currently running a six-server native Exchange 2003 organisation with two Active-Active MB clusters and two front-end servers. One routing group is configured with all six servers as members. MBCL01 is the routing group master; all other servers are members.

We recently changed the routing topology by adding an additional connector to route mail out of a remote site. This meant that the master server would now be handling more SMTP traffic, along with OWA01 (the FE server at the site in question). We've noticed a reduction in performance, mainly delays in sending/receiving mail and slow Outlook performance for mailboxes hosted on MBCL01. We identified that the inetinfo process is consuming ~20% processor resource and carrying out around 2 billion I/O reads per day. The SMTP log files have also grown dramatically and are now around 2GB a day (they used to be around 200MB).

After analysing the SMTP logs we found that the MB server was receiving 6.6 million link state updates (via the X-Link2State verb) from itself every day. WinRoute shows that all 6 members are unable to connect to the master.

Other points worth noting - all servers are on the LAN (i.e. ports 691 and 25 are open on all boxes). I've got a feeling this problem has existed for a while and has only manifested itself since we made the routing change: the security logs on the server have always logged hundreds of successful authentication attempts per second from the system account.

I've checked most of the points in this article: http://support.microsoft.com/kb/832281 - everything looks OK.

Any help would be appreciated.

Thanks, Gareth.
May 10th, 2011 5:30am
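
The log analysis described in the post above (counting X-Link2State commands and seeing where they come from) could be reproduced with something like the following. This is only a rough sketch, not the method the poster actually used. It assumes the SMTP virtual server writes W3C extended logs with a `#Fields:` directive that includes `c-ip` and `cs-method`; the sample lines and IP addresses are made up for illustration.

```python
from collections import Counter


def count_verbs(lines, verb="X-LINK2STATE"):
    """Count log entries whose cs-method equals `verb`, grouped by c-ip."""
    fields = []
    per_client = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # W3C extended logs declare their column order in this directive.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue  # skip blanks, other directives, rows before a header
        row = dict(zip(fields, line.split()))
        if row.get("cs-method", "").upper() == verb.upper():
            per_client[row.get("c-ip", "unknown")] += 1
    return per_client


# Made-up sample lines (assumption: real logs carry more columns).
sample = [
    "#Software: Microsoft Internet Information Services 6.0",
    "#Fields: date time c-ip cs-method sc-status",
    "2011-05-10 09:00:01 10.0.0.5 X-LINK2STATE 0",
    "2011-05-10 09:00:01 10.0.0.5 X-LINK2STATE 0",
    "2011-05-10 09:00:02 10.0.0.9 EHLO 0",
    "2011-05-10 09:00:03 10.0.0.7 x-link2state 0",
]
counts = count_verbs(sample)
print(counts.most_common())
```

For a real log file, pass an open file object instead of `sample`. If one client IP dominates the counts and it is the server's own address, that matches the "from itself" symptom described in the post.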

On Tue, 10 May 2011 09:21:00 +0000, Gaz Jones wrote:

>We are currently running a 6 server native Exchange 2003 organisation with 2 Active-Active MB Clusters and two front end servers. One routing group is configured with all six servers as members. MBCL01 is the routing group master, all other servers are members.
>
>We recently changed the routing topology by adding an additional connector to route mail out of a remote site, this meant that the Master server would now be handling more SMTP traffic, along with OWA01 (the FE server at the site in question). We've noticed a reduction in performance, mainly around delays in sending/receiving mail, and Outlook performance for mailboxes hosted on MBCL01. We identified that the inetinfo service is consuming ~20% processor resource and carrying out around 2 billion IO reads per day. Also the SMTP log file sizes have increased dramatically and are now around 2GB a day (whereas they used to be around 200MB).
>
>After analysing the SMTP logs we found that the MB server was receiving 6.6 million link state updates (via the X-Link2State verb) from itself every day. WinRoute shows that all 6 members are unable to connect to the master.
>
>Other points worth noting - all servers are on the LAN (i.e. ports 691 and 25 are open on all boxes). I've got a feeling this problem has existed for a while, it's just manifested itself when we made the routing change, the security logs on the server have always logged hundreds of successful authentication attempts per second from the system account.
>
>I've checked most of the points in this article: http://support.microsoft.com/kb/832281 - everything looks ok.

The last time I had to deal with a problem like that had to be at least seven or eight years ago!

If you can't get the members of the RG to connect to the master, try moving the master to another machine. If that doesn't work, stop the RESvc services on each server in the RG and then restart them.

Have you changed the FQDN on the SMTP Virtual Server? Is there a corresponding A record for the name in your internal DNS? Is there an SPN for the name?

If the problem's caused by a stale route (or multiple stale routes) then the surest way to remove those routes is to shut down ALL the Exchange servers in the organization. Then restart the FE servers, then the BE servers. Because the link-state information is kept in memory you can't just reboot the machines one at a time. If you do that, and the member/master communication starts to work, you'll just replicate the stale routes from another machine. ALL the machines have to be stopped before you restart any of them.

Just be thankful you have only six servers. At that time we had 120 machines and they were spread over every continent except Antarctica.

--- Rich Matheisen MCSE+I, Exchange MVP
May 10th, 2011 5:38pm

Excellent, thanks for the reply Rich. I'll get some downtime scheduled ASAP and let you know how I get on.

Re. the SPN - we have an A record for the address in DNS, but I'll have to check the SPNs. Do you know whether the SPN should be assigned to the computer account of the Exchange Virtual Server (i.e. the clustered name) or the computer account of the cluster node?

I'd expect to see more Kerberos issues/authentication failures in the logs if it was a missing SPN, but it's worth checking all the same.

Thanks again, Gareth.
May 11th, 2011 3:35am

On Wed, 11 May 2011 07:20:34 +0000, Gaz Jones wrote:

>Excellent thanks for the reply Rich, I'll get some downtime scheduled in asap and let you know how I get on.
>
>Re. the SPN - we have an A record for the address in DNS, I'll have to check the SPN's though. Do you know whether the SPN should be assigned to the computer account of the Exchange Virtual Server (i.e. the clustered name) or the computer account of the cluster node?

The SPNs are in a multi-valued property of the server. The names should reflect whatever the SMTP VS uses to identify itself. Usually you'll have two SPNs for each name:

    SMTPSVC/<hostname>
    SMTPSVC/<fqdn>

The setspn tool should tell you what SPNs are assigned to the machine with "setspn -L <servername>". You can add SPNs with "setspn -A <spn> <servername>".

>I'd expect to see more kerberos issues/authentication failures in the logs if it was a missing SPN but it's worth checking all the same.

You'd only have a problem with the machines that use Kerberos. E2K3 should offer NTLM and LOGIN in addition to GSSAPI, but Exchange 2010 wants to use GSSAPI and you need Kerberos for that.

--- Rich Matheisen MCSE+I, Exchange MVP
May 11th, 2011 10:25pm
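
As a rough illustration of the SPN check described in the reply above, the sketch below takes the text printed by "setspn -L <servername>" and reports which of the two usual SMTPSVC names are missing. It does not run setspn itself. The function name, the example.local domain and the sample output are hypothetical, and the exact header line that setspn prints may differ from the one shown.

```python
def missing_smtp_spns(setspn_output, hostname, fqdn):
    """Return the expected SMTPSVC SPNs that are absent from setspn -L output."""
    registered = set()
    for line in setspn_output.splitlines():
        entry = line.strip()
        # SPN entries look like service/host and contain no spaces;
        # this skips the header line and any blank lines.
        if "/" in entry and " " not in entry:
            registered.add(entry.lower())
    expected = ["SMTPSVC/" + hostname, "SMTPSVC/" + fqdn]
    # SPNs are compared case-insensitively.
    return sorted(spn for spn in expected if spn.lower() not in registered)


# Hypothetical sample output; the domain and the SPN list are invented.
sample_output = """Registered ServicePrincipalNames for CN=MBCL01,CN=Computers,DC=example,DC=local:
    HOST/MBCL01
    HOST/mbcl01.example.local
    SMTPSVC/MBCL01
"""
missing = missing_smtp_spns(sample_output, "MBCL01", "mbcl01.example.local")
print(missing)
```

Anything the function returns would be a candidate for "setspn -A <spn> <servername>", after confirming which name the SMTP virtual server actually uses to identify itself.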

Hi Rich - thanks for your help with this. Stopping the routing service on all servers and starting it back up one by one resolved the issue. We also found that the Active-Active cluster EVSs were both running on the same node, which probably didn't help.

Cheers, Gareth.
May 16th, 2011 4:22am

