Exchange 2003 Mailbox Server - 6.6 million X-Link2States per day
Hello all,
We are currently running a 6-server native Exchange 2003 organisation with two Active-Active mailbox (MB) clusters and two front-end servers. One routing group is configured with all six servers as members. MBCL01 is the routing group master; all other servers are members.
We recently changed the routing topology by adding a connector to route mail out of a remote site. This means the master server now handles more SMTP traffic, as does OWA01 (the FE server at that site). Since then we've noticed reduced performance, mainly delays in sending and receiving mail and poor Outlook performance for mailboxes hosted on MBCL01. We found that the inetinfo process is consuming ~20% CPU and performing around 2 billion I/O reads per day. The SMTP log files have also grown dramatically, to around 2GB a day (they used to be around 200MB).
After analysing the SMTP logs, we found that the MB server was receiving 6.6 million link state updates (via the X-Link2State verb) from itself every day. WinRoute shows that all 6 members are unable to connect to the master.
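For anyone wanting to reproduce the log analysis above, here is a minimal Python sketch that tallies X-LINK2STATE commands per client IP in an SMTP protocol log. It assumes the logs use the IIS W3C extended format with the c-ip and cs-method fields enabled (the column order is read from the #Fields: directive); the sample lines and IP addresses are made up for illustration. For scale, 6.6 million updates a day is roughly 76 per second (6,600,000 / 86,400).

```python
from collections import Counter
from typing import Iterable

def count_verb_by_client(lines: Iterable[str], verb: str = "X-LINK2STATE") -> Counter:
    """Count log entries whose cs-method matches `verb`, keyed by client IP (c-ip).

    Assumes W3C extended logging; column order is read from each '#Fields:' line.
    """
    fields: list[str] = []
    counts: Counter = Counter()
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue  # skip blank lines, other directives, or data before a header
        values = line.split()
        if len(values) != len(fields):
            continue  # skip truncated or malformed entries
        row = dict(zip(fields, values))
        if row.get("cs-method", "").upper() == verb.upper():
            counts[row.get("c-ip", "-")] += 1
    return counts

def per_second(daily_total: int) -> float:
    """Convert a daily count into an average rate per second."""
    return daily_total / 86_400

# Hypothetical sample in the assumed format (addresses are illustrative only).
sample = [
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status",
    "2011-05-10 09:00:01 10.0.0.11 X-LINK2STATE - 250",
    "2011-05-10 09:00:01 10.0.0.11 x-link2state - 250",
    "2011-05-10 09:00:02 10.0.0.12 EHLO - 250",
]

if __name__ == "__main__":
    print(count_verb_by_client(sample))  # Counter({'10.0.0.11': 2})
    print(round(per_second(6_600_000)))  # 76
```

With a real log you would pass an open file object instead of `sample`; if the top entry is the server's own address, that matches the "updates from itself" symptom.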
Other points worth noting: all servers are on the LAN (i.e. ports 691 and 25 are open on all boxes). I have a feeling this problem has existed for a while and only manifested itself when we made the routing change; the security logs on the server have always shown hundreds of successful authentication attempts per second from the system account.
I've checked most of the points in this article: http://support.microsoft.com/kb/832281 - everything looks ok.
Any help would be appreciated.
Thanks, Gareth.
May 10th, 2011 5:30am
On Tue, 10 May 2011 09:21:00 +0000, Gaz Jones wrote:
>We are currently running a 6 server native Exchange 2003 organisation with 2 Active-Active MB Clusters and two front end servers. One routing group is configured with all six servers as members. MBCL01 is the routing group master, all other servers are members.
>
>We recently changed the routing topology by adding an additional connector to route mail out of a remote site, this meant that the Master server would now be handling more SMTP traffic, along with OWA01 (the FE server at the site in question). We've noticed a reduction in performance, mainly around delays in sending/receiving mail, and outlook performance for mailboxes hosted on MBCL01. We identified that the inetinfo service is consuming ~20% processor resource and carrying out around 2billion IO reads per day. Also the SMTP log file sizes have increased dramatically and are now around 2GB a day (whereas they used to be around 200MB).
>
>After analysing the SMTP logs we found that the MB server was receiving 6.6 million link state updates (via the X-Link2State verb) from itself every day. WinRoute shows that all 6 members are unable to connect to the master.
>
>Other points worth noting - all servers are on the LAN (i.e. ports 691 and 25 are open on all boxes). I've got a feeling this problem has existed for a while, it's just manifested itself when we made the routing change, the security logs on the server have always logged hundreds of successful authentication attempts per second from the system account.
>
>I've checked most of the points in this article: http://support.microsoft.com/kb/832281 - everything looks ok.
The last time I had to deal with a problem like that had to be at
least seven or eight years ago!
If you can't get the members of the RG to connect to the master, try
moving the master to another machine. If that doesn't work, stop the
RESvc service on each server in the RG and then restart it.
Have you changed the FQDN on the SMTP Virtual Server? Is there a
corresponding A record for the name in your internal DNS? Is there an
SPN for the name?
If the problem's caused by a stale route (or multiple stale routes)
then the surest way to remove those routes is to shut down ALL the
Exchange servers in the organization. Then restart the FE servers,
then the BE servers. Because the link-state information is kept in
memory you can't just reboot the machines one at a time. If you do
that, and the member/master communication starts to work, you'll just
replicate the stale routes from another machine. ALL the machines have
to be stopped before you restart any of them.
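Why a rolling reboot doesn't clear a stale route can be illustrated with a deliberately simplified toy model in Python. This is not how RESvc actually exchanges link-state data; it only assumes that each server keeps its route table in memory, rebuilds the valid routes from configuration when it starts, and merges whatever any running peer still holds. The server names, the CONFIG routes and the STALE entry are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical "good" routes that a server can rebuild from configuration at start-up.
CONFIG = frozenset({"RG1->Internet", "RG1->RemoteSite"})

@dataclass
class Server:
    name: str
    running: bool = True
    routes: set[str] = field(default_factory=set)  # in-memory link-state table

    def stop(self) -> None:
        self.running = False
        self.routes.clear()  # memory-only state is lost on stop

    def start(self, org: list["Server"]) -> None:
        self.routes = set(CONFIG)
        for peer in org:
            if peer is not self and peer.running:
                self.routes |= peer.routes  # picks up anything a live peer still holds
        self.running = True

def make_org(names: list[str]) -> list[Server]:
    # Every server starts out holding the valid routes plus one stale route.
    return [Server(n, routes=set(CONFIG) | {"STALE"}) for n in names]

def rolling_restart(org: list[Server]) -> None:
    for s in org:  # one server at a time; the others stay up
        s.stop()
        s.start(org)

def full_restart(org: list[Server]) -> None:
    for s in org:  # stop everything first...
        s.stop()
    for s in org:  # ...then start in list order (FE servers first, then BE)
        s.start(org)

def has_stale(org: list[Server]) -> bool:
    return any("STALE" in s.routes for s in org)

names = ["FE1", "FE2", "BE1", "BE2"]  # hypothetical: front ends listed before back ends

org = make_org(names)
rolling_restart(org)
print(has_stale(org))  # True: the stale route was copied back from a running peer

org = make_org(names)
full_restart(org)
print(has_stale(org))  # False: no running peer was left to copy it from
```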
Just be thankful you have only six servers. At that time we had 120
machines and they were spread over every continent except Antarctica.
---
Rich Matheisen
MCSE+I, Exchange MVP
May 10th, 2011 5:38pm
Excellent, thanks for the reply Rich. I'll get some downtime scheduled ASAP and let you know how I get on.
Regarding the SPN: we have an A record for the address in DNS, but I'll have to check the SPNs. Do you know whether the SPN should be assigned to the computer account of the Exchange Virtual Server (i.e. the clustered name) or the computer account of the cluster node?
I'd expect to see more Kerberos issues/authentication failures in the logs if an SPN were missing, but it's worth checking all the same.
Thanks again, Gareth.
May 11th, 2011 3:35am
On Wed, 11 May 2011 07:20:34 +0000, Gaz Jones wrote:
>
>
>Excellent thanks for the reply Rich, I'll get some downtime scheduled in asap and let you know how I get on.
>
>Re. the SPN - we have an A record for the address in DNS, I'll have to check the SPN's though. Do you know whether the SPN should be assigned to the computer account of the Exchange Virtual Server (i.e. the clustered name) or the computer account of the cluster node?
The SPNs are in a multi-valued property of the server. The names
should reflect whatever the SMTP VS uses to identify itself. Usually
you'll have two SPNs for each name:
SMTPSVC/<hostname>
SMTPSVC/<fqdn>
The setspn tool should tell you what SPNs are assigned to the machine
with "setspn -L <servername>". You can add SPNs with "setspn -A <spn>
<servername>".
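A small Python sketch of the SPN check described above: it derives the two expected SMTPSVC SPNs from the name the SMTP virtual server uses, compares them case-insensitively against the output of setspn -L, and prints setspn -A commands for any that are missing. The parsing assumes the usual setspn -L layout (a "Registered ServicePrincipalNames for ..." header followed by one SPN per line). The host name, domain and sample output are hypothetical, and which account to pass as the server name (EVS or node) is the open question in this thread, not something the code decides.

```python
def expected_smtp_spns(fqdn: str) -> set[str]:
    """SMTPSVC/<hostname> and SMTPSVC/<fqdn> for the name the SMTP VS announces."""
    host = fqdn.split(".", 1)[0]
    return {f"SMTPSVC/{host}", f"SMTPSVC/{fqdn}"}

def parse_setspn_list(output: str) -> set[str]:
    """Collect SPN lines from `setspn -L` output, skipping the header line."""
    spns = set()
    for line in output.splitlines():
        entry = line.strip()
        if entry and "/" in entry and not entry.lower().startswith("registered"):
            spns.add(entry)
    return spns

def missing_spn_commands(fqdn: str, servername: str, setspn_output: str) -> list[str]:
    """Build `setspn -A` commands for expected SMTPSVC SPNs not already registered."""
    present = {s.lower() for s in parse_setspn_list(setspn_output)}
    missing = sorted(s for s in expected_smtp_spns(fqdn) if s.lower() not in present)
    return [f"setspn -A {spn} {servername}" for spn in missing]

# Hypothetical `setspn -L MBCL01` output.
sample_output = """Registered ServicePrincipalNames for CN=MBCL01,CN=Computers,DC=example,DC=local:
        SMTPSVC/MBCL01
        HOST/MBCL01
        HOST/mbcl01.example.local
"""

for cmd in missing_spn_commands("mbcl01.example.local", "MBCL01", sample_output):
    print(cmd)  # setspn -A SMTPSVC/mbcl01.example.local MBCL01
```

The comparison is case-insensitive so that SMTPSVC/MBCL01 and SMTPSVC/mbcl01 count as the same SPN.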
>I'd expect to see more kerberos issues/authentication failures in the logs if it was a missing SPN but it's worth checking all the same.
You'd only have a problem with the machines that use Kerberos. E2K3
should offer NTLM and LOGIN in addition to GSSAPI, but Exchange 2010
wants to use GSSAPI and you need Kerberos for that.
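To see which mechanisms a server actually offers, you can look at its EHLO response. The sketch below uses Python's smtplib, whose ehlo() call stores the advertised extensions in the esmtp_features dict with lower-cased keywords. It reports the AUTH mechanisms and, assuming the server advertises it, the X-EXPS keyword Exchange uses for its own server-to-server authentication. Seeing GSSAPI listed doesn't prove Kerberos will succeed; that still depends on the SPNs being right.

```python
import smtplib

def advertised_mechanisms(features: dict[str, str]) -> dict[str, set[str]]:
    """Return the mechanisms listed under AUTH and X-EXPS in an EHLO feature dict."""
    result: dict[str, set[str]] = {}
    for keyword in ("auth", "x-exps"):  # smtplib lower-cases extension keywords
        if keyword in features:
            result[keyword] = {m.upper() for m in features[keyword].split()}
    return result

def probe(host: str, port: int = 25) -> dict[str, set[str]]:
    """Connect, send EHLO, and report the advertised authentication mechanisms."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.ehlo()
        return advertised_mechanisms(smtp.esmtp_features)
```

For example, `probe("mbcl01.example.local")` (a hypothetical host name) returns something like `{"auth": {...}, "x-exps": {...}}`, and checking `"GSSAPI" in result["auth"]` tells you whether Kerberos-based authentication is on offer.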
---
Rich Matheisen
MCSE+I, Exchange MVP
May 11th, 2011 10:25pm
Hi Rich, thanks for your help with this. Stopping the routing service on all servers and then starting it back up one by one resolved the issue. We also found that the Active-Active cluster's EVSs were both running on the same node, which probably didn't help.
Cheers, Gareth.
May 16th, 2011 4:22am