Latency leads to full system failure in Exchange 2010 - resolved in 2013?

My workplace ran into a bug and, not for lack of trying, I could not find anywhere to submit a bug report for Exchange 2010 anymore.  Since most software builds on the foundation of what came before, if the replication service is implemented the same way in 2013 then it could lead to a full system failure there too.

The issue is that something happened in the replication service such that it still accepted connections, but internally it had set a variable to a LARGE NUMBER (fail closed) in place of the "Copy Queue Length", in the ballpark of 922 quadrillion.

Because of this half-alive state the service never updates the value. Some timeout then occurs, and the GOOD servers think their copy is ludicrously behind.  This causes the GOOD servers to attempt to fail over, as they believe their copy from the BAD server has failed, or perhaps some other check fails...
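To make the failure mode above concrete, here is a minimal Python sketch. It is not Exchange's code: the sentinel value, the field names, and the thresholds are all hypothetical, chosen only to show why a check that trusts the placeholder reports "hopelessly behind", while a check that also looks at how fresh the value is would report "unknown" instead.

```python
from dataclasses import dataclass

# Hypothetical placeholder; the post only says "in the ballpark of 922 quadrillion".
FAIL_CLOSED_SENTINEL = 922_000_000_000_000_000


@dataclass
class CopyStatus:
    database: str
    copy_queue_length: int       # logs waiting to be copied, or the placeholder
    seconds_since_update: float  # age of the last real update from the service


def naive_is_behind(status: CopyStatus, max_queue: int = 1000) -> bool:
    """Trusts the number as-is, so the placeholder looks like a huge backlog."""
    return status.copy_queue_length > max_queue


def classify(status: CopyStatus, max_queue: int = 1000,
             stale_after: float = 60.0) -> str:
    """Separates 'no fresh data' from 'really behind'."""
    if status.seconds_since_update > stale_after:
        return "unknown"  # the service stopped reporting
    if status.copy_queue_length >= FAIL_CLOSED_SENTINEL:
        return "unknown"  # placeholder, not a measurement
    if status.copy_queue_length > max_queue:
        return "behind"
    return "healthy"


hung = CopyStatus("DB2", FAIL_CLOSED_SENTINEL, seconds_since_update=300.0)
print(naive_is_behind(hung), classify(hung))
```

With the hung copy above, the naive check says the copy is behind, while the second check only says its state is unknown. Which of the two Exchange actually does is not something the post establishes.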

[6 DAG pieces on 3 servers, 4 pieces per server]

In my scenario Server 3 holds:  

1 part of server 2 <= Tells server 2 you're bad - don't fail over

1 part of server 1 <= Tells server 1 you're bad - don't fail over

2 parts of its own <= "good" - though dead replication service
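One layout that fits this description can be written down and checked in a few lines of Python. This is an assumption, not the actual DAG: it reads "6 pieces" as six databases with two copies each, and the names DB1..DB6 and S1..S3 are made up for illustration.

```python
from collections import Counter

# Hypothetical layout: (database, server with the active copy, server with the passive copy)
LAYOUT = [
    ("DB1", "S1", "S2"),
    ("DB2", "S1", "S3"),
    ("DB3", "S2", "S1"),
    ("DB4", "S2", "S3"),
    ("DB5", "S3", "S1"),
    ("DB6", "S3", "S2"),
]


def copies_per_server(layout):
    """Count how many database copies (active or passive) each server holds."""
    counts = Counter()
    for _, active, passive in layout:
        counts[active] += 1
        counts[passive] += 1
    return dict(counts)


def databases_touching(layout, server):
    """List the databases that have any copy on the given server."""
    return [db for db, active, passive in layout if server in (active, passive)]


print(copies_per_server(LAYOUT))
print(databases_touching(LAYOUT, "S3"))
```

Under this layout every server holds four copies, and four of the six databases have a copy on S3, including one database owned by each of the other two servers. That is why a single hung replication service can interfere with failover decisions on both of the other servers.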

I do not know whether this is exploitable by simply putting a listener in place of the replication service that accepts connections and never responds.  However, in my opinion it is a "bug" or flaw to allow 1 bad server to supersede two good servers because of a partial service crash.  Simply stopping the replication service on Server 3 allowed Servers 1 and 2 to take over without issue. I then restarted the replication service and have had no issue since.

Since I wasted an hour looking for where to submit this, this forum will hopefully do. I am very much a white hat, but I dislike how far from simple it is to disclose this kind of information to the creator.  I want my hour back for their non-intuitive process, or maybe they simply don't care about Exchange 2010.

- Dan


April 27th, 2015 6:29pm

Try restarting the Microsoft Exchange Replication service, the Microsoft Exchange Mailbox Assistants service, and the Microsoft Exchange Information Store service.

Are there any error logs?

May 5th, 2015 3:24am

