DAG failover to DR/Passive node breaks RPC

Hi there, I have a very strange problem with Exchange 2013 SP1 RU7.

I have a 3-node DAG; all nodes are on the same subnet and in the same AD site.

If I shut down servers or move the databases between nodes, Outlook RPC reconnects a minute or so after the databases remount, so that is working as expected.

Now I apply this setting to disable automatic failover to node 3, as node 3 is for DR:

Set-MailboxServer -Identity node3 -DatabaseCopyAutoActivationPolicy Blocked
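To confirm the setting took effect, the policy can be read back from the Exchange Management Shell. This is only a sketch; it assumes nothing beyond the cmdlet already used above and lists every Mailbox server in the organization:

```powershell
# Show the activation policy for every Mailbox server.
# With the setting above, node3 should report "Blocked" and the others "Unrestricted".
Get-MailboxServer |
    Format-Table Name, DatabaseCopyAutoActivationPolicy -AutoSize
```

Blocked only stops automatic activation on that server; an administrator can still activate a copy there manually.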

If I then turn off nodes 1 and 2, leaving node 3, and reboot the FSW to simulate failure, the DAG loses and then regains quorum, but the databases do not automatically mount on node 3, which is expected.

If I then run Move-ActiveMailboxDatabase DB1 -ActivateOnServer Node3, the database mounts fine.

However, Outlook will not connect. If I run the Exchange connectivity analyser, it seems I have web services problems and cannot get an Autodiscover response back. If I turn node 1 and node 2 back on and move the database back, I still have web services issues and no Outlook clients can connect.

After around half an hour of troubleshooting, the web services start working again!

Can anyone think why this is happening, or what to check?
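After a manual move like this, the state of each copy can be checked from the Exchange Management Shell. A minimal sketch, assuming the database name DB1 used in this thread:

```powershell
# Status of every copy of DB1: the active copy should show "Mounted",
# passive copies "Healthy" (or "ServiceDown" while their server is off).
Get-MailboxDatabaseCopyStatus -Identity DB1 |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength, ContentIndexState -AutoSize

# Confirm which server currently hosts the mounted database.
Get-MailboxDatabase -Identity DB1 -Status |
    Format-List Name, Mounted, MountedOnServer
```

This only confirms the database side; it says nothing about whether the client-facing web services are healthy.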

February 27th, 2015 4:05am

One more bit of info: all web services URLs go through an F5 load balancer.
February 27th, 2015 4:25am

Hi,

First of all, in your scenario there is no need for an FSW, because you have an odd number of nodes, so your cluster quorum model will be Node Majority.

Also, if Exchange is installed on Windows Server 2012, dynamic quorum is enabled on the cluster by default.

Dynamic quorum only handles sequential failures, not simultaneous failures.

Types of failure:

1. Sequential failure of DAG nodes: the first node in the DAG goes down and, after a short time, a second node goes down.

2. Simultaneous failure of DAG nodes: a majority of the nodes go down at the same time. In your case the majority is two.

Scenario 1:

Suppose node 1 and node 2 go down simultaneously. In that case your entire cluster will go down, because the cluster doesn't have enough time to recalculate quorum.

Scenario 2:

Suppose node 1 goes down today due to a hardware failure. If node 2 is shut down a short time later, the cluster has enough time to recalculate quorum.

The command you ran is correct, but the concept behind the test is wrong.

Reference Link for dynamic quorum:

http://www.msexchange.org/articles-tutorials/exchange-server-2013/high-availability-recovery/exchange-2013-dag-dynamic-quorum-part1.html
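
The quorum state described above can be inspected directly on a DAG member. This is a rough sketch, assuming Windows Server 2012 with the FailoverClusters module available and run from the Exchange Management Shell; no names here are specific to your environment:

```powershell
Import-Module FailoverClusters

# Quorum model in use (e.g. NodeMajority vs. NodeAndFileShareMajority).
Get-ClusterQuorum | Format-List Cluster, QuorumResource, QuorumType

# 1 means dynamic quorum is enabled on the cluster.
(Get-Cluster).DynamicQuorum

# Per-node state and votes; DynamicWeight changes as dynamic quorum adjusts votes.
Get-ClusterNode | Format-Table Name, State, NodeWeight, DynamicWeight -AutoSize

# The DAG's own view: witness configuration and which members are operational.
Get-DatabaseAvailabilityGroup -Status |
    Format-List Name, WitnessServer, WitnessShareInUse, OperationalServers, PrimaryActiveManager
```

Running this before and after each simulated failure shows whether the votes were recalculated between the two shutdowns.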

February 27th, 2015 4:30am

I thought I needed an FSW to support failure down to a single node.

I have checked in Cluster Manager while node 1 and node 2 are down: the DAG is up, and the database mounts when I run the move command. So why do I get issues connecting Outlook over RPC if the DB is mounted?

February 27th, 2015 4:34am

Hi,

Did you get any errors? If so, please share them.

Also, please tell me whether the issue affects internal Outlook clients, external Outlook clients, or both.

February 27th, 2015 5:39am

It happens for both internal and external clients; both use the same load balancer for CAS.

Example error: the Microsoft connectivity analyser failed to obtain an Autodiscover XML response. An HTTP 500 response was returned from Unknown.

At this point the mailbox database is mounted on node 3 and all 3 CAS servers are up.

If I try to browse to autodiscover.xml I get a runtime error referring to the web.config configuration file, and the tests show the web services are down.

If I move the DB to node 2 or node 1 I get the same issue, although after 15-20 minutes service is suddenly restored...
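
While the HTTP 500 is occurring, a few server-side checks may help narrow down where the web services are failing. This is a hedged sketch, not a diagnosis: it assumes the server name node3 from this thread, that it is run locally on that server in an elevated Exchange Management Shell, and that the Autodiscover application pool has its default name:

```powershell
# Any component not in the "Active" state can stop a server from servicing clients.
Get-ServerComponentState -Identity node3 |
    Format-Table Component, State -AutoSize

# Managed Availability health sets that are currently not healthy.
Get-HealthReport -Identity node3 |
    Where-Object { $_.AlertValue -ne "Healthy" } |
    Format-Table HealthSet, AlertValue, LastTransitionTime -AutoSize

# State of the IIS application pool that serves Autodiscover.
Import-Module WebAdministration
Get-WebAppPoolState -Name MSExchangeAutodiscoverAppPool
```

Repeating the same checks on the other two servers, and again once service recovers, shows what changed during the 15-20 minute window.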

February 27th, 2015 5:52am

Hi,

Step 1: Please make sure the internal Autodiscover URI is set on all the CAS servers. The names used in the URLs also need to be present on the SAN certificate installed in Exchange, and the Autodiscover name should resolve to the IP address of the load balancer.

Step 2: Make sure the Exchange SAN certificate is installed on the F5 as well as on all the CAS servers.

Step 3: Please verify the internal and external Outlook Anywhere host names on all the CAS servers.

Step 4: A host (A) record is needed in external DNS so that external Outlook clients can configure their profiles automatically.

Step 5: If you open the Autodiscover URL in a web browser, it should return an XML response with error code 600 (Invalid Request) for both internal and external users. That confirms Autodiscover is responding.
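
Steps 1 to 3 can be checked from the Exchange Management Shell along these lines. It is only a sketch: autodiscover.example.com is a placeholder for your real Autodiscover name, and Get-ExchangeCertificate as written only lists certificates on the server it is run on:

```powershell
# Step 1: internal Autodiscover URI configured on each CAS.
Get-ClientAccessServer |
    Format-Table Name, AutoDiscoverServiceInternalUri -AutoSize

# Step 1: the Autodiscover name should resolve to the load balancer's IP.
# (autodiscover.example.com is a hypothetical name; substitute your own.)
Resolve-DnsName autodiscover.example.com

# Step 2: names and services on the certificates installed on this server.
Get-ExchangeCertificate |
    Format-List Subject, CertificateDomains, Services, NotAfter

# Step 3: internal and external Outlook Anywhere host names per server.
Get-OutlookAnywhere |
    Format-Table Server, InternalHostname, ExternalHostname -AutoSize
```

The certificate on the F5 itself has to be checked on the load balancer; these cmdlets only cover the Exchange side.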

February 27th, 2015 6:30am

Should I get an autodiscover response if MBX servers are down?
February 27th, 2015 7:36am

Hi,

In your case, one of your mailbox servers and all the CAS servers are up and running, so Autodiscover should work without a problem as long as it is configured properly.

So there is no need to worry about the mailbox servers that are down.

Note: the condition is that at least one mailbox server and one CAS server must be available for the Exchange features to work properly.

February 27th, 2015 7:55am
