CAS load balancing

 Hello,

I have a two-node DAG of Exchange 2013 multi-role servers. The user mailbox is on node one, mounted and in Healthy status. If for some reason the back-end MSExchangeOWAAppPool is stopped on node one, the user cannot access the mailbox, even though the multi-role server on node two is up and functional. Health checks on the (NetScaler) load balancer still direct OWA requests to node one even though the pool has failed and the node is down. If I remove the load balancer it acts the same way. Is the component-level failure not detected in Exchange 2013?

September 2nd, 2015 9:22am

I would think this is something Managed Availability should catch, moving the mailbox database to a different server.

When you access http://servername/owa/healthcheck.htm does that show everything up for that server? That's really what the netscaler is checking against.

How long did you let it run in that state?

September 2nd, 2015 12:56pm

I'm not sure if your external load balancer will detect this type of failure and stop routing users to the partially failed server. The best thing to do would be to find out why the OWA app pool is failing and troubleshoot that.

Another option is to run a script that removes the server from the load balancer when there are issues with app pools, services, etc., but this may be quite a bit of work.
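As a rough idea of what such a script could look like, here is a minimal Python sketch. It assumes each protocol exposes a healthcheck.htm page under its virtual directory and treats anything other than HTTP 200 as a failure. The names `fetch_status` and `disable_member` are hypothetical placeholders: the first would return the HTTP status code for a URL, the second would wrap whatever management interface your load balancer actually offers.

```python
def protocols_to_drain(server, protocols, fetch_status):
    """Return the protocols whose healthcheck.htm did not answer HTTP 200.

    fetch_status is a hypothetical callable: it takes a URL and returns
    the HTTP status code, or raises OSError if the server is unreachable.
    """
    failed = []
    for protocol in protocols:
        url = f"https://{server}/{protocol}/healthcheck.htm"
        try:
            status = fetch_status(url)
        except OSError:
            # Unreachable counts as unhealthy.
            status = None
        if status != 200:
            failed.append(protocol)
    return failed


def drain_unhealthy(server, protocols, fetch_status, disable_member):
    """Call disable_member(server, protocol) for every failed protocol.

    disable_member is a hypothetical callable standing in for the load
    balancer's own API or CLI; nothing here is vendor-specific.
    """
    failed = protocols_to_drain(server, protocols, fetch_status)
    for protocol in failed:
        disable_member(server, protocol)
    return failed
```

Keeping the check separate from the action means the same logic can be dry-run (pass a `disable_member` that only logs) before it is allowed to touch the load balancer.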

IIS logs and event logs should give you an idea why the app pool is failing.

Thanks.

September 2nd, 2015 1:22pm

When I stop the OWA app pool, the healthcheck shows "HTTP Error 503. The service is unavailable." The problem is the DB doesn't fail over to the other node. If I down the server, or the NIC (VM), the DB fails over automatically and is OK. Should the DB fail over if the app pool fails? If it should, what should I look at to change the behaviour?

September 2nd, 2015 1:37pm

How long do you keep the app pool stopped for?
September 2nd, 2015 1:48pm

Also, take a look at this and check the health report of the server.

http://blogs.technet.com/b/exchange/archive/2013/08/13/customizing-managed-availability.aspx

September 2nd, 2015 2:04pm

I wouldn't expect an app pool failure to cause the DAG to fail over. This is because the DAG provides high availability for the Mailbox server role and not the CAS role.

The load balancer will need to be informed to stop sending requests to the CAS server should there be a partial failure of that CAS server. 

Thanks.

September 2nd, 2015 2:10pm

A CAS app pool failure may not cause a DAG failover, but regardless you need to set your load balancer to check the health of the CAS and mark it down if it's not responding, so no clients are routed to it.

http://blogs.technet.com/b/exchange/archive/2014/03/05/load-balancing-in-exchange-2013.aspx

To ensure that load balancers do not route traffic to a Client Access server that Managed Availability has marked as offline, load balancer health probes must be configured to check <virtualdirectory>/healthcheck.htm (e.g., https://mail.contoso.com/owa/healthcheck.htm). Note that healthcheck.htm does not actually exist within the virtual directories; it is generated in-memory based on the component state of the protocol in question.

If the load balancer health probe receives a 200 status response, then the protocol is up; if the load balancer receives a different status code, then Managed Availability has marked that protocol instance down on the Client Access server. As a result, the load balancer should also consider that end point down and remove the Client Access server from the applicable load balancing pool.
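To make the rule above concrete, here is a minimal Python sketch of the same decision a load balancer probe makes: HTTP 200 means up, anything else means down. This is only an illustration using the standard library, not how any particular load balancer implements its monitor; the function names are made up for this example.

```python
import urllib.error
import urllib.request


def protocol_is_up(status_code):
    """Only HTTP 200 counts as healthy; any other code means mark it down."""
    return status_code == 200


def probe_healthcheck(url, timeout=5.0):
    """Return True if a GET on the healthcheck URL answers 200, else False.

    Note: certificate validation is left at the Python default, so a lab
    server with a self-signed certificate will be reported as down.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return protocol_is_up(response.status)
    except urllib.error.HTTPError as err:
        # Error responses such as 503 are raised as HTTPError.
        return protocol_is_up(err.code)
    except OSError:
        # Connection refused, timeout, DNS or TLS failure: treat as down.
        return False
```

For example, `probe_healthcheck("https://mail.contoso.com/owa/healthcheck.htm")` (the host name from the quoted article) would return False whenever the page answers with a 503 or the server cannot be reached at all.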

September 2nd, 2015 2:25pm

I am manually stopping the OWA app pool on one of the DAG servers for testing; mind you, this is my pre-prod test lab. I have my NetScaler set up per http://danielruiz.net/2015/05/26/exchange-2013-layer-7-single-namespace-loadbalancing-with-citrix-netscaler/comment-page-1/ and the probe does mark the node as down.

Forgetting the load balancer, internally I cannot bring up an OWA session from another CAS server directly to the node with the app pool failure. I'm sure this is by design, but I'm wondering why the active DB doesn't fail over for this type of failure?


September 2nd, 2015 2:54pm

Also, if a CAS app pool failure doesn't cause a DB failover, why put the CAS role on the same server as the Mailbox role per best practices (simplified DAG with AutoReseed and a load balancer)? If the mailbox in question doesn't move, requests will keep going to the failed server.

Should I have two separate CAS servers load balanced?

September 2nd, 2015 3:01pm

The best practice is to keep both CAS and MBX on the same server but this is for simplicity I believe. 

The CAS and MBX roles can work independently. For example, if either CAS server has a problem, the other CAS should be able to provide access to the mailboxes no matter which MBX server has the database mounted. In other words, a CAS failure doesn't require a DAG failover, and this is why app pool failures don't cause DAG failovers.

September 2nd, 2015 4:24pm

That would be true whether the CAS role was separate from the mailbox role or it was multi-role. The load balancer logic is what keeps clients from connecting to the CAS. Even if the database had failed over, clients could still be directed to the failing CAS. Multi-role is best.
September 2nd, 2015 5:58pm

Multi-role has indeed been the recommendation since Exchange 2010.

It keeps the design simple and removes issues such as the MBX-to-CAS ratio, for example.

In Exchange 2016 this is now how you will deploy Exchange. Roles are combined.

September 3rd, 2015 1:52pm

This topic is archived. No further replies will be accepted.
