Why do my DAG databases keep switching servers by themselves?

I have a two-node Exchange 2013 DAG spread across two AD sites. I'm trying to figure out why the databases keep switching between the two servers by themselves. For example, I can activate all databases on Node 1 on a Monday morning. By Tuesday PM a handful of them will have activated themselves on Node 2. I don't get any errors or client downtime. It's an otherwise perfect email system.

I've checked that the Activation Preference Number is set correctly on each database and node, so either Active Manager is ignoring this setting or some other problem is causing the switch. The trouble is that the moves cause errors with our backups, which get confused as to which database is active.

I should add that although both servers are in two separate AD sites they are physically in the same rack and connected into the same 1GbE router. They're just on two different subnets.

How do I keep them all activated on Node 1?

May 21st, 2015 4:15am

You need to take a look at the Windows logs and see if Managed Availability is moving them for some reason.  If it is, it should tell you why.
May 21st, 2015 9:29am

CollectOverMetrics.ps1 will help you figure out why databases moved when they did.
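
A minimal sketch of how the script might be run, assuming the Exchange Management Shell on a DAG member. The DAG name DAG1 and the seven-day window are placeholders, not details from this thread:

```powershell
# $exscripts is defined by the Exchange Management Shell and points to the Exchange Scripts folder.
Set-Location $exscripts

# "DAG1" is a hypothetical DAG name; substitute your own.
# Collect switchover/failover events from the last 7 days and open an HTML report.
.\CollectOverMetrics.ps1 -DatabaseAvailabilityGroup DAG1 `
    -StartTime (Get-Date).AddDays(-7) `
    -EndTime (Get-Date) `
    -GenerateHtmlReport -ShowHtmlReport
```

The report lists each database move along with the reason recorded for it, which is the place to start when moves appear to happen on their own.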
May 21st, 2015 1:40pm


Hi Andy,

Thank you for your question.

Why do you activate all databases on the same server?

I suggest the following: if a database belongs to Node 1, activate it on the DAG member Mailbox server in Node 1; if it belongs to Node 2, activate it on the DAG member Mailbox server in Node 2. Then check whether the issue persists.

Q: How do I keep them all activated on Node 1?

A: If most of the traffic goes to Node 1, submissions will be delayed. If Exchange performance is poor and mailbox data is submitted late as a result, the databases will keep switching.

We should also check whether there are any network issues in the organization, for example network flooding.

Check whether any errors are logged when the DAG databases switch, and send them to ibsexc@microsoft.com for our troubleshooting.

If you have any questions regarding this issue, please feel free to let me know.

Best regards,

Jim

May 22nd, 2015 3:43am

Hi,

Try tracing the errors in the Application log in Event Viewer. There are many possible reasons for an automatic failover:

* Backups may not be working on the active server, so the log LUN filled up and the database failed over to the server with the next activation preference.

* Managed Availability may have reacted to a service failure.

* Others.

The Application event log is the place to trace the error.

Also check the replication status by running Test-ReplicationHealth.

Not recommended: if you need to disable automatic failover, you can block activation per Mailbox server (DatabaseCopyAutoActivationPolicy), use DAC (Datacenter Activation Coordination) mode, or use an if/else script.
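
As a hedged sketch of the per-server activation policy mentioned above: the server name EX02 is taken from later in this thread, and blocking activation on it means its copies will not be activated automatically if the other node fails, so treat this as an illustration rather than a recommendation.

```powershell
# Show the current automatic activation policy for each Mailbox server.
Get-MailboxServer | Format-Table Name, DatabaseCopyAutoActivationPolicy -AutoSize

# Stop Active Manager from automatically activating database copies on EX02.
# Caution: this also removes EX02 as an automatic failover target.
Set-MailboxServer -Identity EX02 -DatabaseCopyAutoActivationPolicy Blocked

# Revert to the default behaviour once the underlying cause is fixed.
Set-MailboxServer -Identity EX02 -DatabaseCopyAutoActivationPolicy Unrestricted
```

Blocking activation hides the symptom only; whatever is triggering the moves still needs to be found.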

May 22nd, 2015 4:59am

Hi Jim,

All users are in the same site as EX01 whereas EX02 will be in a datacentre (eventually) where there are no users. I therefore thought it best to make all the databases active on EX01 - where all the users are. Please correct me if this is not best practice.

There is also a DPM server on each site and each one backs up its local Exchange node. DPM gets upset if any database activates itself on EX02, hence my original question.

I've run CollectOverMetrics.ps1 (thank you Jared) and I can see that there have been several automatic moves where the ActionReason has been 'FailureItem'. I'm not sure what causes that trigger. Where would I look to see the cause of that?

May 22nd, 2015 6:10am


FailureItem tends to indicate hardware issues. If you look in Event Viewer at the time the failover started, you should be able to see something, especially under Applications and Services Logs -> Microsoft-Exchange-MailboxDatabaseFailureItems/Operational. There might be something in the Application log too.
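
The same log can also be read from PowerShell. A minimal sketch, assuming the log exists on the server being queried; EX01 is used here as the server name and the event count of 50 is arbitrary:

```powershell
# Read the most recent failure-item events from the crimson channel named above.
# "EX01" is an assumed server name; -MaxEvents 50 is an arbitrary limit.
Get-WinEvent -ComputerName EX01 `
    -LogName 'Microsoft-Exchange-MailboxDatabaseFailureItems/Operational' `
    -MaxEvents 50 |
    Sort-Object TimeCreated |
    Format-List TimeCreated, Id, LevelDisplayName, Message
```

Comparing the TimeCreated values with the move times reported by CollectOverMetrics.ps1 should show which failure item preceded each move.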
May 22nd, 2015 4:29pm

Hi Andy,

I agree with Jared.

Please check whether there are any errors in the event logs and send them to ibsexc@microsoft.com for our troubleshooting.

In addition, run the following commands and post the results:

Get-MailboxDatabaseCopyStatus -Server <ServerName> | Format-List

Test-ReplicationHealth -Identity <ServerName>

If you have any questions regarding this issue, please feel free to let me know.

Best regards,

Jim

May 27th, 2015 4:29am

Well, that's just typical. None of the DBs have failed over recently and the event entries that related to the original automatic moves have now fallen off the logs.

Here are the results of the two cmdlets anyway.

I've just checked the Microsoft-Exchange-MailboxDatabaseFailureItems/Operational log and can see a series of these events, and they correspond to the failover times:

MailboxDatabaseFailureItems

  • Edited by AndyChips Wednesday, May 27, 2015 3:00 PM
May 27th, 2015 10:28am


Hi,

Did the error occur on EX02?

You could follow the article below to repair the database small-02, and then check whether the issue persists:

https://technet.microsoft.com/en-us/library/ff625226(v=exchg.150).aspx

If you have any questions regarding this issue, please feel free to let me know.

Best regards,

Jim

May 31st, 2015 4:47am

OK, I got it fixed. It turned out to be several failed Monitoring Mailboxes.

I used the PowerShell cmdlet:

Get-Mailbox -Monitoring

It listed a whole load of Monitoring Mailboxes that were in an errored state.

Once I'd deleted them all in AD and restarted the associated service they all got recreated and my databases have all been perfectly behaved since then.

It's a very simple procedure and here's one of several articles explaining how to do it. Scroll down to the last section entitled: How to re-create monitoring mailboxes (NOT considered regular maintenance!)

http://blogs.technet.com/b/exchange/archive/2015/03/20/exchange-2013-monitoring-mailboxes.aspx
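
For anyone following along, the steps above might look roughly like this in the Exchange Management Shell. This is a sketch under assumptions: the "associated service" is taken to be Microsoft Exchange Health Manager (service name MSExchangeHM), and the deletion of the broken mailbox accounts in AD is a manual step described in the linked article, not shown here.

```powershell
# List the monitoring (health) mailboxes; broken ones typically surface warnings or errors here.
Get-Mailbox -Monitoring | Format-Table Name, Database, WhenCreated -AutoSize

# After deleting the broken monitoring mailbox accounts in AD (manual step, see the article),
# restart the Health Manager service on each Mailbox server so the mailboxes are recreated.
# Assumption: MSExchangeHM is the service referred to above.
Restart-Service MSExchangeHM

# Re-run after a few minutes to confirm the mailboxes were recreated cleanly.
Get-Mailbox -Monitoring | Format-Table Name, Database, WhenCreated -AutoSize
```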


  • Marked as answer by AndyChips Friday, June 12, 2015 8:28 AM
  • Edited by AndyChips Friday, June 12, 2015 8:29 AM
June 12th, 2015 4:29am


This topic is archived. No further replies will be accepted.
