Taking down DAG member for maintenance

Hello, I need some advice on what to do.

I have a two-member Exchange 2010 DAG.  One is the active server where everything is mounted.  The other is simply the passive server.  I will need to power down the passive server in a few days for extended maintenance lasting several hours.  I know for a fact (because I've been through this before) that replication will be cut off during this time.  As a result, a full Exchange backup will not flush the transaction logs, and the copy queues will be very backed up.  Last time they were so backed up that all efforts to reseed the passive copies of the mailbox databases failed.  In the end I think I had to delete the passive copies and copy the active server's databases over.  That took a really long time.

So to avoid all that headache what should I do for this upcoming maintenance to make life easier?  

- Is there a way I can literally STOP replication or even delete the passive copies of the mailbox databases just so I can run full backups and truncate transaction logs?  

- Otherwise, assuming I have backed up copy queues, should I simply try reseeding or resuming synchronization?  Or should I not even waste time and delete all the passive copies and their files and start fresh with NEW copies?

April 28th, 2015 3:31pm

There's no reason that a huge backup of logs would cause databases to fail.  I've never seen that unless disk space is exhausted or something like that.  If you're having a problem you describe and it's reproducible, I'd open a support ticket with Microsoft because it's not normal.

I don't know why you can't take a backup of the active database copies, but it won't truncate the logs because the logs still need to replicate to the passive copy.

You can certainly delete the passive copies but you'd just have to recreate them afterward.  That defeats the purpose of having a DAG.
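
For reference, removing and later re-creating a passive copy might look roughly like the sketch below. The database name DB1 and the server name EX-PASSIVE are hypothetical placeholders, not names from this environment, and the activation preference is an assumption for a two-member DAG.

```powershell
# Hypothetical names: DB1 is the database, EX-PASSIVE is the passive DAG member.
$database = 'DB1'
$passive  = 'EX-PASSIVE'

# Remove the passive copy. The database and log files are left on disk
# on the passive server and normally have to be deleted manually.
Remove-MailboxDatabaseCopy -Identity "$database\$passive" -Confirm:$false

# Later, once the server is back, add the copy again. This triggers a
# full seed from the active copy, which can take a long time.
Add-MailboxDatabaseCopy -Identity $database -MailboxServer $passive -ActivationPreference 2
```

While the copy is removed, the database has no redundancy, which is the trade-off mentioned above.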

April 28th, 2015 5:17pm

Hi Ed,

Actually what I meant to say was that resuming the copy or synchronization kept failing because of the huge copy queue, not that the databases on the passive server would fail.  They'd mount fine.  It's just the log copying from the active to the passive server that failed.  Sorry if I wasn't clear on that.

I can definitely take a backup of the active database but without the logs flushing, the directory holding the transaction logs would slowly fill up over time.  

I'm afraid I may have to delete the passive copy if the copy queue fails to decrease.  I know this goes against what a DAG is for.  

Is it possible then to copy the active database and then paste it over in the passive server?  Would I need to dismount the active database first before I can copy it to, let's say a locally attached USB drive?


April 28th, 2015 5:36pm

You might want to find out why the failure mode you're describing is happening because it shouldn't.  I haven't seen that failure mode.
April 28th, 2015 6:10pm

Hi Ed:

I never did find out why.  It's not like the copying failed immediately.  It's more like after running the command and waiting for some time, it stopped and went right back to "Failed and Suspended".  

April 29th, 2015 2:15pm

That may not have anything to do with the number of log files.  That usually means that there's a missing log file or some other problem.  You can fix that with a reseed afterward, and that's really no more work than assuming it'll fail and removing the copies ahead of time.
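
A minimal sketch of how the copy state could be checked and a resume attempted before deciding on a reseed, assuming the Exchange Management Shell. The server name EX-PASSIVE is a hypothetical placeholder.

```powershell
# Hypothetical server name; replace with the passive DAG member.
$server = 'EX-PASSIVE'

# Show each database copy on that server with its status and queue lengths.
Get-MailboxDatabaseCopyStatus -Server $server |
    Format-List Name, Status, CopyQueueLength, ReplayQueueLength, LastInspectedLogTime

# Try to resume every copy that is suspended or failed and suspended.
Get-MailboxDatabaseCopyStatus -Server $server |
    Where-Object { $_.Status -eq 'Suspended' -or $_.Status -eq 'FailedAndSuspended' } |
    ForEach-Object { Resume-MailboxDatabaseCopy -Identity $_.Name }
```

If a copy drops back to FailedAndSuspended after the resume, the application event log on the passive server is the usual place to look for the missing or damaged log file.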
April 29th, 2015 5:01pm

What procedure are you using to place your server into maintenance? Are both servers in the same AD site? For short maintenance I use the StartDagServerMaintenance script; for longer outages, or outages that affect iSCSI, I do a datacentre switchover. With either option there is no attempt to replicate, and once everything is back to normal the queues are pretty high (we have 3000+ active accounts) but they never cause the DBs to go into a failed state. What happens to your quorum during maintenance? I've miscalculated before and accidentally broke quorum, which wouldn't allow any of my DBs to mount. Here is a good explanation of what you need based on the number of servers in your deployment: https://blog.credera.com/technology-insights/microsoft-solutions/when-do-dags-need-a-file-share-witness/
April 29th, 2015 7:28pm

Using that script is good advice.
April 30th, 2015 3:25am

Yes, I always run the StartDagServerMaintenance script before shutting the server down, and then StopDagServerMaintenance once it is back up.
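
As a rough sketch, that sequence looks like this when run from the Exchange Management Shell on a DAG member. The server name EX-PASSIVE is a hypothetical placeholder.

```powershell
# $exscripts is defined by the Exchange Management Shell and points
# at the Exchange Scripts folder.
Set-Location $exscripts

# Before shutting down the passive member (hypothetical name).
.\StartDagServerMaintenance.ps1 -serverName 'EX-PASSIVE'

# ... perform the maintenance and power the server back on ...

# Afterwards, take the member out of maintenance so its copies resume.
.\StopDagServerMaintenance.ps1 -serverName 'EX-PASSIVE'
```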

The DAG servers are split up, one in each location.  The quorum stays online, so the majority vote is preserved and the active server keeps running.  

The trouble again, like I said before, is everything mounts fine.  But what if for some reason I cannot decrease the copy queue and it keeps saying Failed and Suspended?  Is it then safe to just delete the passive copy and start over again?  I know it is risky because I'm left with just one copy and the seeding of a new database and logs could take a while but I'm just saying what if.  

Also as you know, you cannot flush transaction logs on the active server after a full backup if the replication isn't working and the queue is "Failed and Suspended".  Is it then possible to just delete the passive copy of the database before running a full backup?  Would that flush all transaction logs?


April 30th, 2015 8:58am

What I had to do when I had issues with backups and logs not flushing was to turn on circular logging (requires an IS restart) on the affected DBs. I'm not 100% sure on this, but what you can attempt to do is:

1. Run the StartDagServerMaintenance script and confirm that all DBs are where you'd like them.

2. Do your maintenance.

3. Before running the StopDagServerMaintenance script, set all your DBs to circular logging (don't forget to restart the IS). This will flush the logs.

4. Bring your server out of maintenance.

April 30th, 2015 11:52am

If you don't know 100% what it is, you shouldn't be giving advice.

Circular logging prevents you from recovering from a backup in the event of a failure of either the database or logs.  If you're taking backups, then you shouldn't be using circular logging unless you're okay with the risk of data loss, and that risk would be highest when you're performing the kind of maintenance that's the subject of this thread.

If you're not running backups, then you should be using circular logging or have some other mechanism for truncating the logs.
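
To see which of these situations applies, something like the following could be run in the Exchange Management Shell. It is only a sketch for checking the current settings, not a recommendation to change them.

```powershell
# -Status is needed so that backup-related properties such as
# LastFullBackup are populated.
Get-MailboxDatabase -Status |
    Format-List Name, CircularLoggingEnabled, LastFullBackup
```

A database showing CircularLoggingEnabled as False and an old or empty LastFullBackup has nothing truncating its logs.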

May 1st, 2015 12:41pm

I've seen the "Failed and Suspended" status and recovered using the Update-MailboxDatabaseCopy command below. For example:

[PS] C:\>Get-MailboxDatabaseCopyStatus DB1\EX13-2 | fl Name,Status,CopyQueueLength,ReplayQueueLength,ContentIndexState

Name              : DB1\EX13-2
Status            : FailedAndSuspended
CopyQueueLength   : 3
ReplayQueueLength : 0
ContentIndexState : Failed

We then update (or reseed) the failed database copy with the data from a good copy:

[PS] C:\>Update-MailboxDatabaseCopy DB1\EX13-2 -DeleteExistingFiles

I understand the reseed process can be lengthy but I'm not sure there are better (supported) me

May 1st, 2015 3:33pm

Hi,

As mentioned above, it may be better to take a full backup first and then run the command David provided.
Note: run Suspend-MailboxDatabaseCopy -Identity DB1\EX13-2 first, then run Update-MailboxDatabaseCopy DB1\EX13-2 -DeleteExistingFiles.
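
Put together, the suspend and reseed steps might look like the sketch below. DB1\EX13-2 is the example copy name used earlier in this thread, and the suspend comment text is an arbitrary illustration.

```powershell
# Example copy name taken from the earlier post; replace with your own.
$copy = 'DB1\EX13-2'

# Suspend the copy so that it can be reseeded.
Suspend-MailboxDatabaseCopy -Identity $copy -SuspendComment 'Reseed after maintenance' -Confirm:$false

# Reseed from the active copy, discarding the stale files on the passive server.
# Replication resumes automatically when seeding finishes unless -ManualResume is used.
Update-MailboxDatabaseCopy -Identity $copy -DeleteExistingFiles

# Check that the copy returns to Healthy and the queues drain.
Get-MailboxDatabaseCopyStatus -Identity $copy |
    Format-List Name, Status, CopyQueueLength, ReplayQueueLength
```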

Thanks

May 5th, 2015 4:17am
