High Database Failure (Network Steve Forum)

We are constantly getting "the Microsoft Exchange Information Store Database 'Name' copy on this server encountered a serious I/O error. A lost write was detected." We checked the logs and found nothing useful about the root cause, so we are trying a few steps to make sure everything is set up correctly. We moved the offending server to its own datastore, confirmed that no backups, snapshots, etc. are causing it, and even set up a new replication network. Our setup is three mail servers, one at one site and two at another. Does anybody have any insight into the high rate of failure? Also, is it good practice to turn off replication on the MapiDagNetwork and leave it enabled on the replication network?

April 10th, 2015 4:14pm

Disabling replication on a DAG network doesn't actually disable replication; it just tells Exchange to prefer the other network for replication. So if you enable replication on the replication network and disable it on the MAPI network, replication should take place over the replication network only, unless that network fails, in which case it will use other networks. But that just masks your problem. We might have a better idea had you posted the complete error messages verbatim, but based on what you've said, it looks to me as if your problem is with storage.
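For reference, the network change being discussed could look roughly like the sketch below. The DAG name (DAG1) and the replication network name (ReplicationDagNetwork01) are placeholders, not taken from this thread, and the first command assumes Exchange 2013, where DAG networks are managed automatically unless manual configuration is enabled.

```powershell
# Placeholder names: DAG1 and ReplicationDagNetwork01 are assumptions.
# Exchange 2013 manages DAG networks automatically; enable manual control first.
Set-DatabaseAvailabilityGroup -Identity DAG1 -ManualDagNetworkConfiguration $true

# Prefer the dedicated network for log shipping and seeding.
Set-DatabaseAvailabilityGroupNetwork -Identity DAG1\MapiDagNetwork -ReplicationEnabled:$false
Set-DatabaseAvailabilityGroupNetwork -Identity DAG1\ReplicationDagNetwork01 -ReplicationEnabled:$true

# Review the resulting settings for every network in the DAG.
Get-DatabaseAvailabilityGroupNetwork -Identity DAG1 |
    Format-List Name,ReplicationEnabled,MapiAccessEnabled
```

As noted above, this only controls which network is preferred; it does nothing about the underlying I/O errors.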

April 12th, 2015 3:43pm

Hi alfistheman,

Thank you for your question.

This Error event indicates that the database copy encountered a serious I/O error (lost flush), and that this error may have affected all copies of the database. We could run the following command to check mailbox database copy status:

Get-MailboxDatabaseCopyStatus -Server <Mailbox server Name>

If the status of the database copy is not Mounted or Healthy, we could run the following command against that specific copy to reseed it:

Update-MailboxDatabaseCopy -Identity DBname\mailboxservername

We could refer to the following link:

https://technet.microsoft.com/en-us/library/ff984980(v=exchg.141).aspx

If there are any questions regarding this issue, please feel free to let me know.

Best Regards,

Jim

April 12th, 2015 10:47pm

So we have three database servers, two on our main network and one at another site for DR. I find this error when trying to resume a failed and suspended DB at our DR site. This is an issue because it is a large 300GB+ database, and we have had to reseed it countless times, which wastes all our bandwidth and takes a long time to complete. When we try to resume, we get...

[PS] F:\DB01>Resume-MailboxDatabaseCopy DB01\MAIL1
A server-side administrative operation has failed. The database copy could not be resumed because of a previous error that is preventing the resume operation. Error: At '4/14/2015 12:40:35 AM' the Microsoft Exchange Information Store Database 'DB01' copy on this server encountered a serious I/O error. A lost write was detected. Consult the event log on the server for "ExchangeStoreDb" or "MSExchangeRepl" events that may contain more specific information about the failure.
[Database: 89049019-5e66-4028-8637-3791c8f2085e, Server: MAIL1.INC.com]
    + CategoryInfo          : NotSpecified: (:) [Resume-MailboxDatabaseCopy], ReplayServiceResumeBlockedException
    + FullyQualifiedErrorId : [Server=MAIL1,RequestId=b06dc975-693b-458a-a822-5b5a1753a503,TimeStamp=4/14/2015 12:38:29 PM] [FailureCategory=Cmdlet-ReplayServiceResumeBlockedException] 37B29166,Microsoft.Exchange.Management.SystemConfigurationTasks.ResumeDatabaseCopy
    + PSComputerName        : mail1.inc.com

I have tried using the eseutil /r command to replay the logs, since it told me the database was in a Dirty Shutdown state. After that command, the database is in a Clean Shutdown state, but the resume still fails.
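A rough sketch of how the shutdown state and the event sources named in the error could be checked is shown below. The .edb file name is an assumption based on the F:\DB01 prompt above, and the two-day window is arbitrary.

```powershell
# Hypothetical path; substitute the real .edb location of the passive copy.
# The "State" line in the header dump shows Clean Shutdown or Dirty Shutdown.
eseutil /mh "F:\DB01\DB01.edb"

# Pull recent errors and warnings from the Application log, then keep the
# two sources named in the Resume-MailboxDatabaseCopy error text.
Get-WinEvent -FilterHashtable @{
        LogName   = 'Application'
        Level     = 2,3
        StartTime = (Get-Date).AddDays(-2)
    } -ErrorAction SilentlyContinue |
    Where-Object { $_.ProviderName -match 'ExchangeStoreDb|MSExchangeRepl' } |
    Select-Object TimeCreated, ProviderName, Id, Message |
    Format-List
```

Filtering by provider name after the query, rather than inside the hashtable, avoids an error if one of the sources is not registered on the server.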

April 14th, 2015 12:44pm

That copy needs to be reseeded via Update-MailboxDatabaseCopy. If that fails, the disk needs to be replaced, and then the copy should be reseeded.
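A minimal sketch of that reseed follows. DB01\MAIL1 is the copy from the error output above; MAIL2 is a placeholder for whichever server holds a healthy copy to seed from.

```powershell
# DB01\MAIL1 comes from the thread; MAIL2 is a placeholder source server.
# A copy must be suspended before it can be reseeded.
Suspend-MailboxDatabaseCopy -Identity DB01\MAIL1 -Confirm:$false

# Discard the existing files on the target and seed from the chosen source.
Update-MailboxDatabaseCopy -Identity DB01\MAIL1 -SourceServer MAIL2 -DeleteExistingFiles

# Confirm the copy returns to Healthy and the queues drain.
Get-MailboxDatabaseCopyStatus -Identity DB01\MAIL1 |
    Format-List Name,Status,CopyQueueLength,ReplayQueueLength,ContentIndexState
```

Note that -DeleteExistingFiles removes the current database and log files for that copy, so it should only be used once the copy is known to be unusable.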

April 15th, 2015 3:23pm

If your active copy of the database has become corrupted with a lost flush, you'll need to move all of the mailboxes out of that database.

Marked as answer by jim-xuModerator Tuesday, April 21, 2015 1:26 AM
Unmarked as answer by alfistheman Wednesday, April 22, 2015 8:52 PM

April 17th, 2015 11:52am

I understand that I have to reseed the database after this happens. But explain this to me: why do I have five databases, yet it only happens to two of them? All DBs are using the same disk and the same replication network, and it happens every day or so to at least one of the two. I then spend a day reseeding it over the WAN. Why do I have to reseed when 98% of the database is still good? I don't have the time or bandwidth to keep reseeding these databases every time they go down, which is basically every day. This negatively affects my ability to be ready for a DR situation. So why do I have I/O errors all the time on two DBs but not on the others?

April 22nd, 2015 4:58pm

This may be file system corruption. The recovery is to format (or replace) the volume and reseed all of the database copies it hosts.

April 22nd, 2015 6:27pm

Yeah, I have done this with a couple of them and they still end up failing right away. This is quite disappointing, considering I was hoping for dual-site recovery. Now I have three databases that fail on a daily basis due to some I/O error after I reseed them. I thought Microsoft Exchange would be a little more resilient at this stage of the game, but I guess not.

May 20th, 2015 11:19am

If you've replaced the disk and are still seeing I/O errors, you should investigate your disk controller. Exchange can't fix broken hardware; it can only work around it.
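One possible starting point for that investigation is the Windows System event log on the server hosting the failing copies. The sketch below is only an assumption about where useful evidence might appear; the provider-name pattern is a guess and will need adjusting for the actual storage stack.

```powershell
# Run on the server hosting the failing copies.
# Level 1 = Critical, 2 = Error, 3 = Warning; the 7-day window is arbitrary.
Get-WinEvent -FilterHashtable @{
        LogName   = 'System'
        Level     = 1,2,3
        StartTime = (Get-Date).AddDays(-7)
    } -ErrorAction SilentlyContinue |
    # Guessed pattern for storage-related event sources; adjust as needed.
    Where-Object { $_.ProviderName -match 'disk|ntfs|stor|volmgr|iscsi|mpio' } |
    Group-Object ProviderName, Id |
    Sort-Object Count -Descending |
    Format-Table Count, Name -AutoSize
```

Grouping by source and event ID makes a recurring controller or path error stand out from one-off warnings.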

May 20th, 2015 2:19pm

And maybe Abram won't say it, but I can be the fanboy here. Exchange 2013 is just about bulletproof, so if it's continually throwing I/O errors, then the hardware needs some serious inspection.

May 20th, 2015 2:30pm

I agree. DAG replication is really solid, even in Exchange 2010.

May 20th, 2015 2:34pm

It sounds like the active copy of the database became corrupted. Given that the passives became byte for byte copies when you seeded them, they are now corrupt too. You need to evacuate the databases.
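Evacuating a database could be sketched as follows. DB01 is the affected database from the thread; DB06 is a placeholder for a new, healthy target database. Archive and public folder mailboxes, if any exist in the database, are not covered here.

```powershell
# DB01 is the affected database; DB06 is a placeholder target database.
Get-Mailbox -Database DB01 -ResultSize Unlimited |
    New-MoveRequest -TargetDatabase DB06

# Arbitration mailboxes are not returned by default, so move them separately.
Get-Mailbox -Database DB01 -Arbitration |
    New-MoveRequest -TargetDatabase DB06

# Track progress of the queued moves.
Get-MoveRequest | Get-MoveRequestStatistics |
    Format-Table DisplayName,StatusDetail,PercentComplete
```

Because a mailbox move copies data logically rather than block by block, the corrupted pages in the old database files are not carried over to the target.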

May 20th, 2015 6:59pm
