Cannot failover cluster

Hi all,

In my environment I have two Exchange 2013 servers, ex01 and ex02, in one DAG (DAG01). Recently ex02 has had problems: some Exchange services crash. I'm still looking for the cause; it may be a lack of memory. I have a database named "Mailbox Database 01":

Get-MailboxDatabaseCopyStatus "Mailbox Database 01"
Name                      Status     ContentIndexState
Mailbox Database 01\EX01  Mounted        Healthy
Mailbox Database 01\EX02  Healthy        Healthy

When ex02 has a problem, something happens to the "Mailbox Database 01" copies on both ex01 and ex02: the database cannot be mounted on either server. The copy status keeps switching between Mounted, Mounting, Initializing, Disconnected and so on, like this:

Get-MailboxDatabaseCopyStatus "Mailbox Database 01"
Name                      Status     ContentIndexState
Mailbox Database 01\EX01  Mounting        Failed
Mailbox Database 01\EX02  Initializing    Failed

Get-MailboxDatabaseCopyStatus "Mailbox Database 01"
Name                      Status     ContentIndexState
Mailbox Database 01\EX01  Initializing        Failed
Mailbox Database 01\EX02  Mounting            Failed

Get-MailboxDatabaseCopyStatus "Mailbox Database 01"
Name                      Status     ContentIndexState
Mailbox Database 01\EX01  Disconnected       Failed
Mailbox Database 01\EX02  Mounting           Failed

Only after I suspend the "Mailbox Database 01" copy on ex02 does the database mount successfully on ex01. The edb file on ex02 is then left in a "dirty shutdown" state, and I have to reseed "Mailbox Database 01" from ex01 to ex02 manually.

Get-ClusterGroup -Cluster EX01
Name              OwnerNode            State
ClusterGroup         ex01              Online
Available Storage    ex02              Offline

Get-DatabaseAvailabilityGroup -Status -Identity DAG01 | fl name,primaryActiveManager

Name                 : DAG01
PrimaryActiveManager : EX01

Please let me know if you need any information.
Thanks for your help.
Jack.


  • Edited by Jack Chuong Saturday, March 28, 2015 5:41 AM
March 28th, 2015 5:40am

Hi John,

Last time, after the reseed, the database status changed to Healthy on EX02. I should note that my DAG could fail over before: for example, "Mailbox Database 01" could be Mounted on EX01 and Healthy on EX02, or vice versa, and when one server went down (for example, when I unplugged its network cable), the database copy on the other server was mounted automatically. When the failed server came back online, its database copy was resynchronized from the other server automatically. As I said, the problem only started recently.

Recently, this is the state while everything is normal, with "Mailbox Database 01" mounted on EX01 and healthy on EX02:

Get-MailboxDatabaseCopyStatus "Mailbox Database 01"
Name                      Status     ContentIndexState
Mailbox Database 01\EX01  Mounted        Healthy
Mailbox Database 01\EX02  Healthy        Healthy

When EX02 has a problem, the "Mailbox Database 01" copy cannot be mounted on either server, and its status keeps switching.

During the problem I noticed events that appear only on EX02, showing that some Exchange programs/services crashed. For example, last time:

Faulting application name: MSExchangeHMWorker.exe, version: 15.0.712.0, time stamp: 0x5199cd1a
Faulting module name: RPCRT4.dll, version: 6.1.7601.21855, time stamp: 0x4eb4c921
Exception code: 0xc0020043
Fault offset: 0x000000000008aa13
Faulting process id: 0x5cfc
Faulting application start time: 0x01d0067aa7da6ac7
Faulting application path: C:\Program Files\Microsoft\Exchange Server\V15\Bin\MSExchangeHMWorker.exe
Faulting module path: C:\Windows\system32\RPCRT4.dll
Report Id: 432d5040-cdfe-11e4-9bb7-3440b58d323f

Faulting application name: svchost.exe_RpcEptMapper, version: 6.1.7600.16385, time stamp: 0x4a5bc3c1
Faulting module name: ntdll.dll, version: 6.1.7601.17725, time stamp: 0x4ec4aa8e
Exception code: 0xc0000374
Fault offset: 0x00000000000c40f2
Faulting process id: 0x2c0
Faulting application start time: 0x01ce90e16804a9e2
Faulting application path: C:\Windows\system32\svchost.exe
Faulting module path: C:\Windows\SYSTEM32\ntdll.dll

[RpcHttp] An internal server error occurred. The unhandled exception was: System.TypeInitializationException: The type initializer for 'Microsoft.Exchange.Data.Directory.Globals' threw an exception. ---> System.Runtime.InteropServices.COMException: Call was canceled by the message filter. (Exception from HRESULT: 0x80010002 (RPC_E_CALL_CANCELED))
   at System.Runtime.InteropServices.Marshal.ThrowExceptionForHRInternal(Int32 errorCode, IntPtr errorInfo)
   at System.Management.ManagementScope.InitializeGuts(Object o)
   at System.Management.ManagementScope.Initialize()
   at System.Management.ManagementObjectSearcher.Initialize()
   at System.Management.ManagementObjectSearcher.Get()
   at Microsoft.Exchange.Data.Directory.Globals.DetectIfMachineIsVirtualMachine()
   at Microsoft.Exchange.Data.Directory.Globals..cctor()

Watson report about to be sent for process id: 19380, with parameters: E12IIS, c-RTL-AMD64, 15.00.0712.024, w3wp#MSExchangeRpcProxyAppPool, M.E.Data.Directory, M.E.D.D.Globals.DetectIfMachineIsVirtualMachine, System.TypeInitializationException, 55de, 15.00.0712.016.
ErrorReportingEnabled: False
 
[RpcHttp] An internal server error occurred. The unhandled exception was: System.TypeInitializationException: The type initializer for 'Microsoft.Exchange.Data.Directory.Globals' threw an exception. ---> System.Runtime.InteropServices.COMException: Call was canceled by the message filter. (Exception from HRESULT: 0x80010002 (RPC_E_CALL_CANCELED))
   at System.Runtime.InteropServices.Marshal.ThrowExceptionForHRInternal(Int32 errorCode, IntPtr errorInfo)
   at System.Management.ManagementScope.InitializeGuts(Object o)
   at System.Management.ManagementScope.Initialize()
   at System.Management.ManagementObjectSearcher.Initialize()
   at System.Management.ManagementObjectSearcher.Get()
   at Microsoft.Exchange.Data.Directory.Globals.DetectIfMachineIsVirtualMachine()
   at Microsoft.Exchange.Data.Directory.Globals..cctor()

This time, I found one event that appeared only on EX02 before the problem happened:

Log Name:      Application
Source:        MSExchange Transport Migration
Date:          3/28/2015 10:34:01 AM
Event ID:      2005
Task Category: General
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      EX02.mydomain.com
Description:
An unexpected failure has occurred. The problem was ignored but may indicate other problems in the system. Diagnostic information:

  at Microsoft.Exchange.Data.Storage.MapiPropertyBag.SaveChanges(Boolean force)    at Microsoft.Exchange.Data.Storage.StoreObjectPropertyBag.SaveChanges(Boolean force)    at Microsoft.Exchange.Data.Storage.AcrPropertyBag.SaveChanges(Boolean force)    at Microsoft.Exchange.Data.Storage.CoreItem.InternalSave(SaveMode saveMode, CallbackContext callbackContext)    at Microsoft.Exchange.Data.Storage.Item.SaveInternal(SaveMode saveMode, Boolean commit)    at Microsoft.Exchange.Data.Storage.Item.Save(SaveMode saveMode)    at Microsoft.Exchange.Migration.MigrationJob.UpdatePoisonCount(IMigrationDataProvider provider, Int32 count)    at Microsoft.Exchange.MailboxReplicationService.CommonUtils.ProcessKnownExceptions(Action actionDelegate, FailureDelegate failureDelegate)|Error clearing posion count for job: local move 3:14a96ad4-4506-4ab8-9be7-c0308a72e555:ExchangeLocalMove:Staged:4:Administrator@itlvn.com:Completed:11/26/2013 7:59:11 PM::|Microsoft.Exchange.Data.Storage.ConnectionFailedTransientException|Cannot save changes made to an item to store.|InnerException:MapiExceptionNetworkError:16.55847:3E000000, 18.59943:BE060000BD07000000000000, 0.62184:00000000, 255.16280:BE0600006E2F610000000000, 255.8600:A81D0000, 255.12696:802C8F080869D001000FC882, 255.10648:02000000, 255.14744:BE060000, 255.9624:F2030000, 255.13720:00000000, 255.11672:01000000, 255.12952:00000000010700C000000000, 3.23260:BE060000, 0.43249:000FC882, 4.39153:15010480, 4.32881:15010480, 0.50035:07000000, 4.64625:15010480, 20.52176:000FC88211001010FE000000, 20.50032:000FC8827E17401076040000, 0.50128:00000000, 0.50288:00000000, 4.23354:15010480, 0.25913:76040000, 255.21817:15010480, 0.17361:76040000, 4.19665:15010480, 0.37632:76040000, 4.37888:15010480|Microsoft.Mapi.MapiExceptionNetworkError: MapiExceptionNetworkError: Unable to save changes. 
(hr=0x80040115, ec=0) Diagnostic context:   Lid: 55847  EMSMDBPOOL.EcPoolSessionDoRpc called [length=62]   Lid: 59943  EMSMDBPOOL.EcPoolSessionDoRpc exception [rpc_status=0x6BE][latency=1981]   Lid: 62184    Lid: 16280  dwParam: 0x0 Msg: EEInfo: ComputerName: n/a   Lid: 8600  dwParam: 0x0 Msg: EEInfo: ProcessID: 7592   Lid: 12696  dwParam: 0x0 Msg: EEInfo: Generation Time: 3/28/0415 3:34:01 AM   Lid: 10648  dwParam: 0x0 Msg: EEInfo: Generating component: 2   Lid: 14744  dwParam: 0x0 Msg: EEInfo: Status: 1726   Lid: 9624  dwParam: 0x0 Msg: EEInfo: Detection location: 1010   Lid: 13720  dwParam: 0x0 Msg: EEInfo: Flags: 0   Lid: 11672  dwParam: 0x0 Msg: EEInfo: NumberOfParameters: 1   Lid: 12952  dwParam: 0x0 Msg: EEInfo: prm[0]: Long val: 3221227265   Lid: 23260  Win32Error: 0x6BE   Lid: 43249    Lid: 39153  StoreEc: 0x80040115   Lid: 32881  StoreEc: 0x80040115   Lid: 50035    Lid: 64625  StoreEc: 0x80040115   Lid: 52176  ClientVersion: 15.0.712.17   Lid: 50032  ServerVersion: 15.0.712.6014   Lid: 50128    Lid: 50288    Lid: 23354  StoreEc: 0x80040115   Lid: 25913    Lid: 21817  ROP Failure: 0x80040115   Lid: 17361    Lid: 19665  StoreEc: 0x80040115   Lid: 37632    Lid: 37888  StoreEc: 0x80040115    at Microsoft.Mapi.MapiExceptionHelper.InternalThrowIfErrorOrWarning(String message, Int32 hresult, Boolean allowWarnings, Int32 ec, DiagnosticContext diagCtx, Exception innerException)    at Microsoft.Mapi.MapiExceptionHelper.ThrowIfError(String message, Int32 hresult, IExInterface iUnknown, Exception innerException)    at Microsoft.Mapi.MapiProp.SaveChanges(SaveChangesFlags flags)    at Microsoft.Exchange.Data.Storage.MapiPropertyBag.SaveChanges(Boolean force)|: ,,
%2
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="MSExchange Transport Migration" />
    <EventID Qualifiers="49152">2005</EventID>
    <Level>2</Level>
    <Task>1</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2015-03-28T03:34:01.000000000Z" />
    <EventRecordID>10409597</EventRecordID>
    <Channel>Application</Channel>
    <Computer>IDCEXC002.itl.com</Computer>
    <Security />
  </System>
  <EventData>
    <Data>  at Microsoft.Exchange.Data.Storage.MapiPropertyBag.SaveChanges(Boolean force)    at Microsoft.Exchange.Data.Storage.StoreObjectPropertyBag.SaveChanges(Boolean force)    at Microsoft.Exchange.Data.Storage.AcrPropertyBag.SaveChanges(Boolean force)    at Microsoft.Exchange.Data.Storage.CoreItem.InternalSave(SaveMode saveMode, CallbackContext callbackContext)    at Microsoft.Exchange.Data.Storage.Item.SaveInternal(SaveMode saveMode, Boolean commit)    at Microsoft.Exchange.Data.Storage.Item.Save(SaveMode saveMode)    at Microsoft.Exchange.Migration.MigrationJob.UpdatePoisonCount(IMigrationDataProvider provider, Int32 count)    at Microsoft.Exchange.MailboxReplicationService.CommonUtils.ProcessKnownExceptions(Action actionDelegate, FailureDelegate failureDelegate)|Error clearing posion count for job: local move 3:14a96ad4-4506-4ab8-9be7-c0308a72e555:ExchangeLocalMove:Staged:4:Administrator@itlvn.com:Completed:11/26/2013 7:59:11 PM::|Microsoft.Exchange.Data.Storage.ConnectionFailedTransientException|Cannot save changes made to an item to store.|InnerException:MapiExceptionNetworkError:16.55847:3E000000, 18.59943:BE060000BD07000000000000, 0.62184:00000000, 255.16280:BE0600006E2F610000000000, 255.8600:A81D0000, 255.12696:802C8F080869D001000FC882, 255.10648:02000000, 255.14744:BE060000, 255.9624:F2030000, 255.13720:00000000, 255.11672:01000000, 255.12952:00000000010700C000000000, 3.23260:BE060000, 0.43249:000FC882, 4.39153:15010480, 4.32881:15010480, 0.50035:07000000, 4.64625:15010480, 20.52176:000FC88211001010FE000000, 20.50032:000FC8827E17401076040000, 0.50128:00000000, 0.50288:00000000, 4.23354:15010480, 0.25913:76040000, 255.21817:15010480, 0.17361:76040000, 4.19665:15010480, 0.37632:76040000, 4.37888:15010480|Microsoft.Mapi.MapiExceptionNetworkError: MapiExceptionNetworkError: Unable to save changes. 
(hr=0x80040115, ec=0) Diagnostic context:   Lid: 55847  EMSMDBPOOL.EcPoolSessionDoRpc called [length=62]   Lid: 59943  EMSMDBPOOL.EcPoolSessionDoRpc exception [rpc_status=0x6BE][latency=1981]   Lid: 62184    Lid: 16280  dwParam: 0x0 Msg: EEInfo: ComputerName: n/a   Lid: 8600  dwParam: 0x0 Msg: EEInfo: ProcessID: 7592   Lid: 12696  dwParam: 0x0 Msg: EEInfo: Generation Time: 3/28/0415 3:34:01 AM   Lid: 10648  dwParam: 0x0 Msg: EEInfo: Generating component: 2   Lid: 14744  dwParam: 0x0 Msg: EEInfo: Status: 1726   Lid: 9624  dwParam: 0x0 Msg: EEInfo: Detection location: 1010   Lid: 13720  dwParam: 0x0 Msg: EEInfo: Flags: 0   Lid: 11672  dwParam: 0x0 Msg: EEInfo: NumberOfParameters: 1   Lid: 12952  dwParam: 0x0 Msg: EEInfo: prm[0]: Long val: 3221227265   Lid: 23260  Win32Error: 0x6BE   Lid: 43249    Lid: 39153  StoreEc: 0x80040115   Lid: 32881  StoreEc: 0x80040115   Lid: 50035    Lid: 64625  StoreEc: 0x80040115   Lid: 52176  ClientVersion: 15.0.712.17   Lid: 50032  ServerVersion: 15.0.712.6014   Lid: 50128    Lid: 50288    Lid: 23354  StoreEc: 0x80040115   Lid: 25913    Lid: 21817  ROP Failure: 0x80040115   Lid: 17361    Lid: 19665  StoreEc: 0x80040115   Lid: 37632    Lid: 37888  StoreEc: 0x80040115    at Microsoft.Mapi.MapiExceptionHelper.InternalThrowIfErrorOrWarning(String message, Int32 hresult, Boolean allowWarnings, Int32 ec, DiagnosticContext diagCtx, Exception innerException)    at Microsoft.Mapi.MapiExceptionHelper.ThrowIfError(String message, Int32 hresult, IExInterface iUnknown, Exception innerException)    at Microsoft.Mapi.MapiProp.SaveChanges(SaveChangesFlags flags)    at Microsoft.Exchange.Data.Storage.MapiPropertyBag.SaveChanges(Boolean force)|: ,,</Data>
  </EventData>
</Event>
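The event above reports the same failure in several notations (rpc_status=0x6BE, Win32Error: 0x6BE, Status: 1726, hr=0x80040115). As a rough aid for reading it, here is a minimal Python sketch that normalises those values and looks them up in a small table. The symbolic names are standard Windows/MAPI constants that I added for illustration (only RPC_E_CALL_CANCELED appears in the pasted text itself), and the table is deliberately tiny, not a complete reference.

```python
# Numeric codes quoted in the events of this post, normalised to integers.
# Assumption: the symbolic names below are standard Windows/MAPI constants
# added for illustration; most of them do not appear in the pasted events.
KNOWN_CODES = {
    1726: "RPC_S_CALL_FAILED (the remote procedure call failed)",
    0x80040115: "MAPI_E_NETWORK_ERROR",
    0x80010002: "RPC_E_CALL_CANCELED",
    0xC0000374: "STATUS_HEAP_CORRUPTION",
}


def describe(code_text: str) -> str:
    """Parse a decimal or 0x-prefixed code and return 'hex / decimal: name'."""
    value = int(code_text, 0)  # base 0 accepts both "1726" and "0x6BE"
    name = KNOWN_CODES.get(value, "unknown (not in this small table)")
    return f"0x{value:X} / {value}: {name}"


if __name__ == "__main__":
    for text in ("0x6BE", "1726", "0x80040115", "0x80010002", "0xc0000374"):
        print(f"{text:>12} -> {describe(text)}")
```

Read this way, rpc_status=0x6BE and Status: 1726 are one and the same RPC failure, so the event points at a broken RPC call to the store rather than at several independent errors.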

I also noticed that the "Mailbox Database 01" edb file on EX02 is in a "dirty shutdown" state (yes, last time too).

Running eseutil /mh against the "Mailbox Database 01" edb file on EX02 gives this result:
State: Dirty Shutdown
Log Required: 4656967-4657077 (0x470f47-0x470fb5)
Log Committed: 0-4657078 (0x0-0x470fb6)
I'm going to reseed the "Mailbox Database 01" copy on EX02 tonight. Should I first recover it with eseutil /r to get it into a clean shutdown state?
How can I check whether my DAG configuration is fine? Is the Get-ClusterGroup result from my earlier post OK?
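For reference, the "Log Required" line in the eseutil /mh output can be unpacked with a few lines of Python. This is only a sketch for reading the header: it cross-checks the decimal range against the hex range and lists the log generations that a recovery would need. The log prefix "E00" and the 8-hex-digit file name pattern are assumptions; the real prefix depends on the database and is not shown in the output above.

```python
import re
from typing import List, Tuple

LOG_REQUIRED = re.compile(
    r"Log Required:\s*(\d+)-(\d+)\s*\(0x([0-9a-fA-F]+)-0x([0-9a-fA-F]+)\)"
)


def required_log_range(line: str) -> Tuple[int, int]:
    """Extract the decimal generation range from a 'Log Required:' line."""
    match = LOG_REQUIRED.search(line)
    if match is None:
        raise ValueError("not a 'Log Required' line")
    low, high = int(match.group(1)), int(match.group(2))
    # Cross-check the decimal values against the hex values on the same line.
    if (low, high) != (int(match.group(3), 16), int(match.group(4), 16)):
        raise ValueError("decimal and hex ranges disagree")
    return low, high


def log_file_names(low: int, high: int, prefix: str = "E00") -> List[str]:
    """Build candidate log file names; the prefix is an assumption."""
    return [f"{prefix}{gen:08X}.log" for gen in range(low, high + 1)]


if __name__ == "__main__":
    line = "Log Required: 4656967-4657077 (0x470f47-0x470fb5)"
    low, high = required_log_range(line)
    names = log_file_names(low, high)
    print(f"{len(names)} log generations required, {names[0]} to {names[-1]}")
```

If any generation in that range is missing from the log folder, a soft recovery cannot bring the copy to a clean shutdown.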


March 28th, 2015 8:40am

I guess the problem is with the store worker process for that database. You can try a reseed and check the status first. There is no need to bring the database to a clean shutdown before the reseed, because the reseed deletes the existing files and creates new EDB and log files.

If you still see the same problem, I would suggest creating a new database on each server, adding a copy of it on the other server, and checking the behaviour. Exchange 2013 runs a separate store worker process (Microsoft.Exchange.Store.Worker.exe) for each database, so this lets us narrow down whether the problem is with the store process or something else.

March 28th, 2015 1:10pm

Hi John,
I upgraded the memory on ex01 and ex02 yesterday to rule out the "lack of memory" problem in future, and I found out that failover doesn't work. Here is what happened:

Step 1: I ran commands on ex02 to put it into maintenance mode, following the instructions here: http://blogs.technet.com/b/nawar/archive/2014/03/30/exchange-2013-maintenance-mode.aspx

1. Drain active mail queues on the Mailbox server:
Set-ServerComponentState EX02 -Component HubTransport -State Draining -Requester Maintenance
2. To help the transport services pick up the state change immediately, run:
Restart-Service MSExchangeTransport
Restart-Service MSExchangeFrontEndTransport
3. To redirect messages pending delivery in the local queues to another Mailbox server, run:
Redirect-Message -Server EX02 -Target EX01
4. To prevent the node from being or becoming the PAM, pause the cluster node by running:
Suspend-ClusterNode EX02
5. To move all active databases currently hosted on the DAG member to other DAG members, run:
Set-MailboxServer EX02 -DatabaseCopyActivationDisabledAndMoveNow $True
6. Note the current value of DatabaseCopyAutoActivationPolicy, which is needed when taking the server out of maintenance mode later. Then, to prevent the server from hosting active database copies, run:
Set-MailboxServer EX02 -DatabaseCopyAutoActivationPolicy Blocked
7. To put the server in maintenance mode, run:
Set-ServerComponentState EX02 -Component ServerWideOffline -State Inactive -Requester Maintenance

After running the commands above, I can see that ex02 is in maintenance mode:

[PS] C:\Windows\system32>Get-ServerComponentState ex02 | ft Component,State -Autosize

Component                   State
---------                   -----
ServerWideOffline          Inactive
HubTransport               Inactive
FrontendTransport          Inactive
Monitoring                 Active
RecoveryActionsEnabled     Active
AutoDiscoverProxy          Inactive
ActiveSyncProxy            Inactive
EcpProxy                   Inactive
EwsProxy                   Inactive
ImapProxy                  Inactive
OabProxy                   Inactive
OwaProxy                   Inactive
PopProxy                   Inactive
PushNotificationsProxy     Inactive
RpsProxy                   Inactive
RwsProxy                   Inactive
RpcProxy                   Inactive
UMCallRouter               Inactive
XropProxy                  Inactive
HttpProxyAvailabilityGroup Inactive
ForwardSyncDaemon          Inactive
ProvisioningRps            Inactive
MapiProxy                  Inactive

But when I check the "Mailbox Database 01" copy status, the copy on ex02 still shows as Healthy:

Get-MailboxDatabaseCopyStatus "Mailbox Database 01"
Name                      Status     ContentIndexState
Mailbox Database 01\EX01  Mounted         Healthy
Mailbox Database 01\EX02  Healthy         Healthy

Is that normal?

Step 2: I shut down ex02 to upgrade the RAM. After ex02 was off, I checked the "Mailbox Database 01" copy status:

Get-MailboxDatabaseCopyStatus "Mailbox Database 01"
Name                      Status     ContentIndexState
Mailbox Database 01\EX01  Mounted         Healthy
Mailbox Database 01\EX02  ServiceDown     Unknown

Everything was fine, but a few seconds later the "Mailbox Database 01" copy on ex01 was dismounted (ex01 is the PrimaryActiveManager, as I showed in my previous replies). I checked the copy status again:

Get-MailboxDatabaseCopyStatus "Mailbox Database 01"
Name                      Status     ContentIndexState
Mailbox Database 01\EX01  Dismounted      Failed
Mailbox Database 01\EX02  ServiceDown     Unknown

I tried to mount the "Mailbox Database 01" copy on ex01, but it didn't work:

Mount-Database -Identity "Mailbox Database 01"
Couldn't mount the database that you specified. Specified database: Mailbox Database 01; Error code: An Active Manager
operation failed. Error: An Active Manager operation encountered an error. To perform this operation, the server must
be a member of a database availability group, and the database availability group must have quorum. Error: Active
Manager encountered an error while trying to access the cluster database. [Server: EX01.mydomain.com].
    + CategoryInfo          : InvalidOperation: (Mailbox Database 01:ADObjectId) [Mount-Database], InvalidOperationExc
   eption
    + FullyQualifiedErrorId : [Server=EX01,RequestId=719edf90-3a8a-47e1-a85a-0e37902f0253,TimeStamp=3/29/2015 4:3
   3:26 AM] 9BA9C40F,Microsoft.Exchange.Management.SystemConfigurationTasks.MountDatabase
    + PSComputerName        : ex01.mydomain.com
A process is holding onto a transport performance counter. processId : 2888, counter : time in resource per second Value=0 SpinLock=0 Lifetime=Type:...

So my users experienced a service outage for about 10 minutes. After I powered on ex02 (it was still in maintenance mode when it started up), the "Mailbox Database 01" copy on ex01 was mounted automatically and became healthy, and everything went back to normal. I then ran these commands on ex02 to take it out of maintenance mode:

Set-ServerComponentState EX02 -Component ServerWideOffline -State Active -Requester Maintenance
Resume-ClusterNode EX02
Set-MailboxServer EX02 -DatabaseCopyActivationDisabledAndMoveNow $False
Set-MailboxServer EX02 -DatabaseCopyAutoActivationPolicy Unrestricted
Set-ServerComponentState EX02 -Component HubTransport -State Active -Requester Maintenance
Restart-Service MSExchangeTransport 
Restart-Service MSExchangeFrontEndTransport

I activated the "Mailbox Database 01" copy on ex02, then did the same with ex01 to upgrade its RAM, and the same thing happened.

My witness server runs Windows 7 Pro 32-bit and is joined to the domain. The witness directory is C:\Witness, and "Exchange Trusted Subsystem" has been added to the local Administrators group on the witness server.

I don't know what is wrong with my DAG.
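The Mount-Database error above says the DAG "must have quorum". As a way to reason about that, here is a minimal Python sketch of the majority rule for a two-member DAG with a file share witness, assuming three voters (ex01, ex02 and the witness) and no dynamic quorum adjustment. It is only a model of the vote counting, not a diagnosis of this cluster.

```python
# Voters in a two-member DAG that uses a file share witness.
# Assumption: "witness" stands for the witness server described above,
# and each voter holds exactly one vote.
VOTERS = ("ex01", "ex02", "witness")


def has_quorum(votes_online: int, votes_total: int) -> bool:
    """Majority rule: strictly more than half of the configured votes."""
    return votes_online > votes_total // 2


def dag_has_quorum(online) -> bool:
    """Return True if the reachable voters form a majority of VOTERS."""
    reachable = set(online)
    unknown = reachable - set(VOTERS)
    if unknown:
        raise ValueError(f"unknown voters: {sorted(unknown)}")
    return has_quorum(len(reachable), len(VOTERS))


if __name__ == "__main__":
    scenarios = {
        "all up": {"ex01", "ex02", "witness"},
        "ex02 down, witness reachable": {"ex01", "witness"},
        "ex02 down, witness vote missing": {"ex01"},
    }
    for label, online in scenarios.items():
        print(f"{label}: quorum = {dag_has_quorum(online)}")
```

In this model, ex01 keeps quorum with ex02 off only if the witness vote is actually counted. The dismount a few seconds after shutting down ex02 matches the third scenario, so it may be worth checking whether the cluster can really use the witness share.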


March 29th, 2015 11:52pm

After you shut down EX02, what events do you see on EX01? Can you make sure the PAM is on EX01 before shutting down EX02?
March 30th, 2015 3:08am

This topic is archived. No further replies will be accepted.
