Exchange DAG DR failover and manual failback - DR CASs are not happy.
We had an unexpected failover of our DAG to a secondary datacenter (we forgot to set the activation block on the DR DAG nodes), and once the situation was resolved in the primary datacenter we manually failed the DAG back. During the failover, the Exchange management pack in SCOM started generating all sorts of alerts about the primary site CASs not being able to access ActiveSync and a few other web protocols. We didn't think anything of it at the time, since it was an unexpected failover. However, after the manual failback (which was really just redistributing the databases back to their primary owners), the secondary site CASs started generating similar alerts about not being able to access those web protocols. To my knowledge, prior to this failover and failback the secondary site CASs never complained about the web protocols not working.

I chose to focus on one of the web protocols, ActiveSync, and ran the Test-ActiveSyncConnectivity command against one of the secondary site CASs. This is what I got back:

    RunspaceId                  : 278ef843-8eeb-4654-8dbf-81d1c4a812ec
    LocalSite                   : DRSITE
    SecureAccess                : True
    VirtualDirectoryName        :
    Url                         :
    UrlType                     : Unknown
    Port                        : 0
    ConnectionType              : Plaintext
    ClientAccessServerShortName : DRCAS1
    LocalSiteShortName          : DRSITE
    ClientAccessServer          : DRCAS1.company.com
    Scenario                    : Reset Credentials
    ScenarioDescription         : Reset automated credentials for the Client Access Probing Task user on Mailbox server PRODDN1.company.com.
    PerformanceCounterName      :
    Result                      : Failure
    Error                       : [Microsoft.Exchange.Monitoring.CasHealthStorageErrorException]: An error occurred while trying to access mailbox PRODDN1.company.com, on behalf of user company.com\extest_e2048c50283a2 Additional information: [Microsoft.Exchange.Data.Storage.WrongServerException]: The user and the mailbox are in different Active Directory sites..
    UserName                    : extest_e2048c50283a2
    StartTime                   : 7/20/2012 3:13:00 PM
    Latency                     : 00:00:00.0156001
    EventType                   : Error
    LatencyInMillisecondsString :
    Identity                    :
    IsValid                     : True

    WARNING: No Client Access servers were tested.

I don't understand why the secondary site CASs are still giving this error when the manual failback was over 12 hours ago. Technically the user has existed in all AD sites for over half a year now. I don't see anything unusual in the Application or System logs on the secondary site CAS I ran the command on. Does anyone have any ideas on how to make the secondary site CASs snap out of whatever delusion they are in?
July 20th, 2012 5:33pm

I think what that's telling you is that the secondary site CAS is in a different AD site from the mailbox server on which the mailbox is activated, which is expected and normal. I agree that the message isn't the best.
Ed Crowley MVP "There are seldom good technological solutions to behavioral problems."
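One way to check that reading is to compare the AD site of the server currently hosting the test mailbox's database with the AD site of the CAS being tested. The following is only a sketch for the Exchange Management Shell, not something run against this environment: the test user name and DRCAS1 are taken from the output in the original post, and it assumes the extest mailbox can be found by that name.

```powershell
# Sketch only. Assumes the test user name and CAS name shown in the original post.
$testUser = "extest_e2048c50283a2"
$casName  = "DRCAS1"

# Find the database that holds the test mailbox.
$mbx = Get-Mailbox -Identity $testUser

# -Status populates mount information such as MountedOnServer.
$db = Get-MailboxDatabase -Identity $mbx.Database -Status
$db | Format-List Name, Server, MountedOnServer

# Compare the AD site of the server hosting the active copy
# with the AD site of the CAS that reported the failure.
Get-ExchangeServer -Identity $db.Server.Name | Format-List Name, Site
Get-ExchangeServer -Identity $casName | Format-List Name, Site
```

If the two Site values differ, that would match the WrongServerException text ("The user and the mailbox are in different Active Directory sites"). It would not by itself explain why SCOM only started alerting after the failback.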
July 20th, 2012 5:59pm

Thanks for responding, Ed. The concern at the moment is that SCOM never generated errors from these Test-Whatever CAS cmdlets for our secondary site CASs before, so why now, and how do we get it to stop? Given the way it is acting now, I was under the impression that this was working fine before the failover and failback. If you or anyone else has any ideas, I would sure appreciate hearing them.
July 20th, 2012 9:58pm

You could configure an override in SCOM to ignore the message.
Ed Crowley MVP "There are seldom good technological solutions to behavioral problems."
July 21st, 2012 3:16am

We thought about that, but we would have to target each CAS in the secondary datacenter with its own individual SCOM override, since we wouldn't want the monitor stopped on the CASs in the primary datacenter. We would prefer not to do that, as it seems like too much of a one-off when this was never an issue for us in the past. That is, we are trying to find out why the secondary datacenter CASs have only now, after a DAG failover and failback, started raising these alarms through SCOM.
July 21st, 2012 7:42am

This topic is archived. No further replies will be accepted.
