Managed Availability Responders

by Guest Post on March 2, 2015

in Exchange

Responders are the final critical part of Managed Availability. Recall that Probes are how Monitors obtain accurate information about the experience your users are receiving. Responders are what the Monitors use to attempt to fix the situation. Once they pass throttling, they launch a recovery action such as restarting a service, resetting an IIS app pool, or anything else the developers of Exchange have found often resolves the symptoms. Refer to the Responder Timeline section of the Managed Availability Monitors article for information about when the Responders are executed.

Definitions and Results

Just like Probes and Monitors, Responders have an event log channel for their definitions and another for their results. The definitions can be found in Microsoft-Exchange-ActiveMonitoring/ResponderDefinition. Some of the important properties are:

  • TypeName: The full code name of the recovery action that will be taken when this Responder executes.
  • Name: The name of the Responder.
  • ServiceName: The HealthSet this Responder is part of.
  • TargetResource: The object this Responder will act on.
  • AlertMask: The Monitor for this Responder.
  • ThrottlePolicyXml: How often this Responder is allowed to execute. I’ll go into more detail in the next section.

The results can be found in Microsoft-Exchange-ActiveMonitoring/ResponderResult. Responders output a result on a recurring basis whether or not the Monitor indicates they should take a recovery action. If a ResponderResult event has a RecoveryResult of 2 and an IsRecoveryAttempted of 1, the Responder attempted a recovery action. Usually, though, you will want to skip the Responder results and go straight to Microsoft-Exchange-ManagedAvailability/RecoveryActionResults. First, let’s discuss the events in the Microsoft-Exchange-ManagedAvailability/RecoveryActionLogs event log channel.
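To illustrate the ResponderResult filter described above, here is a minimal Python sketch. It assumes you have already exported ResponderResult events into a list of dictionaries keyed by property name. The export step, the "Responder" key, and the sample values are hypothetical; only RecoveryResult and IsRecoveryAttempted come from the text.

```python
from typing import Any


def attempted_recoveries(results: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only ResponderResult records where a recovery action was attempted.

    Each record is assumed to be a dict of event properties. Values exported
    from an event log often arrive as strings, so they are converted to int.
    """
    attempted = []
    for result in results:
        recovery_result = int(result.get("RecoveryResult", -1))
        is_attempted = int(result.get("IsRecoveryAttempted", 0))
        # RecoveryResult 2 together with IsRecoveryAttempted 1 means the
        # Responder attempted a recovery action.
        if recovery_result == 2 and is_attempted == 1:
            attempted.append(result)
    return attempted


# Hypothetical records: the "Responder" key and the values are made up.
sample = [
    {"Responder": "ResponderA", "RecoveryResult": "2", "IsRecoveryAttempted": "1"},
    {"Responder": "ResponderB", "RecoveryResult": "0", "IsRecoveryAttempted": "0"},
]
print([r["Responder"] for r in attempted_recoveries(sample)])  # prints ['ResponderA']
```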

Throttling

When a Responder attempts a recovery action, the action is first checked against throttling limits. This results in one of two events in the RecoveryActionLogs channel: 2050, throttling has allowed the operation, or 2051, throttling rejected the operation. Here are the details of a sample 2051 event:

  ActionId: RestartService
  ResourceName: MSExchangeRepl
  RequesterName: ServiceHealthMSExchangeReplEndpointRestart
  ExceptionMessage: Active Monitoring Recovery action failed. An operation was rejected during local throttling. (ActionId=RestartService, ResourceName=MSExchangeRepl, Requester=ServiceHealthMSExchangeReplEndpointRestart, FailedChecks=LocalMinimumMinutes, LocalMaxInDay)
  LocalThrottleResult:
    <LocalThrottlingResult IsPassed="false" MinimumMinutes="60" TotalInOneHour="1" MaxAllowedInOneHour="-1" TotalInOneDay="1" MaxAllowedInOneDay="1" IsThrottlingInProgress="true" IsRecoveryInProgress="false" ChecksFailed="LocalMinimumMinutes, LocalMaxInDay" TimeToRetryAfter="2015-02-11T14:29:57.9448377-08:00">
      <MostRecentEntry Requester="ServiceHealthMSExchangeReplEndpointRestart" StartTime="2015-02-10T14:29:55.9920032-08:00" EndTime="2015-02-10T14:29:57.9448377-08:00" State="Finished" Result="Succeeded" />
    </LocalThrottlingResult>
  GroupThrottleResult: <not attempted>
  TotalServersInGroup: 0
  TotalServersInCompatibleVersion: 0

Hopefully, you recognize the first few fields. This is the RestartService recovery action, which restarts a service. The ResourceName is used by the recovery action to pick a target; for the RestartService recovery action, it is the name of the service to restart. The RequesterName is the name of the Responder, as listed in the ResponderDefinition or ResponderResult channels.

The LocalThrottleResult property is more interesting. Recovery actions are throttled per server, where the same recovery action cannot run too often on the same server, and per group, where the same recovery action cannot run too often in the same DAG (for the Mailbox role) or AD site (for the Client Access role). A value of -1 means that this level of throttling is not used. For example, MaxAllowedInOneHour is -1 here because an hourly limit adds nothing when only one action is allowed per day. In this example, the MSExchangeRepl resource was already the target of a recovery action within the last 60 minutes, so the attempt failed the LocalMinimumMinutes check. That earlier action also used up the one attempt allowed per day, so the attempt failed the LocalMaxInDay check too. This is why TimeToRetryAfter is exactly 24 hours after the previous action's EndTime. Because this attempt was blocked by local throttling, group throttling was not attempted. Each of the limits mentioned in this event is described below:

For each ThrottlingResult attribute, the parentheses list the related check names reported in ChecksFailed and the corresponding local and group throttle config attribute names.

  • IsPassed: True if throttling will allow the recovery action. Otherwise, false.
  • MinimumMinutes (checks LocalMinimumMinutes and GroupMinimumMinutes; configured by LocalMinimumMinutesBetweenAttempts and GroupMinimumMinutesBetweenAttempts): The time that must elapse before this recovery action may act upon the same resource on this server or in this group.
  • TotalInOneHour: The number of times this recovery action has acted upon this resource on this server or in this group in the last hour.
  • MaxAllowedInOneHour (check LocalMaxInHour; configured by LocalMaximumAllowedAttemptsInOneHour; no group equivalent): The number of times this recovery action is allowed to act upon this resource on this server in one hour.
  • TotalInOneDay: The number of times this recovery action has acted upon this resource on this server or in this group in the last 24 hours.
  • MaxAllowedInOneDay (checks LocalMaxInDay and GroupMaxInDay; configured by LocalMaximumAllowedAttemptsInADay and GroupMaximumAllowedAttemptsInADay): The number of times this recovery action is allowed to act upon this resource on this server or in this group in 24 hours.
  • IsRecoveryInProgress (checks RecoveryInProgress and GroupRecoveryInProgress): Whether this recovery action is already acting upon this resource and has not completed. If true, the new action will be aborted.
  • TimeToRetryAfter: The time after which this recovery action would be allowed to act on this resource on this server or in this group.

The GroupThrottleResult has the same fields, and also gives details about the recovery actions that have taken place on the other servers in the group.
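The local checks described above can be modeled in a few lines of Python. This is a simplified sketch, not Exchange's implementation. It covers only local throttling and counts earlier attempts by their end time. That choice is consistent with the sample event, where TimeToRetryAfter is exactly 24 hours after the previous EndTime. The class and function names are hypothetical, and the "now" timestamp in the usage is made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LocalThrottleConfig:
    """Local limits named after the ThrottleConfig attributes; -1 disables a limit."""
    LocalMinimumMinutesBetweenAttempts: int = -1
    LocalMaximumAllowedAttemptsInOneHour: int = -1
    LocalMaximumAllowedAttemptsInADay: int = -1


def failed_local_checks(config: LocalThrottleConfig,
                        previous_ends: list[datetime],
                        now: datetime,
                        recovery_in_progress: bool = False) -> list[str]:
    """Return the local check names a new attempt would fail (empty list = allowed).

    previous_ends holds the end times of earlier runs of the same recovery
    action against the same resource on this server (a modeling assumption).
    """
    failed = []
    if recovery_in_progress:
        failed.append("RecoveryInProgress")

    minimum = config.LocalMinimumMinutesBetweenAttempts
    if minimum != -1 and previous_ends:
        if now - max(previous_ends) < timedelta(minutes=minimum):
            failed.append("LocalMinimumMinutes")

    max_in_hour = config.LocalMaximumAllowedAttemptsInOneHour
    in_last_hour = sum(1 for end in previous_ends if now - end < timedelta(hours=1))
    if max_in_hour != -1 and in_last_hour >= max_in_hour:
        failed.append("LocalMaxInHour")

    max_in_day = config.LocalMaximumAllowedAttemptsInADay
    in_last_day = sum(1 for end in previous_ends if now - end < timedelta(days=1))
    if max_in_day != -1 and in_last_day >= max_in_day:
        failed.append("LocalMaxInDay")

    return failed


# Limits and previous run taken from the 2051 sample event; "now" is hypothetical.
config = LocalThrottleConfig(LocalMinimumMinutesBetweenAttempts=60,
                             LocalMaximumAllowedAttemptsInADay=1)
previous = [datetime(2015, 2, 10, 14, 29, 57)]
print(failed_local_checks(config, previous, datetime(2015, 2, 10, 14, 45)))
```

With these inputs the sketch reports the same two checks as the sample event's ChecksFailed. Moving "now" past 15:30 leaves only LocalMaxInDay, and after 14:29:57 the next day, nothing fails.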

If the action is not throttled, event 500 will be logged in the Microsoft-Exchange-ManagedAvailability/RecoveryActionResults channel, indicating that the recovery action is beginning. If it succeeds, event 501 is logged. This is the most common case and where you’ll usually want to start. These events also have details about the recovery action that was taken and the throttling it passed. Recovery actions that start and then fail are still counted against throttling limits. For more information about recovery actions, read the What Did Managed Availability Just Do to This Service? article.
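The event flow above (2050 or 2051 in RecoveryActionLogs, then 500 and 501 in RecoveryActionResults) can be summarized with a small lookup. This Python sketch assumes you have already grouped the events that belong to one recovery attempt, for example by RequesterName and time. That grouping step and the function names are hypothetical. Event IDs not named in this article are reported as not covered rather than guessed.

```python
# Event IDs described in this article, keyed to their channel and meaning.
EVENT_MEANINGS = {
    2050: "RecoveryActionLogs: throttling allowed the operation",
    2051: "RecoveryActionLogs: throttling rejected the operation",
    500: "RecoveryActionResults: recovery action is beginning",
    501: "RecoveryActionResults: recovery action succeeded",
}


def describe(event_id: int) -> str:
    """Return the meaning of an event ID, or a note that it is not covered here."""
    return EVENT_MEANINGS.get(event_id, f"event {event_id}: not covered by this sketch")


def summarize_attempt(event_ids: list[int]) -> str:
    """Summarize one recovery attempt from the event IDs logged for it."""
    ids = set(event_ids)
    if 2051 in ids:
        return "throttled"
    if 501 in ids:
        return "succeeded"
    if 500 in ids:
        # Started without a success event: it may still be running or may have
        # failed. Per the article, actions that start and then fail still
        # count against throttling limits.
        return "started, no success event"
    if 2050 in ids:
        return "allowed by throttling, not started yet"
    return "unknown"


for event_id in (2050, 500, 501):
    print(describe(event_id))
print(summarize_attempt([2050, 500, 501]))  # prints succeeded
```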

Viewing Throttling Limits

So what is the best way to find out what recovery action throttling is in place? You could wait for the Responder to begin a recovery action and view the throttling settings in the RecoveryActionLogs channel, but two other places give you this information sooner. The first is the Microsoft-Exchange-ManagedAvailability/ThrottlingConfig event log channel. The second is the Microsoft-Exchange-ActiveMonitoring/ResponderDefinition channel, introduced in the first section of this article. The advantage of the ThrottlingConfig channel is that you can see all the Responders that can take a particular recovery action grouped together, instead of having to check every Responder definition. Here’s a sample event from the ThrottlingConfig event log channel:

  Identity: RestartService/Default/*/*/msexchangefastsearch
  RecoveryActionId: RestartService
  ResponderCategory: Default
  ResponderTypeName: *
  ResponderName: *
  ResourceName: msexchangefastsearch
  PropertiesXml:
    <ThrottleConfig Enabled="True" LocalMinimumMinutesBetweenAttempts="60" LocalMaximumAllowedAttemptsInOneHour="-1" LocalMaximumAllowedAttemptsInADay="4" GroupMinimumMinutesBetweenAttempts="-1" GroupMaximumAllowedAttemptsInADay="-1" />

The Identity of a throttling configuration is the next five fields joined by slashes, so let’s discuss each. The RecoveryActionId is the Responder’s throttling type. You can find it as the name of the child element of the ThrottleEntries node in the Responder definition’s ThrottlePolicyXml property. The ResponderCategory is unused and is always Default right now. The ResponderTypeName is the Responder’s TypeName property, and the ResponderName is the Responder’s Name property. An asterisk in either field means the configuration applies to any value. The ResourceName is the object the Responder acts on. In this example, Responders that use the RestartService recovery action to restart the MSExchangeFastSearch process may do so up to 4 times a day on any server, as long as at least 60 minutes have passed since this recovery action last restarted it on that server. Group throttling is not used.
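The Identity format described above can be split back into its five fields. The following Python sketch also checks whether a configuration covers a given Responder. Two rules in it are assumptions. First, "*" is treated as a wildcard, which is how the sample reads. Second, values are compared case-insensitively, because the sample Identity uses msexchangefastsearch while the ResponderDefinition uses MSExchangeFastSearch. The class name and the TypeName and Responder name in the usage are hypothetical.

```python
from dataclasses import dataclass


def _field_matches(pattern: str, value: str) -> bool:
    # "*" matches anything; other values are compared case-insensitively
    # (an assumption based on the differing case in the sample events).
    return pattern == "*" or pattern.lower() == value.lower()


@dataclass(frozen=True)
class ThrottleIdentity:
    recovery_action_id: str
    responder_category: str
    responder_type_name: str
    responder_name: str
    resource_name: str

    @classmethod
    def parse(cls, identity: str) -> "ThrottleIdentity":
        """Split an Identity such as RestartService/Default/*/*/msexchangefastsearch."""
        parts = identity.split("/")
        if len(parts) != 5:
            raise ValueError(f"expected 5 slash-separated fields, got {len(parts)}")
        return cls(*parts)

    def applies_to(self, recovery_action_id: str, type_name: str,
                   responder_name: str, resource_name: str) -> bool:
        """True if this configuration covers the given Responder and resource.

        ResponderCategory is ignored because it is always Default today.
        """
        return (_field_matches(self.recovery_action_id, recovery_action_id)
                and _field_matches(self.responder_type_name, type_name)
                and _field_matches(self.responder_name, responder_name)
                and _field_matches(self.resource_name, resource_name))


identity = ThrottleIdentity.parse("RestartService/Default/*/*/msexchangefastsearch")
# The TypeName and Responder name below are hypothetical placeholders.
print(identity.applies_to("RestartService", "Example.TypeName",
                          "ExampleResponder", "MSExchangeFastSearch"))  # True
```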

The second way to view throttling limits is through the Microsoft-Exchange-ActiveMonitoring/ResponderDefinition events, which include any overrides you have in place. Here is the value of the ThrottlePolicyXml property from a ResponderDefinition event:

<ThrottleEntries>
  <RestartService ResourceName="MSExchangeFastSearch">
    <ThrottleConfig Enabled="True" LocalMinimumMinutesBetweenAttempts="60" LocalMaximumAllowedAttemptsInOneHour="-1" LocalMaximumAllowedAttemptsInADay="4" GroupMinimumMinutesBetweenAttempts="-1" GroupMaximumAllowedAttemptsInADay="-1" />
  </RestartService>
</ThrottleEntries>

You can see that these attribute names and values match the ThrottlingConfig event’s PropertiesXml values.
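You can check this match programmatically with Python's standard xml.etree.ElementTree module. The following sketch uses the two sample values from this article. It assumes, as the sample shows, that each child of ThrottleEntries is named after a RecoveryActionId and holds one ThrottleConfig element. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

# PropertiesXml from the ThrottlingConfig sample event.
PROPERTIES_XML = ('<ThrottleConfig Enabled="True" LocalMinimumMinutesBetweenAttempts="60" '
                  'LocalMaximumAllowedAttemptsInOneHour="-1" LocalMaximumAllowedAttemptsInADay="4" '
                  'GroupMinimumMinutesBetweenAttempts="-1" GroupMaximumAllowedAttemptsInADay="-1" />')

# ThrottlePolicyXml from the ResponderDefinition sample event.
POLICY_XML = ('<ThrottleEntries> <RestartService ResourceName="MSExchangeFastSearch"> '
              '<ThrottleConfig Enabled="True" LocalMinimumMinutesBetweenAttempts="60" '
              'LocalMaximumAllowedAttemptsInOneHour="-1" LocalMaximumAllowedAttemptsInADay="4" '
              'GroupMinimumMinutesBetweenAttempts="-1" GroupMaximumAllowedAttemptsInADay="-1" /> '
              '</RestartService> </ThrottleEntries>')


def policy_settings(policy_xml: str) -> dict[tuple[str, str], dict[str, str]]:
    """Map (RecoveryActionId, ResourceName) to the ThrottleConfig attributes."""
    settings = {}
    for action in ET.fromstring(policy_xml):
        # Each child of ThrottleEntries is named after a RecoveryActionId.
        config = action.find("ThrottleConfig")
        if config is not None:
            settings[(action.tag, action.get("ResourceName", ""))] = dict(config.attrib)
    return settings


properties = dict(ET.fromstring(PROPERTIES_XML).attrib)
policy = policy_settings(POLICY_XML)
print(policy[("RestartService", "MSExchangeFastSearch")] == properties)  # True
```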

Changing Throttling Limits

There may be times when you want recovery actions to occur more or less frequently. For example, a customer reports an outage and you find that a service restart would have fixed it but was throttled, or you have a third-party application that handles application pool resets particularly poorly. To change the throttling configuration, you can use the same Add-ServerMonitoringOverride and Add-GlobalMonitoringOverride cmdlets that work for other Managed Availability overrides. The Customizing Managed Availability article gives a good summary of how to use these cmdlets. For the PropertyName parameter, the cmdlets support a special syntax for modifying the throttling configuration. Instead of specifying the entire XML blob as the override (which works, but is harder to read later), you can use ThrottleAttributes.LocalMinimumMinutesBetweenAttempts, or any of the other throttle config attributes, as the PropertyName. Here’s an example:

Add-GlobalMonitoringOverride -ItemType Responder -Identity Search\SearchIndexFailureRestartSearchService -PropertyName ThrottleAttributes.LocalMinimumMinutesBetweenAttempts -PropertyValue 240 -ApplyVersion "15.00.1044.025"

To allow app pool resets by the ActiveSyncSelfTestRestartWebAppPool Responder only once every 2 hours instead of once an hour, you could use this command:

Add-GlobalMonitoringOverride -ItemType Responder -Identity ActiveSync.Protocol\ActiveSyncSelfTestRestartWebAppPool -PropertyName ThrottleAttributes.LocalMinimumMinutesBetweenAttempts -PropertyValue 120 -ApplyVersion "15.00.1044.025"

Suppose you want a server to reboot when the MSExchangeIS service crashes and cannot be started, with every server allowed to reboot once a day but no more than one server in the DAG rebooting every 60 minutes. You could use these commands:

Add-GlobalMonitoringOverride -ItemType Responder -Identity Store\StoreServiceKillServer -PropertyName ThrottleAttributes.GroupMinimumMinutesBetweenAttempts -PropertyValue 60 -ApplyVersion "15.00.1044.025"

Add-GlobalMonitoringOverride -ItemType Responder -Identity Store\StoreServiceKillServer -PropertyName ThrottleAttributes.GroupMaximumAllowedAttemptsInADay -PropertyValue -1 -ApplyVersion "15.00.1044.025"

The LocalMaximumAllowedAttemptsInADay value is already 1, so each server would still reboot at most once per day. If the override was entered correctly, the ResponderDefinition event’s ThrottlePolicyXml value will be updated, and there will be a new entry in the ThrottlingConfig channel.
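To confirm that an override has taken effect, you can compare the ThrottlePolicyXml value from the ResponderDefinition event against the values you set. The following Python sketch assumes that a PropertyName of ThrottleAttributes.<Name> corresponds to the <Name> attribute on each ThrottleConfig element, as the examples above suggest. Copying the XML out of the event is left to you, and the function name is hypothetical. The usage reuses the MSExchangeFastSearch sample policy from earlier in this article to show how a missing override would be reported.

```python
import xml.etree.ElementTree as ET
from typing import Optional

PREFIX = "ThrottleAttributes."

# ThrottlePolicyXml from the ResponderDefinition sample earlier in this article.
POLICY_XML = ('<ThrottleEntries> <RestartService ResourceName="MSExchangeFastSearch"> '
              '<ThrottleConfig Enabled="True" LocalMinimumMinutesBetweenAttempts="60" '
              'LocalMaximumAllowedAttemptsInOneHour="-1" LocalMaximumAllowedAttemptsInADay="4" '
              'GroupMinimumMinutesBetweenAttempts="-1" GroupMaximumAllowedAttemptsInADay="-1" /> '
              '</RestartService> </ThrottleEntries>')


def override_mismatches(policy_xml: str,
                        overrides: dict[str, object]) -> dict[str, tuple[object, Optional[str]]]:
    """Return {PropertyName: (expected, actual)} for overrides not reflected in the XML."""
    configs = list(ET.fromstring(policy_xml).iter("ThrottleConfig"))
    mismatches = {}
    for property_name, expected in overrides.items():
        if not property_name.startswith(PREFIX):
            raise ValueError(f"not a ThrottleAttributes property: {property_name}")
        attribute = property_name[len(PREFIX):]
        # With no ThrottleConfig element at all, report the value as missing.
        actual_values = [config.get(attribute) for config in configs] or [None]
        wrong = [actual for actual in actual_values if actual != str(expected)]
        if wrong:
            mismatches[property_name] = (expected, wrong[0])
    return mismatches


# Hypothetical check: has a 240-minute override been applied to this policy? It has not.
print(override_mismatches(POLICY_XML, {
    "ThrottleAttributes.LocalMinimumMinutesBetweenAttempts": 240,
}))  # prints {'ThrottleAttributes.LocalMinimumMinutesBetweenAttempts': (240, '60')}
```

An empty dictionary means every override value appears in the policy.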

These may be poor examples, but good ones are hard to find, because the Exchange developers choose the values for the throttling configuration based on our experience running Exchange in Office 365. We don’t expect that changing these values is going to be something you’ll want to do very often, but it is usually a better idea than disabling a monitor or a recovery action altogether. If you do have a scenario where you need to keep a throttling limit override in place, we would love to hear about it.

Abram Jackson
Program Manager, Exchange Server


Exchange Team Blog
