RMS does not close alerts

Hello,

I had to reboot the RMS this morning after patching, and I noticed a lot of "Health Service Heartbeat Failure" alerts created during that time. This looks okay, as most of the servers were not in Maintenance Mode.

But why, after the RMS came back up, did none of these alerts close by themselves? They have "Auto-Resolve Alert" set to "True".

It is happening more and more often that monitors do not close by themselves.

If I do a "Reset Health" the monitor goes green and stays green until the next failure...

But as I have hundreds of alerts, the reset takes forever!

I tried to close all of them at once and run "Machine Green", but it does not clear the monitor on each item... the alert is no longer showing, but the monitor is still red on the server's health monitors screen in Health Explorer...

What is hanging?

Thanks,

December 16th, 2014 8:23pm

?bump?
December 17th, 2014 9:38pm

Hi,

Will you please restart the System Center Management Configuration service and the System Center Data Access service on the RMS server? In addition, please also check the Operations Manager event log for more details about the issue.
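
A minimal sketch of the restart above, in Python. It assumes the short service names OMCFG (System Center Management Configuration) and OMSDK (System Center Data Access), which are not stated in this thread, so please confirm them with "sc query" on your RMS first. The helper name restart_commands is made up for illustration. It only builds the "net stop" / "net start" command lines so they can be reviewed before anything is run.

```python
# Assumed short service names for SCOM 2007 R2; confirm them on your own RMS.
SERVICES = ["OMCFG", "OMSDK"]


def restart_commands(services):
    """Return the net stop / net start command lines, one service at a time."""
    commands = []
    for name in services:
        commands.append(["net", "stop", name])
        commands.append(["net", "start", name])
    return commands
```

To actually run them, each list can be passed to subprocess.run(cmd, check=True) from an elevated prompt on the RMS.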

Regards,

Yan Li

December 18th, 2014 4:46am

Hello,

Stop Health Service

Delete Health Service Folder

Start Health Service

==> Same issue

Reboot the RMS and MSs

==> Same issue

Any clue?

Thanks,
Dom

December 19th, 2014 6:32pm

Hi,

If you don't want to receive those heartbeat failure alerts after rebooting the RMS, you may try the workaround below:

1. Find the Health Service Heartbeat Failure monitor and open its properties.

2. Under the Diagnostic and Recovery tab, click Edit on the "Ping Computer on Heartbeat Failure" diagnostic task.

3. Under the Overrides tab, choose to override the task.

4. Set the Interval Milliseconds value above 32767.

After the reboot, you may change it back; otherwise you may not receive heartbeat alerts.

In addition, please also check your Operations Manager event log for any errors regarding SCOM health.

Regards,

Yan Li

December 23rd, 2014 3:08pm

Still in progress, as multiple issues came up and it is the holiday season.

Happy Holidays

Dom

December 25th, 2014 3:58am

Hello,

I do not see any "Interval" value on the monitor...

Any clue?

Thanks,

Dom

December 26th, 2014 9:18pm

Hi,

I am using SCOM 2012. After we change the Interval parameter, we may encounter other errors, but we can change it back to solve them. While the Interval parameter is changed, we will not receive heartbeat failure alerts.

If this is not the way you want to go, we may need more information about the issue to troubleshoot it. Have you checked the Operations Manager event log?

Regards,

Yan Li

December 29th, 2014 11:32am

Hello,

The problem is that there is no Interval value on the heartbeat failure monitor...

Yes, the Operations Manager event log has many events:

5500

20050

31504

Thanks,
Dom

December 29th, 2014 11:41pm

Do you use SCOM 2007 or 2007 R2? I only have SCOM 2012, and I checked the heartbeat failure monitor; the diagnostic task is as in the picture below:

Will you please try the article below:

http://support.microsoft.com/kb/942866

December 30th, 2014 5:29am

Hi There,

Just wanted to confirm this: is there any override set on the Health Service Heartbeat Failure monitor saying it should not resolve itself? Like the setting shown below.

Can you check whether the Auto-Resolve Alert parameter was set to False by mistake by anyone?

December 30th, 2014 11:21am

Dom,

Also, I think you are looking at the wrong Health Service Heartbeat monitor. Mine is SCOM 2007 R2 CU4 and I get many more options, as per the screenshot earlier in this thread.

So my suggestion is: stop the Health Service on an agent and wait for the alert to appear in the console. Once it appears, click Edit properties of the monitor at the bottom, select the override for all objects of the Health Service Watcher class, and see whether the overridden value of Auto-Resolve Alert is set to True or False.

If it is False, change it to True.

Post me the results.

December 30th, 2014 12:21pm

Hello,

As we had a power failure on 12/29/2014, I have tons of "Health Service Heartbeat Failure" and "Failed to Connect to Computer" alerts...

I tried the various methods: "Maintenance Mode", changing the settings with overrides...

All alerts remain open...

I still could not get the screen with the Interval value...

Thanks,
Dom

December 30th, 2014 7:40pm

Hello,

I also tried Authoring > Monitors and looked for Heartbeat Failure.

Still the same: ONLY 3 fields are available under override... nothing about interval...

Thanks,

December 30th, 2014 7:48pm

Can you reset the whole management group with the process below? I am not sure it will give a solution, but it is worth testing and checking.

During your non-business hours:

Stop all the SCOM services on the RMS and MS.

After that, go to the Program Files location on the RMS and MS, to the folder where SCOM is installed. You will find a folder named Health Service State; rename that folder to Health Service State_Old and then start all the SCOM services on the RMS and MS.

Once the Health Service starts, a new Health Service State folder will be created with fresh data, and the same will be submitted to all the agents in your management group. So all the rules, monitors, overrides, etc. that already exist will be resubmitted to the agents. After 10 - 20 minutes the alerting will be back (depending on your environment and network connectivity).

After doing this, stop the Health Service on one dummy agent and, once you receive the Health Service heartbeat alert, start the Health Service again and see if the alert closes automatically.

Any corrupt configuration on the RMS, MS and agents will be cleared by doing the above.

Test this and let me know the results.
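
The folder rename in the steps above can be sketched in Python as follows. This is only an illustration under assumptions: the function name rename_state_folder is hypothetical, the install directory is passed in by you because the exact path is not given in this thread, and the SCOM services must already be stopped before it is called.

```python
from pathlib import Path


def rename_state_folder(install_dir, suffix="_Old"):
    """Rename '<install_dir>/Health Service State' and return the new path.

    Returns None when the folder does not exist. Stop the SCOM services
    before calling this, otherwise the files are likely to be in use.
    """
    state = Path(install_dir) / "Health Service State"
    if not state.is_dir():
        return None
    target = state.with_name(state.name + suffix)
    if target.exists():
        # Do not overwrite a folder left behind by an earlier attempt.
        raise FileExistsError(f"{target} already exists; remove or rename it first")
    state.rename(target)
    return target
```

Run it once per management server with that server's own install directory, then start the services so a fresh Health Service State folder is created.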

December 30th, 2014 8:02pm

Hello,

All services have been stopped on all MSs and the RMS...

All servers have even been rebooted...

The folder rename was also done on all MSs and the RMS prior to the stop/start and the reboots...

After three days, with this long weekend, nothing has changed: monitors are not closing by themselves.

I will do more tests again...

It seems to be chronic...

Thanks,
Dom

January 20th, 2015 5:55pm

Hello,

All SCOM services have been stopped on the RMS, MS1, MS2 and DMS1.

The folder "Health Service State" has been renamed on the 4 management servers.

The services have been restarted...

Waiting 10 to 20 minutes for the rules to populate again... watching for any events showing the rules are being picked up... (Event ID 2110 is populating the logs now...)

HealthService (5784) Health Service Store: The database engine stopped the instance (0).

The System Center Management service entered the stopped state.

Yes, the alert closed by itself...

What to do with all the heartbeat failures from the overnight patching day? Closing the alerts one by one...

Thanks,

Dom

January 21st, 2015 8:05pm

Hi Dom,

Nice to hear we are making progress. I would suggest closing the old alerts manually, then waiting for a fresh Health Service heartbeat failure to appear and checking that one.

Or you can test it manually on one agent: stop it and, after receiving the alert, start it and see if the alert resolves automatically.

January 21st, 2015 9:17pm

Same problem again... this weekend all alerts remained open...

...

January 28th, 2015 1:57am

Hi Dom,

I have seen a few drawbacks with alerting in SCOM 2007 R2. Just to confirm: are the alerts still there in your Operations console?

Can you delete the SCOM cache file, reopen the console, and check whether the alerts are still there or have disappeared? This has happened in my case; I always do this and it does the trick.

The cache file can be found in the location below.

Close the console, go to the following location, and delete the file with the .MDB extension:

C:\Users\Your login account\AppData\Local\Microsoft\Microsoft.Mom.UI.Console

If you do not close the console, the file cannot be deleted. So close the console, delete the file, reopen the console, and check.
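
The cache cleanup above can be sketched in Python as follows. This is only an illustration: the function names default_cache_dir and delete_console_cache are hypothetical, and it assumes %LOCALAPPDATA% resolves to the AppData\Local folder of your login account.

```python
import os
from pathlib import Path


def default_cache_dir():
    """Console cache folder from the path above, built from %LOCALAPPDATA%."""
    return Path(os.environ.get("LOCALAPPDATA", "")) / "Microsoft" / "Microsoft.Mom.UI.Console"


def delete_console_cache(cache_dir):
    """Delete every .mdb file in cache_dir and return the file names removed."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []
    removed = []
    for entry in sorted(cache_dir.iterdir()):
        if entry.is_file() and entry.suffix.lower() == ".mdb":
            entry.unlink()  # fails on Windows while the console holds the file open
            removed.append(entry.name)
    return removed
```

With the console closed, delete_console_cache(default_cache_dir()) returns the names of the cache files it removed.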

Post me the results.

January 28th, 2015 7:21am

Hi Dom,

Any update on this issue?

February 1st, 2015 8:04am

This topic is archived. No further replies will be accepted.
