Heartbeat alerts during a specific time window, but no "Failed to connect to computer" alerts

Hello Sir,

Recently we have had an issue where the SCOM server receives only Heartbeat alerts (both new and closed) from its agents during a specific time window on weekdays, but no "Failed to connect to computer" alerts.

Our environment is SCOM 2012 R2. I checked the event log and found the errors below.

There are 3 servers in the management pool and one DB server.

31411
Stopping group membership calculation rule:
Subscription ID: d0cf0387-af5d-4416-9e10-134ac7347d0e

15001
More than half of the members of the pool have acknowledged the most recent initialization check request. The pool member will send a lease request to acquire ownership of managed objects assigned to the pool.


15002

The pool member cannot send a lease request to acquire ownership of managed objects assigned to the pool because half or fewer members of the pool acknowledged the most recent initialization check request. The pool member will continue to send an initialization check request.

15003 and 15004

The pool member no longer owns any managed objects assigned to the pool because half or fewer members of the pool have acknowledged the most recent lease request. The pool member has unloaded the workflows for managed objects it previously owned.

31551

Failed to store data in the Data Warehouse. The operation will be retried.

Exception 'SqlException': Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.

One or more workflows were affected by this.

21406

The process started at 21:26:07 failed to create System.PropertyBagData. Errors found in output:

One more error: Resource entity is not heartbeating.
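Events 15001 through 15004 quoted above all turn on the same majority rule: a pool member acts only when more than half of the pool members have acknowledged its request. A minimal sketch of that rule follows, assuming the counts are simply passed in. The function name and parameters are hypothetical, and how SCOM actually counts members (for example, whether the member counts itself) is not stated in the event text.

```python
def pool_member_action(acknowledged: int, pool_size: int) -> str:
    """Illustrate the majority rule described in events 15001 and 15002.

    Both parameters are hypothetical inputs; they are not read from SCOM.
    """
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    # "More than half" means a strict majority (event 15001).
    if 2 * acknowledged > pool_size:
        return "send lease request"
    # "Half or fewer" acknowledged (event 15002): keep checking.
    return "retry initialization check"


print(pool_member_action(2, 3))
print(pool_member_action(1, 3))
```

With a pool of 3, a member that hears back from only 1 member stays in the retry state, which matches the pattern of pool events logged during the alert window.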

Could you please suggest how to fix this issue?

Regards,
Sanjeev Kumar N

January 5th, 2015 2:59pm

Hi,

Based on my understanding, the heartbeat monitor checks the availability of each System Center Management Health Service in the management group, and the Computer Not Reachable monitor indicates whether a computer can be pinged.

In some instances you get only the heartbeat failure, which means something is wrong with the agent. When you get both, you can argue that if the computer is down and unavailable there is definitely something wrong with the agent as well (it is not running if the machine is not running). However, there are also environments where servers do not respond to pings because of firewall settings (on the agent machine or on network devices).

Regards,

Yan Li

January 8th, 2015 12:21pm

Hello Yan Li,

Thanks for replying.

Communication between the management servers and the agents is fine. But we are getting Heartbeat alerts (both new and closed) from all the agents at a specific time, within a span of about 20 minutes.

When I checked the SCOM servers, I found that the event IDs listed above were generated at that time.

Could you please help me with this?

Regards,
Sanjeev Kumar N

 

January 8th, 2015 3:39pm

Hi Sanjeev,

Based on my understanding, the "Failed to connect to computer" alert appears after a Health Service heartbeat failure only if the RMS / MS is unable to ping the agent after 3 - 4 missed heartbeats.

It may also be that there was a network connectivity issue that was restored within 3 - 4 minutes, so you would not have received the "Failed to connect to computer" alert. Or the SCOM service may have stopped on those agents while the machines were still pingable, in which case you would also not have received the "Failed to connect to computer" alert.
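The relationship between the two alerts described above can be sketched as a small decision function. This is only an illustration of the reasoning in this reply, not SCOM's implementation. The function name is hypothetical, and the default threshold of 3 missed heartbeats is an assumption taken from the "3 - 4" figure mentioned here.

```python
from typing import List


def expected_alerts(missed_heartbeats: int, ping_ok: bool,
                    threshold: int = 3) -> List[str]:
    """Return the alerts one would expect, per the reasoning in this reply.

    threshold is an assumed value, not a setting read from SCOM.
    """
    alerts: List[str] = []
    if missed_heartbeats >= threshold:
        alerts.append("Health Service Heartbeat Failure")
        # The ping check matters only after the heartbeat failure.
        if not ping_ok:
            alerts.append("Failed to Connect to Computer")
    return alerts


print(expected_alerts(3, ping_ok=True))
print(expected_alerts(3, ping_ok=False))
```

The first call models the situation in this thread: heartbeats are missed but the machines still answer pings, so only the heartbeat alert is raised.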

Check the excellent Microsoft article below, which explains this in detail with a diagram.

http://technet.microsoft.com/library/hh212798.aspx

Also, to be on the safe side, check the overrides to see whether anyone has disabled the alert. By default it is disabled for client OS but not for server OS.

By default, alerts for missed heartbeats and response to ping are disabled for client operating systems. To receive alerts for client operating systems, override the Health Service Heartbeat Failure and Computer Not Reachable monitors for the class Windows Client Operating System to set the Generates Alert parameter to True.

So can you confirm whether this issue is on a server OS or a client OS?

January 10th, 2015 8:11pm

Hello Gautam,

The SCOM environment is 2012 R2.

Thanks for your support! We cleared the cache in the Health folder on the RMS server, and the alerts stayed under control for 2 days. Now we are again getting alerts, but only "Closed Heartbeat failure" alerts, during a specific time at night (IST).

Previously we used to get both new and closed alerts, but now we receive only "Closed Heartbeat alert" emails.

Could you please suggest a workaround to get rid of this issue?

Regards,
Sanjeev Kumar N

January 14th, 2015 9:12am

Hi Sanjeev,

Does this closed-alerts issue appear in both the console and email, or only in email?

If only in email, can you also let me know how many email subscriptions you have?

Also, is it only the Health Service heartbeat alerts that you receive as closed? Do you get closed alerts for other alert types as well?

January 14th, 2015 10:51am

Hello Gautam,

We got only closed alerts, in both the console and in emails.

We have checked, and there is no network glitch during the time the alerts are generated (2:45 AM to 3:00 AM IST). I restarted the SCOM services on all the SCOM servers on 15th January 2015, and that day SCOM did not generate any alerts and we received none. But from the day after it started again, and we are receiving new and closed heartbeat alerts from all the agents during the time specified above.
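One way to confirm that the alerts really cluster in the 2:45 AM to 3:00 AM window is to measure what share of alert timestamps fall inside it. The sketch below assumes the timestamps have already been exported as Python datetime values in IST. The function name and the sample values are hypothetical and are not taken from this environment.

```python
from datetime import datetime, time


def share_in_window(timestamps, start=time(2, 45), end=time(3, 0)):
    """Fraction of timestamps whose time of day lies in [start, end]."""
    if not timestamps:
        return 0.0
    hits = sum(1 for ts in timestamps if start <= ts.time() <= end)
    return hits / len(timestamps)


# Hypothetical sample data, for illustration only.
sample = [
    datetime(2015, 1, 16, 2, 50),
    datetime(2015, 1, 17, 2, 58),
    datetime(2015, 1, 17, 14, 0),
]
print(share_in_window(sample))
```

A share close to 1.0 across several days would point to something scheduled at that hour rather than to a random network problem.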

I am not sure where or how to troubleshoot this to fix the issue.

Regards,
Sanjeev Kumar N

January 19th, 2015 1:35pm

Hi Sanjeev,

So you mean that you stopped the SCOM services on those agents on 15th Jan 2015, and then there was no alert for Health Service heartbeat failure or "Failed to connect to computer"?

And when you started the SCOM services again the next day, you received the alerts for the agents that had been stopped the day before?

Is my understanding of the situation correct?

January 19th, 2015 2:10pm

Hello Gautam

I mean that I restarted the SCOM services on all the management servers (not on the agents). On that day we did not receive the alerts from the agents at the specific time. From the next day onwards it is the same old story: we are receiving alerts as usual from all the agents at the same time.

Regards,
Sanjeev Kumar N

January 19th, 2015 3:17pm

Hi Sanjeev,

Please read the below thread with a similar issue.


https://social.technet.microsoft.com/Forums/systemcenter/en-US/939776d7-d898-45b4-a529-554a2ae6bc15/health-service-heartbeat-failure-alert-for-generated-when-one-management-server-down?forum=operationsmanagergeneral

I also have a few questions.

You have a resource pool of 3 MS. I would like to understand how many SCOM agents you monitor in total and how many agents you have assigned to each MS.

For example, if you have 100 agents, you might have assigned 30 to the 1st MS, 30 to the 2nd and 40 to the 3rd.

What does your setup look like in those terms?

Also, are all 3 management servers located in the same data center?

January 19th, 2015 7:20pm

Hi Sanjeev,

Any update on the issue ?

January 28th, 2015 10:59pm

Hi Sanjeev,

Any update on the issue ?

February 1st, 2015 8:04am

Please close the ticket. The issue has been fixed.

Our SCOM servers are built on VMs, and it seems the problem was with the host. Some activity was recently performed on the host machine, and the issue disappeared.

Regards,
Sanjeev Kumar N

February 5th, 2015 9:44am


This topic is archived. No further replies will be accepted.
