Alerts not generated for agent heartbeat failure
I have a fresh SCOM SP1 install on Server 2008 x64 with a clustered RMS. Alerts are not being generated when a server goes offline or the Health Service heartbeat fails. To my knowledge I have not altered any alerts for this type of action. The agent is shown as grey in the Agent Managed section, but no alert appears in the Active Alerts section to show that the server is offline. I am a member of the Ops Mgr Administrators, so I have full access to the system. The heartbeat interval is set to 60 seconds and the number of missed heartbeats allowed is 5. I remember that in testing/dev before our deployment we were using SCOM without SP1 and these alerts were generated automatically. What do I need to check, change, or do so that alerts are generated for heartbeat failures?
This issue is about alerts being generated and has nothing to do with email notification. I have the notifications configured, but they can't be sent unless an alert is generated. All other alerting is working without issues, and this is a rather large problem: we have no way to be notified automatically when an agent stops reporting, across around 500 servers. Fixing this would help greatly with outages.
This seems to be an issue only on a clustered RMS on Server 2008! The clustered services are set to Use Network Name as well. The RMS is logging 20022 events in the Operations Manager event log, proving that the RMS is receiving the events from the clients.
This was a fresh install with no overrides present, and I am the only engineer with access. However, when I went to the following monitors:
Health Service Watcher > Entity Health > Availability > Computer Not Reachable
Health Service Watcher > Entity Health > Availability > Health Service Heartbeat Failure
I found that there is an override in place that sets the value for generating alarms to FALSE. This override can't be deleted as it is part of the sealed MS MP. I created an enforced override that sets the value to true, but the alarms are still not generated. Why would this value be disabled by default?
I am the only engineer who has access to make major changes on the system, and this hasn't functioned properly since the original install. I have posted this in other forums and have yet to find a solution. This needs to be resolved, as it is ridiculous that a monitoring system isn't able to simply monitor whether a server is up or down! Any help greatly appreciated.
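For reference, the settings quoted in this post imply how long an agent can stay silent before a heartbeat failure should be flagged. The following is a minimal sketch, assuming the failure is declared once the allowed number of consecutive heartbeats has been missed; the function name is hypothetical and is not part of any SCOM API.

```python
def heartbeat_failure_window(interval_seconds: int, missed_allowed: int) -> int:
    """Return the seconds of silence tolerated before a heartbeat failure.

    Assumption: the failure is raised after `missed_allowed` consecutive
    heartbeats, each `interval_seconds` apart, have been missed.
    """
    return interval_seconds * missed_allowed


# Values taken from the post: 60-second interval, 5 missed heartbeats allowed.
window = heartbeat_failure_window(60, 5)
print(f"{window} seconds ({window / 60:.0f} minutes)")  # 300 seconds (5 minutes)
```

So with these settings an offline server would be expected to produce an alert after roughly five minutes, which is a useful baseline when testing whether the alert appears at all.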
June 15th, 2009 12:00pm

I believe we have an override for client computers not to raise alerts. Is that the instance with which you have a problem?
Marius Sutara My MSDN blog This posting is provided "AS IS" with no warranties, and confers no rights. Use of attachments are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm
June 15th, 2009 2:01pm

This concerns servers, not client XP/Vista machines. No heartbeat failure alarms are being generated for servers.
June 16th, 2009 11:06am

Could you please investigate where this override is coming from and let me know?
Marius Sutara
June 16th, 2009 3:52pm

<<<<This seems to only be an issue on a clustered RMS on Server 2008!>>>>
Hi Makarius,
Have you tried uninstalling the agent manually on the problem server, then deleting the server from the Agent Managed list under Administration, and then running a discovery again to reinstall the agent from the console?
Cheers,
John Bradshaw
June 16th, 2009 4:18pm

I'm doing this from memory because I had a similar issue on a clustered 2003 RMS. Check the Parameters tab on your RMS Network Name cluster resource and make sure that DNS Registration Must Succeed and Enable Kerberos Authentication are checked. (See step 25 http://technet.microsoft.com/en-us/library/dd789024.aspx). Oh, you have to fail it over and back again to take effect too.
June 16th, 2009 6:48pm

It came in as part of the management pack and cannot be removed. It has been this way since the initial install, and I have done nothing to create it, as I have not touched the Health Service Watcher. As I stated, we had a testing environment that wasn't clustered, with a fresh SP1 install, and the alarms were generated. We now have a fresh SP1 install with a Server 2008 cluster, and no alarms have ever been generated.
June 17th, 2009 2:59pm

I have tried that, and this isn't a problem server here and there. It's every single server.
June 17th, 2009 3:03pm

That option isn't listed anywhere in the GUI with a server 2008 cluster however those two are listed with a status of OK.
June 17th, 2009 3:07pm

Sorry for not being clear. I wanted to know the name of the MP that contains this override.
Marius Sutara
June 17th, 2009 3:10pm

When viewing the override summary, the name listed is Managed Computer Client Health Service Watcher Group, and I believe it is in the SystemCenter MP. When I try to delete the override I receive the following:
Note: The following information was gathered when the operation was attempted. The information may appear cryptic but provides context for the error. The application will continue to run.
: Cannot modify sealed Management Pack. [ID=Microsoft.SystemCenter.2007, KeyToken=31bf3856ad364e35].
June 17th, 2009 3:30pm

So this boils down to the fact that your server is discovered and added as a member of the Managed Computer Client group. To check:
1. Authoring
2. Groups
3. Select "Managed Computer Client Health Service Watcher Group"
4. Right click -> View Members
If you confirm that the computer for which you are not receiving alerts is not a client computer, I will try to find out whether there is any known issue with this group being populated with server computers.
Marius Sutara
June 17th, 2009 4:02pm

I checked that section and there are no members listed at all! This was listed as the dynamic member definition:
<Expression>
  <Contains>
    <MonitoringClass>$MPElement[Name="Microsoft.SystemCenter.HealthService"]$</MonitoringClass>
    <Expression>
      <Contained>
        <MonitoringClass>$MPElement[Name="Microsoft.SystemCenter.ManagedComputerClient"]$</MonitoringClass>
      </Contained>
    </Expression>
  </Contains>
</Expression>
Also, besides the overrides I mentioned before, there were overrides for Configuration 90 Minutes Out of Date, Configuration 45 Minutes Out of Date, Configuration Processing, and Health Service Configuration State Health, which were all disabled by the sealed MP.
Correct, this is not a client computer. We haven't deployed the agent to any XP/Vista machines. The only deployments have been to Server 2003/2008. And I am not receiving alarms for any server when the agent heartbeat fails. Not a single one has ever occurred since the installation.
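To make the nesting of that membership definition easier to read, the class references can be pulled out programmatically. This is only a sketch for inspecting the XML quoted above; it does not reproduce how Operations Manager evaluates group membership, and the helper name `referenced_classes` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# The dynamic member definition as quoted in the post.
EXPRESSION = """
<Expression>
  <Contains>
    <MonitoringClass>$MPElement[Name="Microsoft.SystemCenter.HealthService"]$</MonitoringClass>
    <Expression>
      <Contained>
        <MonitoringClass>$MPElement[Name="Microsoft.SystemCenter.ManagedComputerClient"]$</MonitoringClass>
      </Contained>
    </Expression>
  </Contains>
</Expression>
"""


def referenced_classes(xml_text: str) -> list[str]:
    """Return the class names referenced by MonitoringClass elements, in document order."""
    root = ET.fromstring(xml_text.strip())
    names = []
    for node in root.iter("MonitoringClass"):
        # Each element holds a reference of the form $MPElement[Name="..."]$.
        match = re.search(r'Name="([^"]+)"', node.text or "")
        if match:
            names.append(match.group(1))
    return names


for name in referenced_classes(EXPRESSION):
    print(name)
```

The outer reference is the Health Service class and the inner one is the Managed Computer Client class, so the definition only concerns health services tied to client computers. That fits with the group being empty when no agents have been deployed to XP/Vista machines.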
June 17th, 2009 4:14pm

In my experience the resources will be online but none of the health service watcher monitors will work if DNS Registration Must Succeed and Enable Kerberos Authentication aren't checked in the RMS network name resource. Is there a network name resource for your RMS? In the properties of that resource look for DNS Registration Must Succeed and Enable Kerberos Authentication. I don't have access to a 2008 cluster, can someone provide the steps of where this is located in 2008?
June 17th, 2009 4:49pm

I noticed that Event ID 20022 was appearing in the Event Log on the RMS. Seeing that you do not appear to have found a resolution to your issue, I will 'step outside the box' and explain what I did to fix Event ID 20022 (which may help you look at the issue from a different perspective):
****************************************************************
Event Type: Error
Event Source: OpsMgr Connector
Event Category: None
Event ID: 20022
Date: 2/3/2009
Time: 2:51:36 PM
User: N/A
Computer: TK2STGWBA01
Description: The health service {F0FB2004-7BEF-EE09-C4DF-9192C5415430} running on host SAIPSDNS06.phx.gbl and serving management group SCOM_TK2_PHX_TestLabs_1 with id {230CED27-EE5E-2E1B-E0A8-C87089A8FBB0} is not heartbeating.
* Several hundred of these - one for every agent showing 'Not Monitored' - were dropped in the Ops Mgr event log on the RMS.
To fix:
1. Checked the account being used for the Default Action Account. It was stale or incorrect. I deleted the offending accounts and created a secondary action account, then assigned the RMS and MS boxes to this secondary action account.
2. Used the Ops Mgr Console to 'Repair Agent' for anything appearing as 'Not Monitored'.
3. Verified the Ops Mgr services on the RMS and MS were starting and using the specified <Domain>\<Account>.
****************************************************************
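With several hundred of these events in the log, it can help to extract the host names from the descriptions to see which agents are affected. Below is a minimal sketch, assuming the description text follows the exact wording of the sample event above and has already been exported as a string; the function name `parse_event_20022` is hypothetical.

```python
import re

# Pattern built from the wording of the sample 20022 event description.
EVENT_20022 = re.compile(
    r"The health service \{(?P<health_service_id>[0-9A-Fa-f-]+)\} "
    r"running on host (?P<host>\S+) "
    r"and serving management group (?P<management_group>\S+) "
    r"with id \{(?P<management_group_id>[0-9A-Fa-f-]+)\} "
    r"is not heartbeating\."
)


def parse_event_20022(description: str):
    """Return the fields of a 20022 description as a dict, or None if it does not match."""
    # Collapse line breaks and repeated spaces that an export may introduce.
    normalized = " ".join(description.split())
    match = EVENT_20022.search(normalized)
    return match.groupdict() if match else None


sample = (
    "The health service {F0FB2004-7BEF-EE09-C4DF-9192C5415430} running on host "
    "SAIPSDNS06.phx.gbl and serving management group SCOM_TK2_PHX_TestLabs_1 "
    "with id {230CED27-EE5E-2E1B-E0A8-C87089A8FBB0} is not heartbeating."
)
fields = parse_event_20022(sample)
print(fields["host"])  # SAIPSDNS06.phx.gbl
```

Collecting the `host` values across all exported events gives a list that can be compared against the agents shown as grey or 'Not Monitored' in the console.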
June 19th, 2009 7:05pm

I applied the following hotfix and the alarms started to be generated. I have no idea how it fixed things, and that was not my intention when applying the hotfix, but it did the trick. http://support.microsoft.com/kb/959865
June 29th, 2009 11:33am

I know this is an old thread, but I'm having exactly the same issue. Overrides exist as part of a sealed management pack, and those overrides set alerts to Disabled. I would like to receive these alerts. I didn't see any actual resolution here other than applying a CU that I have already applied. Any help would be greatly appreciated. Also, this brings up the question of "which override wins?" I tried overriding the override and I still don't get alerts.
August 26th, 2012 5:29pm

This topic is archived. No further replies will be accepted.
