SCOM 2007 R2 - Health Service will not keep running
Good morning everyone! I have an issue with a HEALTH SERVICE that will not keep running. This server is in an environment that uses a gateway. The Gateway has a cert on it and communicates back to the other mgt servers. All other servers in that environment are reporting fine through the gateway.
What I have tried so far is:
- Renaming the HEALTH SERVICE STATE folder and starting the service back
- Completely uninstalling and re-installing the SCOM 2007 R2 agent
Still the server's health service fails. The events after the restart are like this:
EVENT 102 HealthService (260) Health Service Store: The database engine (6.01.7600.0000) started a new instance (0).
EVENT 300 HealthService (260) Health Service Store: The database engine is initiating recovery steps.
EVENT 301 HealthService (260) Health Service Store: The database engine has begun replaying logfile C:\Program Files\System Center Operations Manager 2007\Health Service State\Health Service Store\edb.log.
EVENT 302 HealthService (260) Health Service Store: The database engine has successfully completed recovery steps.
EVENT 2011 The Health Service did not find any policy in Active Directory
EVENT 20063 Active Directory Integration has been disabled for management group THINK.
EVENT 21022 No certificate was specified. This Health Service will not be able to communicate with other health services unless those health services are in a domain that has a trust relationship with this domain. If this health service needs to communicate with health services in untrusted domains, please configure a certificate.
EVENT 2002 Management Group "THINK" was started.
EVENT 21024 OpsMgr's configuration may be out-of-date for management group THINK, and has requested updated configuration from the Configuration Service. The current(out-of-date) state cookie is "AE 97 2D A2 B5 66 44 25 80 84 D6 A8 E9 71 4B 72 25 A0 6E 6F"
EVENT 7006 The Health Service has published the public key [28 BB 96 33 BC 6D ED 81 47 1E 64 C7 F3 37 CC CE ] used to send it secure messages to management group THINK. This message only indicates that the key is scheduled for delivery, not that delivery has been confirmed.
EVENT 7026 The Health Service successfully logged on the RunAs account DOMAIN.LOCAL\MSQ_SVC for management group THINK
EVENT 7019 The Health Service has validated all RunAs accounts for management group THINK.
EVENT 10113 Taking a New Global Snapshot.
EVENT 10111 Deleting Global Snapshot.
Rinse and repeat. This loop happens again a few times and then just stops. Service is stopped.
The only error I see on the GATEWAY server is:
EVENT 20022 The health service {F5705ECC-68BE-FEDF-3AE5-0104F2BB4097} running on host SERVERNAME.DOMAIN.LOCAL and serving management group THINK with id {4DDD8BE6-13B1-4C15-AC46-247B39C0C16A} is not heartbeating.
I am not sure what this is. Any ideas would be greatly appreciated.
Kevin
August 16th, 2012 10:13am
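For anyone comparing a similar loop, here is a rough Python sketch for counting how often the startup sequence repeats in events pasted as text. It assumes the "EVENT <id> message" layout used in the post above (not an official export format) and treats event 102 as the start of each cycle; the function names are made up for illustration.

```python
import re

# Matches the "EVENT <id>" markers used in the pasted log above.
# This layout is an assumption taken from the post, not an Event Viewer format.
EVENT_MARKER = re.compile(r"EVENT\s+(\d+)\s+")


def split_events(text):
    """Return a list of (event_id, message) pairs found in the pasted text."""
    # With one capturing group, re.split yields:
    # [preamble, id1, message1, id2, message2, ...]
    parts = EVENT_MARKER.split(text)
    ids = parts[1::2]
    messages = parts[2::2]
    return [(int(i), m.strip()) for i, m in zip(ids, messages)]


def count_restarts(events, start_id=102):
    """Count how many times the startup sequence begins (event 102 by assumption)."""
    return sum(1 for event_id, _ in events if event_id == start_id)


sample = (
    "EVENT 102 HealthService (260) Health Service Store: The database engine "
    "(6.01.7600.0000) started a new instance (0). "
    'EVENT 2002 Management Group "THINK" was started. '
    "EVENT 10111 Deleting Global Snapshot. "
    "EVENT 102 HealthService (260) Health Service Store: The database engine "
    "(6.01.7600.0000) started a new instance (0)."
)
events = split_events(sample)
print(count_restarts(events))
```

Counting event 102 is only a proxy for a service restart; any other event that reliably opens the cycle could be passed as start_id instead.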

Besides the Operations Manager event errors, are there any errors in the System/Application log when the health service stops? Roger
August 16th, 2012 11:31am

Have you tried to uninstall and reinstall the agent?
Regards, Blake Email: mengotto<at>hotmail.com Blog: http://discussitnow.wordpress.com/ If my response was helpful, please mark it as so, if it answered your question, then please also mark it accordingly. Thank you.
August 16th, 2012 11:32am

Blake...Yes sir...I have uninstalled and reinstalled.
August 16th, 2012 11:36am

Hopeless, I see nothing in the APPLICATION log...but in the SYSTEM log I get the following:
EVENT 7036 The System Center Management service entered the running state.
EVENT 7036 The WMI Performance Adapter service entered the running state.
EVENT 7031 The System Center Management service terminated unexpectedly. It has done this 1 time(s). The following corrective action will be taken in 60000 milliseconds: Restart the service.
EVENT 7036 The WMI Performance Adapter service entered the stopped state.
This repeats until the agent fails 6 times and stops.
Kevin
August 16th, 2012 11:42am
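To check how far a crash loop like this has progressed, the 7031 messages can be scanned for their "It has done this N time(s)" counter. The Python sketch below is only illustrative: it assumes the messages have been copied out as plain text lines with the wording quoted above, and the function name is hypothetical.

```python
import re

# Pattern for the Service Control Manager message quoted in the post above;
# the wording is taken from that post and assumed to be stable.
CRASH = re.compile(
    r"The (?P<service>.+?) service terminated unexpectedly\.\s+"
    r"It has done this (?P<count>\d+) time\(s\)"
)


def crash_counts(lines):
    """Map each service name to the highest crash count seen in its messages."""
    highest = {}
    for line in lines:
        match = CRASH.search(line)
        if match:
            service = match.group("service")
            count = int(match.group("count"))
            highest[service] = max(count, highest.get(service, 0))
    return highest


log = [
    "EVENT 7036 The System Center Management service entered the running state.",
    "EVENT 7031 The System Center Management service terminated unexpectedly. "
    "It has done this 1 time(s).",
    "EVENT 7031 The System Center Management service terminated unexpectedly. "
    "It has done this 2 time(s).",
]
print(crash_counts(log))
```

Lines without the "terminated unexpectedly" wording, such as the 7036 state changes, are simply ignored.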

Sorry, I skimmed right past that. So you have AV exclusions on the agent files on this server right? That, in my situation, and a corrupt agent install, were the only two causes of SCOM agents not working properly.
Regards, Blake
August 16th, 2012 11:56am

Blake, this server does not have AV on it. Kevin
August 16th, 2012 12:05pm

Ok, so have you run the cleanmom tool on it, then tried to do an agent reinstall? I think the cleanmom tool can be found in the newer SCOM R2 Admin resource kit.
Regards, Blake
August 16th, 2012 12:21pm

Thanks Blake...I had forgotten about CLEANMOM...let me try that and I will report back...
August 16th, 2012 3:06pm

Ok Blake...I tried CLEANMOM and then a re-install and the alert closed at 5:28pm. HOWEVER, shortly after that at 5:42pm...the service failed again and started the same loop in my original post. :( What the heck could be wrong here? Kev
August 16th, 2012 8:36pm

Have you checked the flux capacitor yet? I dunno dude, I'm out of free support here. Time to call PSS?
Regards, Blake
August 16th, 2012 10:06pm

Please try the following steps to reinstall the SCOM agent after you uninstall it:
1. Use cleanmom
2. Make sure that the SCOM folder is deleted
3. Delete the entry in MS
4. Reinstall the agent
5. Approve the manually installed agent from MS
Roger
August 17th, 2012 4:44am

Thanks for the reply hopeless...I tried your process (Even restarted the RMS HS and the GW HS). Still does the same thing... Stupid flux capacitor...guess I am opening a PSS ticket today! :) Kevin
August 17th, 2012 11:01am

I also can't see from the thread what would cause your agent to stop. Have you checked that the agent you install gets the latest cumulative update available for the SCOM version you are running? For SCOM 2007 R2 that is CU6.
Bob Cornelissen - BICTT (My Blog about SCOM) - MVP 2012 and Microsoft Community Contributor 2011 Recipient
August 18th, 2012 8:14am

We are running SCOM 2007 R2 CU3. Kevin
August 20th, 2012 11:00am

Hi Kevin, I would suggest checking whether updating SCOM and the agent to a higher CU level (5 or 6, actually) helps in your case. A lot of things were fixed in CU4, CU5 and CU6. I cannot guarantee that it will fix this exact problem on this exact machine, but you know the drill: when in doubt, or when calling PSS, they will usually ask whether you are up to date with everything.
Bob Cornelissen
August 20th, 2012 1:06pm

As it turns out, there was a baseline alert that was causing this. Once we disabled it via an override, the issue stopped. (PSS found this with captures.) The alert is disabled by default and had been overridden to turn it on for the server in question. It still could be related to being on CU3, so I am taking it to the change board to upgrade to CU6. Thanks for the help everyone! Kev
September 6th, 2012 10:12am

This topic is archived. No further replies will be accepted.