SCOM 2007 R2 - Health Service will not keep running
Good morning everyone!
I have an issue with a Health Service that will not keep running. This server is in an environment that uses a gateway. The gateway has a certificate on it and communicates back to the other management servers. All other servers in that environment are reporting fine through the gateway. What I have tried so far:
Renaming the Health Service State folder and starting the service back up
Completely uninstalling and reinstalling the SCOM 2007 R2 agent
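For anyone who wants to repeat the first step, here is a minimal Python sketch of stopping the agent, moving the Health Service State folder aside, and starting the agent again. It assumes the agent's service name is `HealthService` (shown as "System Center Management" in the System log) and that the agent lives in the default folder that appears in the edb.log path in the events below; both are assumptions, not something confirmed in this thread. The helper names are hypothetical. Run it from an elevated prompt on the agent.

```python
import subprocess
from datetime import datetime
from pathlib import Path

# Default agent folder, taken from the edb.log path in the event log.
# Assumption: adjust if the agent was installed elsewhere.
AGENT_DIR = Path(r"C:\Program Files\System Center Operations Manager 2007")
STATE_DIR_NAME = "Health Service State"

# Assumption: "HealthService" is the service name behind the
# "System Center Management" display name.
SERVICE_NAME = "HealthService"


def backup_name(now: datetime) -> str:
    """Timestamped name for the old state folder."""
    return f"{STATE_DIR_NAME}.old-{now:%Y%m%d-%H%M%S}"


def rename_state_folder(agent_dir: Path, now: datetime) -> Path:
    """Move the state folder aside so the agent creates a fresh one on start."""
    src = agent_dir / STATE_DIR_NAME
    dst = agent_dir / backup_name(now)
    src.rename(dst)  # raises FileNotFoundError if the folder is missing
    return dst


def reset_health_service_state(agent_dir: Path = AGENT_DIR) -> Path:
    """Stop the agent, rename its state folder, then start it again (Windows only)."""
    subprocess.run(["net", "stop", SERVICE_NAME], check=True)
    try:
        return rename_state_folder(agent_dir, datetime.now())
    finally:
        # Start the service even if the rename failed, so the agent is not left stopped.
        subprocess.run(["net", "start", SERVICE_NAME], check=True)
```

Renaming instead of deleting keeps the old store around, which can be handed to support later if the fresh store does not help.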
The server's Health Service still fails. The events after the restart look like this:
EVENT 102
HealthService (260) Health Service Store: The database engine (6.01.7600.0000) started a new instance (0).
EVENT 300
HealthService (260) Health Service Store: The database engine is initiating recovery steps.
EVENT 301
HealthService (260) Health Service Store: The database engine has begun replaying logfile C:\Program Files\System Center Operations Manager 2007\Health Service State\Health Service Store\edb.log.
EVENT 302
HealthService (260) Health Service Store: The database engine has successfully completed recovery steps.
EVENT 2011
The Health Service did not find any policy in Active Directory
EVENT 20063
Active Directory Integration has been disabled for management group THINK.
EVENT 21022
No certificate was specified.
This Health Service will not be able to communicate with other health services unless those health services are in a domain that has a trust relationship with this domain.
If this health service needs to communicate with health services in untrusted domains, please configure a certificate.
EVENT 2002
Management Group "THINK" was started.
EVENT 21024
OpsMgr's configuration may be out-of-date for management group THINK, and has requested updated configuration from the Configuration Service.
The current(out-of-date) state cookie is "AE 97 2D A2 B5 66 44 25 80 84 D6 A8 E9 71 4B 72 25 A0 6E 6F"
EVENT 7006
The Health Service has published the public key [28 BB 96 33 BC 6D ED 81 47 1E 64 C7 F3 37 CC CE ] used to send it secure messages to management
group THINK. This message only indicates that the key is scheduled for delivery, not that delivery has been confirmed.
EVENT 7026
The Health Service successfully logged on the RunAs account DOMAIN.LOCAL\MSQ_SVC for management group THINK
EVENT 7019
The Health Service has validated all RunAs accounts for management group THINK.
EVENT 10113
Taking a New Global Snapshot.
EVENT 10111
Deleting Global Snapshot.
Rinse and repeat.
This loop happens again a few times and then just stops.
Service is stopped.
The only error I see on the GATEWAY server is:
EVENT 20022
The health service {F5705ECC-68BE-FEDF-3AE5-0104F2BB4097} running on host SERVERNAME.DOMAIN.LOCAL and serving management group THINK with id {4DDD8BE6-13B1-4C15-AC46-247B39C0C16A} is not heartbeating.
I am not sure what this is.
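Event 20022 identifies the silent agent by its health service GUID, its host name, and the management group name and id. A small Python sketch like the one below can pull those fields out of exported event text, so the GUID can be matched against the agent that keeps stopping. The regular expression is written against the exact English wording quoted above; treating it as valid for other OpsMgr versions or locales is an assumption, and the function and type names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class MissedHeartbeat(NamedTuple):
    health_service_id: str
    host: str
    management_group: str
    management_group_id: str


# Pattern based on the event 20022 text quoted in this thread.
_EVENT_20022 = re.compile(
    r"The health service \{(?P<hs>[0-9A-Fa-f-]{36})\} running on host (?P<host>\S+)"
    r" and serving management group (?P<mg>\S+) with id \{(?P<mgid>[0-9A-Fa-f-]{36})\}"
    r"\s+is not heartbeating"
)


def parse_event_20022(message: str) -> Optional[MissedHeartbeat]:
    """Return the agent details from an event 20022 message, or None if it does not match."""
    m = _EVENT_20022.search(message)
    if m is None:
        return None
    return MissedHeartbeat(m["hs"], m["host"], m["mg"], m["mgid"])
```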
Any ideas would be greatly appreciated.
Kevin
August 16th, 2012 10:13am
Besides the Operations Manager event errors, are there any errors in the System or Application log when the Health Service stops?
Roger
August 16th, 2012 11:31am
Have you tried to uninstall and reinstall the agent?
Regards, Blake Email: mengotto<at>hotmail.com Blog: http://discussitnow.wordpress.com/ If my response was helpful, please mark it as so, if it answered your question, then please also mark it accordingly. Thank you.
August 16th, 2012 11:32am
Blake...Yes sir...I have uninstalled and reinstalled.
August 16th, 2012 11:36am
Hopeless,
I see nothing in the APPLICATION log...but in the SYSTEM log I get the following:
EVENT 7036
The System Center Management service entered the running state.
EVENT 7036
The WMI Performance Adapter service entered the running state.
EVENT 7031
The System Center Management service terminated unexpectedly.
It has done this 1 time(s). The following corrective action will be taken in 60000 milliseconds: Restart the service.
EVENT 7036
The WMI Performance Adapter service entered the stopped state.
This repeats until the agent fails 6 times and stops.
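The 7031 entries carry both the running failure count and the recovery delay, so a restart loop like this one can be spotted from exported System log text. A minimal Python sketch, assuming the English message wording quoted above; the function names and the `threshold` of 3 are hypothetical choices, not anything Windows or SCOM defines.

```python
import re
from typing import Iterable, NamedTuple, Optional


class ServiceCrash(NamedTuple):
    service: str
    failure_count: int
    action_delay_ms: Optional[int]
    action: Optional[str]


# Pattern based on the event 7031 text quoted in this thread.
_EVENT_7031 = re.compile(
    r"The (?P<svc>.+?) service terminated unexpectedly\.\s+"
    r"It has done this (?P<count>\d+) time\(s\)\."
    r"(?:\s+The following corrective action will be taken in (?P<ms>\d+) milliseconds: (?P<action>[^.]+)\.)?"
)


def parse_event_7031(message: str) -> Optional[ServiceCrash]:
    """Return the crash details from an event 7031 message, or None if it does not match."""
    m = _EVENT_7031.search(message)
    if m is None:
        return None
    ms = m["ms"]
    return ServiceCrash(m["svc"], int(m["count"]), int(ms) if ms else None, m["action"])


def is_crash_loop(messages: Iterable[str], service: str, threshold: int = 3) -> bool:
    """True if `service` reports at least `threshold` unexpected terminations."""
    counts = [
        c.failure_count
        for c in map(parse_event_7031, messages)
        if c is not None and c.service == service
    ]
    return bool(counts) and max(counts) >= threshold
```

Using the highest reported count, rather than the number of 7031 entries, means a partial export that starts mid-loop still reflects how many times the service has already failed.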
Kevin
August 16th, 2012 11:42am
Sorry, I skimmed right past that. So you have AV exclusions on the agent files on this server, right? In my experience, missing AV exclusions and a corrupt agent install were the only two causes of SCOM agents not working properly.
Regards, Blake
August 16th, 2012 11:56am
Blake, this server does not have AV on it.
Kevin
August 16th, 2012 12:05pm
OK, so have you run the CleanMOM tool on it and then tried an agent reinstall? I think CleanMOM can be found in the newer SCOM R2 Admin Resource Kit.
Regards, Blake
August 16th, 2012 12:21pm
Thanks Blake...I had forgotten about CleanMOM...let me try that and I will report back...
August 16th, 2012 3:06pm
OK Blake...I tried CleanMOM and then a reinstall, and the alert closed at 5:28pm. HOWEVER, shortly after that, at 5:42pm, the service failed again and started the same loop as in my original post.
:(
What the heck could be wrong here?
Kev
August 16th, 2012 8:36pm
Have you checked the flux capacitor yet? I dunno dude, I'm out of free support here. Time to call PSS?
Regards, Blake
August 16th, 2012 10:06pm
Please try the following steps to reinstall the SCOM agent after you uninstall it:
1. Run CleanMOM.
2. Make sure the SCOM folder is deleted.
3. Delete the agent's entry on the management server (MS).
4. Reinstall the agent.
5. Approve the manually installed agent from the MS.
Roger
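Step 2 is the easiest one to get wrong, since a partly removed folder can leave stale state behind. A minimal Python sketch for confirming it, assuming the default agent folder seen in this thread's event log (`C:\Program Files\System Center Operations Manager 2007`); the function name and the 20-file sample limit are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple

# Default agent folder from the event log in this thread (assumption for other installs).
AGENT_DIR = Path(r"C:\Program Files\System Center Operations Manager 2007")


def agent_folder_status(agent_dir: Path = AGENT_DIR, limit: int = 20) -> Tuple[bool, List[Path]]:
    """Return (removed, sample of leftover files) for the agent install folder."""
    if not agent_dir.exists():
        return True, []
    leftovers: List[Path] = []
    for p in sorted(agent_dir.rglob("*")):
        if p.is_file():
            leftovers.append(p)
            if len(leftovers) >= limit:
                break
    return False, leftovers
```

An empty but still present folder reports `(False, [])`, so the step only counts as done when the folder itself is gone.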
August 17th, 2012 4:44am
Thanks for the reply hopeless...I tried your process (I even restarted the RMS HS and the GW HS). It still does the same thing...
Stupid flux capacitor...guess I am opening a PSS ticket today!
:)
Kevin
August 17th, 2012 11:01am
I also can't see from the thread what would cause your agent to stop. Have you checked that the agent you install has the latest cumulative update available for the SCOM version you are running? For SCOM 2007 R2 that is CU6.
Bob Cornelissen - BICTT (My Blog about SCOM) - MVP 2012 and Microsoft Community Contributor 2011 Recipient
August 18th, 2012 8:14am
We are running SCOM 2007 R2 CU3.
Kevin
August 20th, 2012 11:00am
Hi Kevin, I would suggest checking whether updating SCOM and the agent to a higher CU level (CU5 or CU6, actually) helps in your case. A lot of things were fixed in CU4, CU5 and CU6. I can't guarantee that it will fix this exact problem on this exact machine, but you know the drill: when in doubt, or when you call PSS, they will usually ask whether you are fully up to date.
Bob Cornelissen
August 20th, 2012 1:06pm
As it turns out, a baseline alert was causing this. Once we disabled it via an override, the issue stopped (PSS found this with captures). The alert was disabled by default, and an override had turned it on for the server in question.
It could still be related to being on CU3. I am taking it to the change board to upgrade to CU6.
Thanks for the help everyone!
Kev
September 6th, 2012 10:12am