How do you troubleshoot a SCOM agent whose health appears 'grey', and what does it mean?
We have around 20 servers whose agent health is showing as grey. What does this mean, and how should we troubleshoot to determine what is causing this state? The agent service is running and the servers are pingable. I fear that the agents are not reporting as they should while in this state, so I am keen to get it resolved as soon as possible.
I connected to one of the servers and found this in the Operations Manager event log (server name changed). Could it be related?
The process started at 8:30:08 PM was terminated because the HealthService requested the workflow to stop, some data may have been lost.
Command executed: "C:\WINDOWS\system32\cscript.exe" //nologo "C:\Program Files\System Center Operations Manager 2007\Health Service State\Monitoring Host Temporary Files 21\43509\AD_Monitor_Trusts.vbs" server1.xyz.com false false
Working Directory: C:\Program Files\System Center Operations Manager 2007\Health Service State\Monitoring Host Temporary Files 21\43509\
One or more workflows were affected by this.
Workflow name: AD_Monitor_Trusts.Monitor
June 30th, 2010 9:56pm
Hi
This to me is the best single guide to troubleshooting greyed out agents:
http://blogs.technet.com/b/kevinholman/archive/2009/10/01/fixing-troubled-agents.aspx
There is a known issue with the AD Trust Monitor scripts \ monitoring:
http://nocentdocent.wordpress.com/2009/10/04/ad-mp-ad-trust-monitoring-bug/
Good Luck
Graham
View OpsMgr tips and tricks at
http://systemcentersolutions.wordpress.com/
June 30th, 2010 10:01pm
Thanks Graham, the link is very good. I will try out some of the suggestions and let you know how I get on. I hadn't mentioned it previously, but the agents are multi-homed. Would you happen to know whether that is likely to cause this, for example if the other management group had a rogue rule or monitor?
July 1st, 2010 12:58am
Hi
If something (rule \ monitor) causes problems for the agent then it will affect the agent in both Management Groups.
What errors do you see in the operationsmanager event log? Event 21023?
http://systemcentersolutions.wordpress.com/category/troubleshooting/21023-events/
Good Luck
Graham
July 1st, 2010 8:24am
Hi Graham
Sorry for the delay. I am seeing a lot of event 21403. The following is an example, with a couple of names changed:
The process started at 'xyz' was terminated because the HealthService requested the workflow to stop, some data may have been lost.
Command executed: "C:\windows\system32\cscript.exe" //nologo "C:\Program Files\System Center Operations Manager 2007\Health Service State\Monitoring Host Temporary Files 25\7980\AD_Monitor_Trusts.vbs" server1.xyz.org false false
Working Directory: C:\Program Files\System Center Operations Manager 2007\Health Service State\Monitoring Host Temporary Files 25\7980\
One or more workflows were affected by this.
Workflow name: AD_Monitor_Trusts.Monitor
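With around 20 affected servers, it may help to tally which workflows and scripts appear in these events, to confirm whether AD_Monitor_Trusts is the common factor. The following is only a rough Python sketch: it assumes the event descriptions have already been exported as plain text in the same layout as the example above, and the function name summarize_events is made up for illustration.

```python
import re
from collections import Counter

# Matches the "Workflow name:" line of an event description.
WORKFLOW_RE = re.compile(r"^Workflow name:\s*(\S+)\s*$", re.MULTILINE)
# Matches the file name of a quoted .vbs script path in "Command executed:".
SCRIPT_RE = re.compile(r'\\([^\\"]+\.vbs)"', re.IGNORECASE)


def summarize_events(descriptions):
    """Count workflow names and script names across event descriptions.

    `descriptions` is an iterable of strings, one per event (hypothetical
    input; how the text is exported from the event log is up to you).
    """
    workflows = Counter()
    scripts = Counter()
    for text in descriptions:
        workflows.update(WORKFLOW_RE.findall(text))
        scripts.update(SCRIPT_RE.findall(text))
    return workflows, scripts


# Sample text copied from the event quoted earlier in this thread.
sample = (
    'Command executed: "C:\\WINDOWS\\system32\\cscript.exe" //nologo '
    '"C:\\Program Files\\System Center Operations Manager 2007\\'
    'Health Service State\\Monitoring Host Temporary Files 21\\43509\\'
    'AD_Monitor_Trusts.vbs" server1.xyz.com false false\n'
    'One or more workflows were affected by this.\n'
    'Workflow name: AD_Monitor_Trusts.Monitor\n'
)

if __name__ == "__main__":
    workflows, scripts = summarize_events([sample])
    for name, count in workflows.most_common():
        print(name, count)
    for name, count in scripts.most_common():
        print(name, count)
```

If a single workflow dominates the counts on every grey agent, that points towards a management pack issue rather than a problem with the individual servers.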
July 12th, 2010 2:41pm
In the case of event 1102, use the following solution.
Log on to the DC and run the following command:
hslockdown /L
You will see that NT AUTHORITY\System is in the denied state.
Then run this command to move it to the allowed state:
hslockdown /A "NT AUTHORITY\System"
Cheers
Saad
April 4th, 2011 8:11am