Server Down Gray Agent - SCOM 2012 R2

Hello all,

I am completely new to SCOM and am evaluating it. As far as I can tell, I've deployed the agents and set up monitors, but when a server goes offline it does not trigger any alerts; the status just changes to gray. I understand that this is because the heartbeat is no longer being received, but shouldn't a server being down be considered an alert? Can someone help point me to what I am missing? I want to raise alerts when my servers go offline, and I thought this was the default when using agents. Thanks guys.

-Kevin

September 14th, 2015 4:26pm

By default, the heartbeat alert is disabled for the Operations Manager Managed Computer Client Health Service Watcher group.
1) Check whether your server is in the Operations Manager Managed Computer Client Health Service Watcher group.
2) If yes, use an override to enable the alert.
Roger
September 14th, 2015 10:46pm

Hi Kevin

You are correct - SCOM should generate an alert when a Windows server is down. There are 2 specific alerts, both of which are ENABLED by default:

- Health Service Heartbeat Failure

- Failed to Connect to Computer

They are discussed in detail here:

http://blogs.technet.com/b/jonathanalmquist/archive/2010/01/11/health-service-heartbeat-failure-diagnostics-and-recoveries.aspx

Just a quick check on terminology - is there definitely no alert in the Alerts view? Perhaps start the server again, wait for the health state to go green, then shut it down and check whether you get the above alerts. Note that if you close these alerts manually, they will disappear and not reappear while the server stays down, because the alert is generated by a monitor and is only raised when the monitor changes state:

http://www.culham.net/scom/scom-rules-vs-monitors/
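
To illustrate why a manually closed monitor alert does not come back, here is a rough Python sketch. It is only a toy model of a state-based monitor, not the SCOM API: the ToyMonitor class, its methods, and the server name SRV01 are all made up for the example, and it leaves out everything else a real monitor does.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMonitor:
    """Toy model of a state-based monitor (hypothetical, not SCOM code)."""
    name: str
    healthy: bool = True
    open_alerts: list = field(default_factory=list)

    def observe(self, heartbeat_ok: bool) -> None:
        # An alert is raised only on the healthy -> unhealthy transition.
        if self.healthy and not heartbeat_ok:
            self.open_alerts.append(f"{self.name}: heartbeat failure")
        self.healthy = heartbeat_ok

    def close_alerts(self) -> None:
        # Closing alerts by hand does not change the monitor's state.
        self.open_alerts.clear()


watcher = ToyMonitor("SRV01")   # SRV01 is a made-up server name
watcher.observe(False)          # server goes down: one alert is raised
watcher.close_alerts()          # operator closes it while the server is still down
watcher.observe(False)          # still down: no state change, so no new alert
after_close = len(watcher.open_alerts)

watcher.observe(True)           # server comes back, state returns to healthy
watcher.observe(False)          # goes down again: a new alert is raised
after_cycle = len(watcher.open_alerts)

print(after_close, after_cycle)  # prints: 0 1
```

The point of the sketch is just that the alert is tied to the state change, so the monitor has to go back to healthy before it can alert again.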

If you are looking for notifications (e.g. an email when the server goes down), then you have a couple of options. To check that everything works, I'd create a subscription for the specific monitors mentioned above. In the long term, you probably want to create groups of Health Service Watcher objects - this is the object that actually fires the alert (it isn't the "server" itself).

http://blogs.technet.com/b/kevinholman/archive/2014/04/09/creating-groups-of-health-service-watcher-objects-based-on-other-groups.aspx

Cheers

Graham

September 15th, 2015 3:09am