SCOM 2012 Gray Health State of Network Devices
Hello.  Can someone help us with this?  We have 1 primary management server in the head office, 1 reporting server, 1 database server, and 6 more management servers in remote offices.  SCOM 2012 is monitoring fewer than a hundred servers and 250 to 300 network devices.  Everything was working well in our SCOM 2012 environment until we performed maintenance and powered off the servers.  After the maintenance, problems appeared with the monitoring of our network devices.  Almost half of them went into a gray health state (gray, but still showing healthy), which means that monitoring is unavailable.  I have tried stopping the SCOM services on the management servers, renaming the Health Service State folder, and starting the services again.  The health state of those network devices returns to green, but after some time they go gray again.  I want to know why this happened and what we should do to resolve it.  It is greatly affecting our environment.
August 5th, 2013 9:22pm

Hello Jo,

When I see 300 network devices and no mention of resource pools, I begin to wonder...

If I check the sizing tool for SCOM and enter 500 network devices (it has to be 500..), I see that 1 pool with 2 nodes is needed.

I do hope the resource pool is in place. If not.. well.. you get the point :)

/Peter

August 6th, 2013 4:46am

Hello Peter,

Sorry.  I forgot to mention that we are using the default resource pool "All Management Servers".  Is it okay or not?

August 6th, 2013 5:10am

Hello Jo,

In short: no, it's not.

You need separate nodes for the network part. The gray state means the devices are not being monitored reliably or are disconnected.

This can be the result of an overwhelmed server. I guess this is the case in your scenario.

Roll out 2 nodes for a resource pool dedicated to network devices and add them to that pool. After this your problems are likely to be gone.

Please mark this post as an answer if it helped you out.

/Peter

August 6th, 2013 5:14am

Sure.  I will try this in our environment and then let you know the result.  Thank you for helping. :)
August 6th, 2013 5:21am

No problem Jo!
Please give the nodes enough RAM and cores.

The tool says 32GB with 8 cores per node. I suggest sticking to that :)
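
A quick way to check candidate nodes against those figures is sketched below. This is only an illustration: the 32 GB and 8 core thresholds are the ones quoted above, while the `Node` class, the `shortfalls` helper, and the server names are hypothetical and not part of SCOM or the sizing tool.

```python
from dataclasses import dataclass
from typing import List

# Per-node figures quoted from the sizing tool in this thread.
RECOMMENDED_RAM_GB = 32
RECOMMENDED_CORES = 8


@dataclass
class Node:
    name: str    # hypothetical server name
    ram_gb: int  # installed memory in GB
    cores: int   # number of CPU cores


def shortfalls(node: Node) -> List[str]:
    """Return human-readable gaps between a node and the recommendation."""
    gaps = []
    if node.ram_gb < RECOMMENDED_RAM_GB:
        gaps.append(f"{node.name}: RAM {node.ram_gb} GB < {RECOMMENDED_RAM_GB} GB")
    if node.cores < RECOMMENDED_CORES:
        gaps.append(f"{node.name}: {node.cores} cores < {RECOMMENDED_CORES}")
    return gaps


if __name__ == "__main__":
    # Made-up inventory for a two-node network monitoring pool.
    pool = [Node("NETMS01", 16, 4), Node("NETMS02", 32, 8)]
    for node in pool:
        for gap in shortfalls(node):
            print(gap)
```

A node that meets both figures yields an empty list, so only undersized nodes are printed.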

/Peter

August 6th, 2013 5:27am

Peter,

Our management servers are a little below the specification that the sizing helper suggests, but before this problem appeared everything was okay for about 8 months.  Then last month we started adding more network devices to be monitored; 2 weeks later we did our maintenance and restarted all the management servers, and then this problem occurred.  Can you explain why this happened after our maintenance?  I'm just wondering why it did not occur right after we populated SCOM with the additional network devices.  We will still upgrade our servers' hardware, since that is what's required.  Thank you.

Regards,

Jo

August 6th, 2013 10:11pm
