SCOM 2007 R2 checklist for high availability

Hi all,

I am new to SCOM and I want to create a checklist for maintaining stability and making sure there are no issues.

Can anyone please help me create a checklist that I can go through every day to make sure that everything is under control?

Any suggestions are welcome!! :)

September 9th, 2013 9:11am

It depends on the environment, but I tend to create a checklist view and mix and match some of the following:
- Override all script errors to be informational. Create a view for script errors, then check the repeat count once per day. Anything with a repeat count of less than 5 can (in general) be resolved and ignored.

- Copy the disk state view and check it each morning (especially for warnings, which don't generate alerts).

- Copy the health service state view. Again, you should get an alert for this, but it sometimes helps to have it in one place with the other "quick check" views.

- I always tend to check the Operations Manager event logs on the management servers for warnings and critical errors. They are an early indication of possible problems.
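
The repeat-count rule of thumb for script errors can be sketched roughly as below. This is only an illustration of the triage logic, not anything SCOM provides: the `ScriptAlert` class, the `triage` function, and the sample alert names are all hypothetical, and it assumes you have already exported the alerts from your script-error view into some list. The threshold of 5 is the "in general" figure mentioned above, so adjust it for your environment.

```python
from dataclasses import dataclass

# Rule of thumb from the checklist: fewer than 5 repeats is usually noise.
REPEAT_THRESHOLD = 5


@dataclass
class ScriptAlert:
    """Hypothetical record for one alert exported from a script-error view."""
    name: str          # alert name
    source: str        # computer that raised the alert
    repeat_count: int  # how many times the alert has repeated


def triage(alerts, threshold=REPEAT_THRESHOLD):
    """Split alerts into (investigate, resolve) lists by repeat count."""
    investigate = [a for a in alerts if a.repeat_count >= threshold]
    resolve = [a for a in alerts if a.repeat_count < threshold]
    # Show the noisiest alerts first so they get attention first.
    investigate.sort(key=lambda a: a.repeat_count, reverse=True)
    return investigate, resolve


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    sample = [
        ScriptAlert("Script or executable failed to run", "SRV01", 12),
        ScriptAlert("Script or executable failed to run", "SRV02", 2),
        ScriptAlert("Workflow runtime error", "SRV03", 5),
    ]
    investigate, resolve = triage(sample)
    for alert in investigate:
        print(f"INVESTIGATE {alert.source}: {alert.name} (x{alert.repeat_count})")
    for alert in resolve:
        print(f"RESOLVE     {alert.source}: {alert.name} (x{alert.repeat_count})")
```

Anything in the "resolve" list would simply be closed during the daily check; anything in the "investigate" list is worth a closer look.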

To some extent it depends on how you like to work and how aggressively you tune down the alerts, and also on whether you prefer state changes (state views) to alerts. I'm a great fan of reserving critical alerts for really critical incidents, rather than the way it works out of the box where (imo) far too much is critical. Turning off alerting and leveraging state views is also useful for those items that you need to know about, but not immediately. But it takes time to pull all of this together.


http://thoughtsonopsmgr.blogspot.hk/2009/11/opsmgr-where-technology-ends-and.html

http://social.technet.microsoft.com/Forums/systemcenter/en-US/d852b888-aadd-4cf2-b8be-823c69c773ca/daily-checklist-template-for-scom

Roger

September 9th, 2013 10:42pm

Thanks for your help Roger.
September 10th, 2013 6:33am

