Can't completely stop System Center Management service on RMS
Hi All, I believe my issue is similar to the one found here. I'm using SCOM 2007 R2, with fewer than 700 agents and the most current MPs from Microsoft. The send queue on the RMS occasionally starts climbing until the service is restarted. Today it started to climb after I created a new management pack, though nothing applies to it right now because I cancelled the New Group Wizard. It does this whenever any type of override is made, a new agent is installed, etc. Often, when I stop the service and try to rename the Health Service State folder, I get the error "You need permission to perform this action". When I am able to stop the service and rename the folder, the HealthService.exe process still remains in memory. Each "successful" service restart afterward creates another HealthService.exe in memory (not using any processor time, though). When I try to kill this process I get either "Access is Denied" or no error at all, but it never goes away until a reboot. When a database cluster failover occurs (the SCOM DBs are on that cluster), the problem seems less pronounced on one node than on the other. There is no difference between the nodes in network, software versions, hardware configuration, etc. Right now it seems rebooting is the only way to fix the RMS problem. Any suggestions on what might be causing this? Thanks in advance.
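One way to confirm the leftover processes described above is to list every HealthService.exe instance after a service stop. The Python sketch below assumes a Windows host where the built-in `tasklist` command is available; the function names are hypothetical and not part of any SCOM tooling.

```python
import csv
import io
import subprocess

IMAGE_NAME = "HealthService.exe"


def parse_tasklist_csv(output, image_name=IMAGE_NAME):
    """Return the PIDs found in `tasklist /FO CSV /NH` output for image_name."""
    pids = []
    for row in csv.reader(io.StringIO(output)):
        # Skip blank lines and the "INFO: No tasks are running..." message,
        # which parse as rows with fewer than two fields.
        if len(row) < 2 or row[0].lower() != image_name.lower():
            continue
        pids.append(int(row[1]))
    return pids


def lingering_health_services():
    """Run tasklist (Windows only) and return the PIDs of HealthService.exe."""
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {IMAGE_NAME}", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_tasklist_csv(result.stdout)
```

Calling `lingering_health_services()` after stopping the service should return an empty list; any PID still listed, or more than one PID after a restart, points to an instance that an earlier stop left behind.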
April 30th, 2012 5:31pm

Hi there, this may be a long shot, but I've found that a lot of these types of issues can result from an incorrect SQL collation on the SQL instance that hosts your SCOM databases. The correct SQL collation for SCOM is: SQL_Latin1_General_CP1_CI_AS. If any other collation is set, you will need to create a new SQL instance and migrate your SCOM databases over to it. See the link below for confirmation of this: http://support.microsoft.com/kb/958979 If the SQL collation is correct, I would then check whether your antivirus software has all of the relevant exclusions for SCOM; see these links: http://blogs.msdn.com/b/nickmac/archive/2008/07/18/antivirus-exclusions-for-operations-manager-2007.aspx http://blogs.technet.com/b/kevinholman/archive/2007/12/12/antivirus-exclusions-for-mom-and-opsmgr.aspx You could also use ProcMon to trace the scripts and processes that are running and see if you can locate the problem: http://blogs.technet.com/b/smsandmom/archive/2008/12/10/opsmgr-2007-how-to-identify-what-scripts-are-running-on-the-agents-including-frequency-and-parameters.aspx Hope this helps! Kevin.
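The collation check suggested above can be sketched as follows. This is only an illustration: the two T-SQL statements use the standard `SERVERPROPERTY('Collation')` function and `sys.databases` view, the database names assume the default SCOM 2007 names (OperationsManager and OperationsManagerDW), and the Python helper is a hypothetical function that compares whatever rows your SQL client returns against the expected collation.

```python
EXPECTED_COLLATION = "SQL_Latin1_General_CP1_CI_AS"

# Standard T-SQL; run these with whatever SQL client you normally use.
INSTANCE_COLLATION_QUERY = "SELECT CONVERT(sysname, SERVERPROPERTY('Collation'));"
# Assumption: the SCOM databases kept their default names.
DATABASE_COLLATION_QUERY = (
    "SELECT name, collation_name FROM sys.databases "
    "WHERE name IN ('OperationsManager', 'OperationsManagerDW');"
)


def mismatched_collations(rows, expected=EXPECTED_COLLATION):
    """Return the (name, collation) pairs whose collation differs from expected.

    rows is an iterable of (name, collation) tuples, for example the result
    of DATABASE_COLLATION_QUERY. A collation of None (SQL Server reports NULL
    for a database that is not online) is treated as a mismatch.
    """
    return [
        (name, collation)
        for name, collation in rows
        if collation is None or collation.casefold() != expected.casefold()
    ]
```

An empty list from `mismatched_collations` means every row matched; anything else names the instance or database to look at.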
May 1st, 2012 8:47am

Thanks for the information, good sir! I'll check on both of these today and post my findings. Our database and systems security teams should have things set correctly. I was only recently "allowed" into the database cluster that the SCOM DBs reside on, and found another 50 databases sitting there with them, most of them production... I couldn't find much information on whether the SCOM DBs should be on their own cluster, but network traffic still seems to be in a decent range.
May 1st, 2012 10:54am

No problem, but as a rule I ALWAYS install the SCOM DBs into their own SQL instance and never co-locate them with anything else. This is particularly true when you are using the SCOM reporting module, as it has its own security model that will wipe the security and permissions of anything else that tries to use it. There's no problem having the DBs on the same SQL server or SQL cluster; just make sure to run them in their own SQL instances. The same should apply to all the other System Center products in the suite: run them all in their own instances where possible. I'd nearly bet that the SQL collation for the instance the DBs are currently sitting on is incorrect, and that's probably why you are getting the weird issues described. Kevin.
May 1st, 2012 11:43am

Can you also check the "handle count" of the HealthService process on the RMS? Any chance it is going sky-high (like 100,000+)? Rob Korving http://jama00.wordpress.com/
May 2nd, 2012 2:21am

Well, it turns out that the SQL collation is correct for the databases. I'm still waiting to hear from our security group about antivirus exclusions...though they might need 2 or 3 more reminders about my request. As far as the handle count goes, I'll have to see when the issue arises again. After a reboot of the RMS yesterday evening and no changes made yet today, SCOM is running as it should. Right now the handle count is hovering around 2.4k. I'll be able to get a better idea when a config change gets pushed.
May 2nd, 2012 12:58pm

The reason I ask is that we had this on management servers (handle count up to 2.2 million, even :)), after which the HealthService couldn't be restarted. Investigation of the handles showed them to be "open sockets" that were never properly initiated, due to incorrect routing of the agent traffic. (The HealthService "stops" but still claims port 5723, so a new instance can't start; you should see alerts about this in the event log, though.) Rob Korving http://jama00.wordpress.com/
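A quick way to test the port symptom described above is to check whether anything still accepts connections on TCP 5723 after the service has supposedly stopped. The Python sketch below is a minimal illustration with a hypothetical function name; it only detects a socket that is still listening, so a process that holds the port in some other state would need `netstat -ano` to identify.

```python
import socket

SCOM_PORT = 5723  # port named in the post above


def port_is_claimed(port=SCOM_PORT, host="127.0.0.1", timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise,
        # instead of raising for an ordinary refused connection.
        return sock.connect_ex((host, port)) == 0
```

If `port_is_claimed()` returns True on the RMS while the service shows as stopped, `netstat -ano | findstr :5723` will show the PID that still owns the port.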
May 4th, 2012 2:56am

This topic is archived. No further replies will be accepted.
