Deleting Management Pack causes all workflows to stall and Console connections to fail
This just started yesterday. I was attempting to remove a Management Pack that was used to monitor the BITS service, and the process caused all workflows to stall and all Console connections to fail. To get SCOM back up and running I had to stop the SDK Service and restart my SQL Server. I tried a few more times with the same result: the workflows would stop and all Console connections would fail. I have narrowed it down to the Windows Service Management Pack Template I set up for monitoring the BITS service. The same thing happens when I try to delete this Windows Service monitor: all workflows stop and all Console connections start failing. Has anyone ever experienced the same thing? What could deleting a simple service monitor be doing that would cause this?
December 8th, 2010 10:52am

Hi,

Please ensure the following services are running and try restarting them:

- System Center Management service
- System Center Management Configuration service
- System Center Data Access service

Please also try running the following command on the server to start the console with its cache cleared, and see how it works:

"%ProgramFiles%\System Center Operations Manager 2007\Microsoft.MOM.UI.Console.exe" /ClearCache

Meanwhile, please check the Event Log and let us know the details if there are any related errors.

Hope this helps. Thanks.
Nicholas Li - MSFT
December 10th, 2010 1:06am

I get events like these in the OperationsManager event log. They start a few minutes after I begin the delete and continue until the services have been restarted, SQL has been restarted, and recovery of the OperationsManager DB has completed:

Event Type: Warning
Event Source: HealthService
Event Category: None
Event ID: 2115
Date: 12/9/2010
Time: 1:37:26 AM
User: N/A
Computer: <RMS Server Name>
Description: A Bind Data Source in Management Group <MGMT GROUP NAME> has posted items to the workflow, but has not received a response in 1042 seconds. This indicates a performance or functional problem with the workflow.
Workflow Id : Microsoft.SystemCenter.DataWarehouse.CollectPerformanceData
Instance : <FQDN of RMS>
Instance Id : {2A38A9C8-36F6-D6B1-8731-E2ED910A6BDF}

Event Type: Warning
Event Source: HealthService
Event Category: None
Event ID: 2115
Date: 12/9/2010
Time: 1:37:30 AM
User: N/A
Computer: <RMS Server Name>
Description: A Bind Data Source in Management Group <MGMT GROUP NAME> has posted items to the workflow, but has not received a response in 981 seconds. This indicates a performance or functional problem with the workflow.
Workflow Id : Microsoft.SystemCenter.CollectSignatureData
Instance : <FQDN of RMS>
Instance Id : {2A38A9C8-36F6-D6B1-8731-E2ED910A6BDF}

Event Type: Warning
Event Source: HealthService
Event Category: Data Publisher Manager
Event ID: 8000
Date: 12/9/2010
Time: 1:37:02 AM
User: N/A
Computer: <RMS SERVER NAME>
Description: A subscriber data source in management group <MGMT GROUP NAME> has posted items to the workflow, but has not received a response in 18 minutes. Data will be queued to disk until a response has been received. This indicates a performance or functional problem with the workflow.
Workflow Id : Microsoft.SystemCenter.DataWarehouse.CollectEventData
Instance : <FQDN of RMS>
Instance Id : {2A38A9C8-36F6-D6B1-8731-E2ED910A6BDF}

Event Type: Warning
Event Source: HealthService
Event Category: Data Publisher Manager
Event ID: 8000
Date: 12/9/2010
Time: 1:37:04 AM
User: N/A
Computer: <RMS SERVER NAME>
Description: A subscriber data source in management group <MGMT GROUP NAME> has posted items to the workflow, but has not received a response in 18 minutes. Data will be queued to disk until a response has been received. This indicates a performance or functional problem with the workflow.
Workflow Id : Microsoft.SystemCenter.DataWarehouse.CollectPerformanceData
Instance : <FQDN of RMS>
Instance Id : {2A38A9C8-36F6-D6B1-8731-E2ED910A6BDF}

If I just stop and then restart the SDK Service, it shows as running, but opening the console fails with "The SDK Service has not yet initialized. Please retry" and the following event appears in the OperationsManager event log on the RMS:

Event Type: Error
Event Source: OpsMgr SDK Service
Event Category: None
Event ID: 26319
Date: 12/9/2010
Time: 1:47:05 AM
User: N/A
Computer: <RMS SERVER NAME>
Description: An exception was thrown while processing Connect for session id uuid:70128c61-2c5d-42ea-9f06-22c9d58eb189;id=2.
Exception Message: The creator of this fault did not specify a Reason.
Full Exception: System.ServiceModel.FaultException`1[Microsoft.EnterpriseManagement.Common.SdkServiceNotInitializedException]: The creator of this fault did not specify a Reason. (Fault Detail is equal to Microsoft.EnterpriseManagement.Common.SdkServiceNotInitializedException: Sdk Service has not yet initialized. Please retry).
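When there are many of these 2115 warnings, it can help to reduce them to one line per workflow. The following is only a rough Python sketch, assuming the OperationsManager log has been exported as plain text in the same layout as the events quoted above; the function name `stalled_workflows` and the `SAMPLE` string are illustrative and not part of any OpsMgr tooling.

```python
import re

# Matches the description of a 2115 warning as it is worded in this thread:
# the number of seconds without a response, then the next "Workflow Id" field.
# Assumption: the exported text keeps this exact wording and field order.
PATTERN_2115 = re.compile(
    r"has not received a response in (?P<seconds>\d+) seconds\."
    r".*?Workflow Id\s*:\s*(?P<workflow>\S+)",
    re.DOTALL,
)


def stalled_workflows(log_text):
    """Return {workflow id: longest reported wait in seconds} for 2115-style entries."""
    longest = {}
    for match in PATTERN_2115.finditer(log_text):
        workflow = match.group("workflow")
        seconds = int(match.group("seconds"))
        longest[workflow] = max(seconds, longest.get(workflow, 0))
    return longest


# Descriptions copied from the two 2115 events posted above.
SAMPLE = """\
Description: A Bind Data Source in Management Group <MGMT GROUP NAME> has posted items to the workflow, but has not received a response in 1042 seconds. This indicates a performance or functional problem with the workflow.
Workflow Id : Microsoft.SystemCenter.DataWarehouse.CollectPerformanceData
Instance : <FQDN of RMS>

Description: A Bind Data Source in Management Group <MGMT GROUP NAME> has posted items to the workflow, but has not received a response in 981 seconds. This indicates a performance or functional problem with the workflow.
Workflow Id : Microsoft.SystemCenter.CollectSignatureData
Instance : <FQDN of RMS>
"""

if __name__ == "__main__":
    for workflow, seconds in sorted(stalled_workflows(SAMPLE).items()):
        print(f"{workflow}: {seconds} s without a response")
```

The 8000 events report their wait in minutes rather than seconds, so this pattern deliberately skips them.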
December 10th, 2010 2:08pm

Hi Richard,

I see this every time in our work environment whenever we need to make a global change. For us it is caused by having a large environment: the RMS takes a long time to process every agent's configuration file and start the deployment to each server. Eventually the console does come back (it can take 30-60 minutes), and all workflows automatically start posting their data again, including the backlog from when the environment seemed to hang.

Unfortunately it's not the best situation to have, but after numerous cases with Microsoft to try to address it (including 2 OpsMgr Health Checks with a PFE), we still haven't resolved the issue. I think this is just a scalability issue with the current OpsMgr architecture, which I'm hoping OpsMgr vNext will fix with its distributed architecture, as the bottleneck appears to be the OpsMgr RMS role.

In your situation, if you have a relatively large environment (1500+ agents), I would suspect you are having the same issue. It might also be an indicator that your environment is experiencing performance issues, which could be due to hardware configuration, software configuration, slow disks, too many instances in your environment, etc.

- If you have a relatively large environment, then if you let the update process continue for an hour or so it should eventually finish. You can check this by watching the configuration push from the OpsMgr RMS server, monitoring for event 21903 (from memory) in the OpsMgr event log. I would also recommend raising a case with Microsoft Support, as they may be able to assist you with the performance/config churn issue in your environment. This type of issue would be too detailed to discuss and diagnose via the forums.
- If you have a relatively small environment, then I suspect you have some serious performance/configuration problems in your environment. I would first recommend reading the OpsMgr Install Guide, and the guide for hardware requirements for the size of your environment (OpsMgr Scalability Guide?). If you don't see any obvious points that have been missed, then I'd recommend raising a case with Microsoft Support so they can help pinpoint the issue.

The reason the OpsMgr Console and workflows stop in this type of situation is usually that the Config service is overloaded with processing the configuration files for agents. When this occurs, the distribution of agent configuration files is top of the list, so everything else is essentially halted to allow this process to complete. While this is occurring, monitoring continues at the agent and MS levels, and alerts continue to be raised into the OpsMgr DB. But no one will be able to see them via the OpsMgr Console or OpsMgr Web Console, because the SDK service can't retrieve this information while the Config service is blocking it.

I hope this information helps. Good luck with this issue, I hope you're able to resolve it without having to wait for OpsMgr vNext.

Cheers,
Brian
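To put a number on how long the configuration push takes after a delete, one option is to export the OpsMgr event log and look for the first matching event after the delete started. This is a minimal Python sketch under stated assumptions: the records have already been parsed into dictionaries with `id` and `time` keys (a hypothetical shape, not an OpsMgr format), and the event ID is passed in as a parameter because 21903 was quoted from memory above and has not been verified.

```python
from datetime import datetime


def first_event_after(events, event_id, started_at):
    """Return the earliest record with this event ID logged at or after started_at, or None."""
    candidates = [
        e for e in events
        if e["id"] == event_id and e["time"] >= started_at
    ]
    return min(candidates, key=lambda e: e["time"], default=None)


def minutes_until(events, event_id, started_at):
    """Minutes between started_at and the first matching event, or None if none was logged."""
    hit = first_event_after(events, event_id, started_at)
    if hit is None:
        return None
    return (hit["time"] - started_at).total_seconds() / 60


if __name__ == "__main__":
    # Made-up records purely to show the call; not taken from a real log.
    exported = [
        {"id": 2115, "time": datetime(2010, 12, 9, 1, 37)},
        {"id": 21903, "time": datetime(2010, 12, 9, 2, 5)},
    ]
    delete_started = datetime(2010, 12, 9, 1, 20)
    print(minutes_until(exported, 21903, delete_started))
```

A result of `None` after an hour or more would suggest the push has not completed, or that the event ID being searched for is not the right one.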
December 12th, 2010 6:34pm

This topic is archived. No further replies will be accepted.
