Deleting a Management Pack causes all workflows to stall and Console connections to fail
This just started yesterday. I was attempting to remove a Management Pack that was used to monitor the BITS service, and the process caused all workflows to stall and all Console connections to fail. To get SCOM back up and running, I had to stop the SDK Service and restart my SQL Server.
I tried a few more times with the same result: the workflows would stop and all console connections would fail.
I have narrowed it down to the Windows Service Management Pack Template I set up for monitoring the BITS service. The same thing happens when I try to delete this Windows Service monitor: all workflows stop and all console connections start failing.
Has anyone ever experienced the same thing? What could deleting a simple service monitor be doing that would cause this?
December 8th, 2010 10:52am
Hi,
Please ensure the following services are running and try restarting them:
System Center Management service
System Center Management Configuration service
System Center Data Access service
Please also try running the following command on the server to start the console and see how it works:
"%ProgramFiles%\System Center Operations Manager 2007\Microsoft.MOM.UI.Console.exe" /ClearCache
Meanwhile, please check the Event Log and let us know the details if there are any related errors.
Hope this helps.
Thanks.
Nicholas Li - MSFT
December 10th, 2010 1:06am
I get events like these in the OperationsManager event log, from a few minutes after I start the delete process until the services have been restarted, SQL Server has been restarted, and recovery of the OperationsManager database has completed:
Event Type: Warning
Event Source: HealthService
Event Category: None
Event ID: 2115
Date: 12/9/2010
Time: 1:37:26 AM
User: N/A
Computer: <RMS Server Name>
Description:
A Bind Data Source in Management Group <MGMT GROUP NAME> has posted items to the workflow, but has not received a response in 1042 seconds. This indicates a performance or functional problem with the workflow.
Workflow Id : Microsoft.SystemCenter.DataWarehouse.CollectPerformanceData
Instance : <FQDN of RMS>
Instance Id : {2A38A9C8-36F6-D6B1-8731-E2ED910A6BDF}
Event Type: Warning
Event Source: HealthService
Event Category: None
Event ID: 2115
Date: 12/9/2010
Time: 1:37:30 AM
User: N/A
Computer: <RMS Server Name>
Description:
A Bind Data Source in Management Group <MGMT GROUP NAME> has posted items to the workflow, but has not received a response in 981 seconds. This indicates a performance or functional problem with the workflow.
Workflow Id : Microsoft.SystemCenter.CollectSignatureData
Instance : <FQDN of RMS>
Instance Id : {2A38A9C8-36F6-D6B1-8731-E2ED910A6BDF}
Event Type: Warning
Event Source: HealthService
Event Category: Data Publisher Manager
Event ID: 8000
Date: 12/9/2010
Time: 1:37:02 AM
User: N/A
Computer: <RMS SERVER NAME>
Description:
A subscriber data source in management group <MGMT GROUP NAME> has posted items to the workflow, but has not received a response in 18 minutes. Data will be queued to disk until a response has been received. This indicates a performance or functional problem with the workflow.
Workflow Id : Microsoft.SystemCenter.DataWarehouse.CollectEventData
Instance : <FQDN of RMS>
Instance Id : {2A38A9C8-36F6-D6B1-8731-E2ED910A6BDF}
Event Type: Warning
Event Source: HealthService
Event Category: Data Publisher Manager
Event ID: 8000
Date: 12/9/2010
Time: 1:37:04 AM
User: N/A
Computer: <RMS SERVER NAME>
Description:
A subscriber data source in management group <MGMT GROUP NAME> has posted items to the workflow, but has not received a response in 18 minutes. Data will be queued to disk until a response has been received. This indicates a performance or functional problem with the workflow.
Workflow Id : Microsoft.SystemCenter.DataWarehouse.CollectPerformanceData
Instance : <FQDN of RMS>
Instance Id : {2A38A9C8-36F6-D6B1-8731-E2ED910A6BDF}
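When several of these warnings pile up, it can help to summarize which workflows are stalled and for how long. Below is a minimal Python sketch, assuming the events have been copied out of Event Viewer as plain text in the same layout as the entries above; the names `parse_stall_events` and `StallEvent` are hypothetical and not part of any OpsMgr tool. It converts the "N minutes" form used by event 8000 into seconds so both warning types can be compared.

```python
import re
from dataclasses import dataclass


@dataclass
class StallEvent:
    event_id: int
    workflow: str
    wait_seconds: int


# Matches "...has not received a response in 1042 seconds" or "...in 18 minutes".
WAIT_RE = re.compile(r"not received a response in (\d+) (seconds|minutes)")


def parse_stall_events(text: str) -> list[StallEvent]:
    """Split pasted event-log text into records and extract stall details."""
    events = []
    # Assumption: each record starts with an "Event Type:" line, as in the paste above.
    for record in re.split(r"^Event Type:", text, flags=re.MULTILINE)[1:]:
        id_match = re.search(r"^Event ID:\s*(\d+)", record, re.MULTILINE)
        wf_match = re.search(r"^Workflow Id\s*:\s*(\S+)", record, re.MULTILINE)
        wait_match = WAIT_RE.search(record)
        if not (id_match and wf_match and wait_match):
            continue  # not a stall warning (e.g. the SDK 26319 error); skip it
        amount, unit = int(wait_match.group(1)), wait_match.group(2)
        seconds = amount * 60 if unit == "minutes" else amount
        events.append(StallEvent(int(id_match.group(1)), wf_match.group(1), seconds))
    return events


if __name__ == "__main__":
    sample = """Event Type: Warning
Event ID: 2115
A Bind Data Source ... has not received a response in 1042 seconds.
Workflow Id : Microsoft.SystemCenter.DataWarehouse.CollectPerformanceData
Event Type: Warning
Event ID: 8000
A subscriber data source ... has not received a response in 18 minutes.
Workflow Id : Microsoft.SystemCenter.DataWarehouse.CollectEventData
"""
    # Longest-waiting workflow first.
    for ev in sorted(parse_stall_events(sample), key=lambda e: e.wait_seconds, reverse=True):
        print(ev.event_id, ev.workflow, ev.wait_seconds)
    # 8000 Microsoft.SystemCenter.DataWarehouse.CollectEventData 1080
    # 2115 Microsoft.SystemCenter.DataWarehouse.CollectPerformanceData 1042
```

Normalizing to seconds makes it easier to see that the 2115 and 8000 warnings describe roughly the same stall window (1042 seconds is about 17 minutes, against 18 minutes).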
If I just stop and then restart the SDK Service, it shows the service as running, but trying to open the console gives the error "The SDK Service has not yet initialized. Please retry", and the following event appears in the OperationsManager event log on the RMS:
Event Type: Error
Event Source: OpsMgr SDK Service
Event Category: None
Event ID: 26319
Date: 12/9/2010
Time: 1:47:05 AM
User: N/A
Computer: <RMS SERVER NAME>
Description:
An exception was thrown while processing Connect for session id uuid:70128c61-2c5d-42ea-9f06-22c9d58eb189;id=2.
Exception Message: The creator of this fault did not specify a Reason.
Full Exception: System.ServiceModel.FaultException`1[Microsoft.EnterpriseManagement.Common.SdkServiceNotInitializedException]: The creator of this fault did not specify a Reason. (Fault Detail is equal to Microsoft.EnterpriseManagement.Common.SdkServiceNotInitializedException: Sdk Service has not yet initialized. Please retry).
December 10th, 2010 2:08pm
Hi Richard,
I see this every time in our work environment whenever we need to make a global change. For us it is caused by having a large environment, with the RMS taking a long time to process all agents' configuration files and start the deployment to each server.
Eventually the console does come back (this can take 30-60 minutes), and all workflows automatically start posting their data, including the backlog from when the environment seemed to hang.
Unfortunately it's not the best situation to have, but after numerous cases with Microsoft trying to address this issue (including two OpsMgr Health Checks with a PFE), we still haven't resolved it. I think this is simply a scalability issue with the current OpsMgr architecture, which I'm hoping OpsMgr vNext will fix with its distributed architecture, as the bottleneck appears to be the OpsMgr RMS role.
In your situation, if you have a relatively large environment (1500+ agents), I would suspect you might be having the same issue. It might also indicate that your environment is experiencing performance issues, which could be due to hardware configuration, software configuration, slow disks, too many instances in your environment, etc.
- If you have a relatively large environment and you let the update process continue for an hour or so, it should eventually finish. You can check this by monitoring the configuration push from the OpsMgr RMS server, watching for event 21903 (from memory) in the OpsMgr event log. I would also recommend raising a case with Microsoft Support, as they may be able to help with the performance/config churn issue in your environment. This type of issue is too detailed to discuss and diagnose via the forums.
- If you have a relatively small environment, then I suspect you have some serious performance/configuration problems. I would first recommend reading the OpsMgr Install Guide and the hardware requirements guide for an environment of your size (the OpsMgr Scalability Guide?). If you don't see any obvious points that have been missed, I'd recommend raising a case with Microsoft Support so they can help pinpoint the issue.
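To judge whether the configuration push is still running, one rough approach is to count how often that event ID has been logged recently. The Python sketch below assumes the log has been exported to (timestamp, event ID) pairs, for example from a CSV export; the function name, the 15-minute window, and the idea that a quiet window means the push has finished are all assumptions for illustration. Note that 21903 itself is quoted "from memory" above, so confirm it against your own RMS log first.

```python
from datetime import datetime, timedelta

# Quoted "from memory" in the thread; confirm against your own RMS event log.
CONFIG_PUSH_EVENT_ID = 21903


def recent_config_push_events(events, now, quiet_period=timedelta(minutes=15),
                              event_id=CONFIG_PUSH_EVENT_ID):
    """Count `event_id` entries logged in the `quiet_period` window ending at `now`.

    `events` is an iterable of (datetime, int) pairs (hypothetical export format).
    A count of zero only suggests the push has gone quiet; it is a heuristic.
    """
    return sum(
        1
        for ts, eid in events
        if eid == event_id and timedelta(0) <= now - ts <= quiet_period
    )


if __name__ == "__main__":
    now = datetime(2010, 12, 9, 2, 30)
    log = [
        (datetime(2010, 12, 9, 1, 50), 21903),  # 40 minutes ago: outside the window
        (datetime(2010, 12, 9, 2, 20), 21903),  # 10 minutes ago: counted
        (datetime(2010, 12, 9, 2, 25), 2115),   # different event ID: ignored
    ]
    n = recent_config_push_events(log, now)
    print("still pushing" if n else "push looks finished", n)
    # prints: still pushing 1
```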
The reason the OpsMgr Console and workflows stop in this type of situation is usually that the Config service is overloaded processing the configuration files for agents. When this occurs, distribution of agent configuration files is top of the list, so everything else is essentially halted to allow this process to complete. Whilst this is occurring, monitoring continues at the agent and MS levels, and alerts continue to be raised into the OpsMgr DB. But no one will be able to see them via the OpsMgr Console or OpsMgr Web Console, as the SDK service can't retrieve this information while the Config service is blocking it.
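The blocking behaviour described above can be pictured as strict priority scheduling. This is a toy Python model only, assuming a single queue in which configuration work always outranks SDK requests; it is not how the Config or SDK services are actually implemented, and the job names are made up. It shows why console requests get no service at all until the configuration backlog drains.

```python
import heapq
from itertools import count

CONFIG, SDK = 0, 1  # lower number = higher priority in this toy model


def drain(jobs):
    """Serve jobs strictly by priority, FIFO within a priority level."""
    heap, seq = [], count()
    for priority, name in jobs:
        # The sequence number breaks ties so equal-priority jobs keep arrival order.
        heapq.heappush(heap, (priority, next(seq), name))
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order


if __name__ == "__main__":
    jobs = [(SDK, "console connect"), (CONFIG, "agent config 1"),
            (SDK, "alert query"), (CONFIG, "agent config 2")]
    print(drain(jobs))
    # ['agent config 1', 'agent config 2', 'console connect', 'alert query']
```

The sequence counter keeps FIFO order within a priority level and stops `heapq` from ever comparing the job names themselves.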
I hope this information helps. Good luck with this issue, I hope you're able to resolve it without having to wait for OpsMgr vNext.
Cheers,
Brian
December 12th, 2010 6:34pm