SCOM 2012 Management Server Recovery Issue - Event ID 21023, 20070 and 21016

Hello there - I really hope you guys can help with this one. It seems similar to many resolved posts out there, but nothing has worked for me.

Following a total crash of my SCOM 2012 management server, I reinstalled on a new server with the same name, IP, etc. and used the following to reinstall/recover:

Setup.exe /silent /AcceptEndUserLicenseAgreement
/recover
/EnableErrorReporting:[Never|Queued|Always]
/SendCEIPReports:[0|1]
/UseMicrosoftUpdate:[0|1]
/DatabaseName:<OperationalDatabaseName>
/SqlServerInstance:<server\instance>
/DWDatabaseName:<DWDatabaseName>
/DWSqlServerInstance:<server\instance>
/UseLocalSystemDASAccount
/DatareaderUser:<domain\username>
/DatareaderPassword:<password>
/DataWriterUser:<domain\username>
/DataWriterPassword:<password>
/ActionAccountUser:<domain\username>
/ActionAccountPassword:<password>

from http://technet.microsoft.com/en-us/library/hh531578.aspx
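For anyone repeating this, the long switch list is easy to mistype. Below is a hypothetical Python helper (`build_recover_args` and `redact` are illustrative names, not part of any SCOM tooling) that assembles the same command line from a dictionary of values. It uses only the switches listed above, in the same order. All values are placeholders you must supply. If the Data Access Service runs under a domain account, pass `use_local_system_das=False`; any extra switches that case would need are not covered in this thread.

```python
# Hypothetical helper: builds the Setup.exe /recover command line quoted above.
# Switch names and order are copied from the post; the values are placeholders.

BASE = ["Setup.exe", "/silent", "/AcceptEndUserLicenseAgreement", "/recover"]

# Switches that appear before /UseLocalSystemDASAccount in the post.
DB_SWITCHES = [
    "EnableErrorReporting", "SendCEIPReports", "UseMicrosoftUpdate",
    "DatabaseName", "SqlServerInstance", "DWDatabaseName", "DWSqlServerInstance",
]

# Switches that appear after /UseLocalSystemDASAccount in the post.
ACCOUNT_SWITCHES = [
    "DatareaderUser", "DatareaderPassword", "DataWriterUser", "DataWriterPassword",
    "ActionAccountUser", "ActionAccountPassword",
]


def build_recover_args(settings, use_local_system_das=True):
    """Return the argument list for a silent management server recovery."""
    missing = [n for n in DB_SWITCHES + ACCOUNT_SWITCHES if n not in settings]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    args = list(BASE)
    args += [f"/{name}:{settings[name]}" for name in DB_SWITCHES]
    if use_local_system_das:
        # Only appropriate when DAS should run as Local System.
        args.append("/UseLocalSystemDASAccount")
    args += [f"/{name}:{settings[name]}" for name in ACCOUNT_SWITCHES]
    return args


def redact(args):
    """Mask password values so the command line can be logged or posted safely."""
    masked = []
    for arg in args:
        switch = arg.split(":", 1)[0]
        masked.append(switch + ":****" if switch.endswith("Password") else arg)
    return masked
```

Usage: `print(" ".join(redact(build_recover_args(my_settings))))` prints the command with passwords masked, suitable for pasting into a forum post.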

The console is now accessible and all seems fine, except that no agents can communicate with the management server (including, it seems, the management server itself - its health is grey in the console).

All servers are on the same domain (except two, which use certificates, but I haven't even looked at those yet!)

The agent servers log events:

21023 (OpsMgr has no configuration for management group XXX and is requesting new configuration from the Configuration Service.)

20070 (The OpsMgr Connector connected to xxx.yyy.com, but the connection was closed immediately after authentication occurred.)

and 21016 (OpsMgr was unable to set up a communications channel to xxx.yyy.com and there are no failover hosts.  Communication will resume when xxx.yyy.com is available and communication from this computer is allowed.)
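When comparing many agents, a small lookup table can help confirm whether each one shows the same combination of events. The following is a hypothetical Python triage helper. The event IDs and summaries are taken from the messages quoted in this thread. Collecting the IDs from each agent's Operations Manager event log is not shown.

```python
# Hypothetical triage helper: IDs and summaries come from the messages quoted
# in this thread; nothing else about OpsMgr event IDs is assumed.
OPSMGR_EVENTS = {
    21023: "No configuration for the management group; requesting new configuration",
    20070: "Connector connected, but the connection was closed right after authentication",
    21016: "Unable to set up a communications channel; no failover hosts",
    29120: "Management Configuration Service failed to process a configuration request",
}

# The combination reported on the agent servers in this thread.
AGENT_PATTERN = {21023, 20070, 21016}


def describe(event_ids):
    """Pair each event ID with its summary, or a placeholder if it is not listed."""
    return [(eid, OPSMGR_EVENTS.get(eid, "not discussed in this thread")) for eid in event_ids]


def matches_reported_pattern(event_ids):
    """True if all three agent-side events from this thread are present."""
    return AGENT_PATTERN.issubset(event_ids)
```

Usage: `matches_reported_pattern([21023, 20070, 21016])` returns `True` for an agent showing the symptom described here.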

No 20000 errors on the management server.

Management server logs:

21023 (OpsMgr has no configuration for management group XXX and is requesting new configuration from the Configuration Service.)

and

29120 (not sure this one is related):

OpsMgr Management Configuration Service failed to process configuration request (Xml configuration file or management pack request) due to the following exception

Microsoft.EnterpriseManagement.ManagementConfiguration.Interop.HealthServicePublicKeyNotRegisteredException: Padding is invalid and cannot be removed.

Server stack trace:

   at Microsoft.EnterpriseManagement.RuntimeService.RootConnectorMethods.OnRetrieveSecureData(Guid healthServiceId, ReadOnlyCollection`1 addedSecureStorageReferences, ReadOnlyCollection`1 removedSecureStorageReferences, ReadOnlyCollection`1 addedSecureStorageElements, ReadOnlyCollection`1 removedSecureStorageElements, String hashAlgorithmName, Byte[]& hashValue)

   at Microsoft.EnterpriseManagement.RuntimeService.SDKReceiver.OnRetrieveSecureData(Guid healthServiceId, ReadOnlyCollection`1 addedSecureStorageReferences, ReadOnlyCollection`1 removedSecureStorageReferences, ReadOnlyCollection`1 addedSecureStorageElements, ReadOnlyCollection`1 removedSecureStorageElements, String hashAlgorithmName, Byte[]& hashValue)

   at System.Runtime.Remoting.Messaging.StackBuilderSink._PrivateProcessMessage(IntPtr md, Object[] args, Object server, Int32 methodPtr, Boolean fExecuteInContext, Object[]& outArgs)

   at System.Runtime.Remoting.Messaging.StackBuilderSink.SyncProcessMessage(IMessage msg, Int32 methodPtr, Boolean fExecuteInContext)

Exception rethrown at [0]:

   at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)

   at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)

   at Microsoft.EnterpriseManagement.Mom.Internal.ISdkService.OnRetrieveSecureData(Guid healthServiceId, ReadOnlyCollection`1 addedSecureStorageReferences, ReadOnlyCollection`1 removedSecureStorageReferences, ReadOnlyCollection`1 addedSecureStorageElements, ReadOnlyCollection`1 removedSecureStorageElements, String hashAlgorithmName, Byte[]& hashValue)

   at Microsoft.EnterpriseManagement.ManagementConfiguration.Communication.CredentialDataProvider.GetSecureDataUnwrapped(Guid agentId, ICollection`1 addedReferenceList, ICollection`1 deletedReferenceList, ICollection`1 addedCredentialList, ICollection`1 deletedCredentialList, Byte[]& hashValue)

   at Microsoft.EnterpriseManagement.ManagementConfiguration.Communication.CredentialDataProvider.GetSecureData(Guid agentId, ICollection`1 addedReferenceList, ICollection`1 deletedReferenceList, ICollection`1 addedCredentialList, ICollection`1 deletedCredentialList, Byte[]& hashValue)

   at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.TracingCredentialDataProvider.GetSecureData(Guid agentId, ICollection`1 addedReferenceList, ICollection`1 deletedReferenceList, ICollection`1 addedCredentialList, ICollection`1 deletedCredentialList, Byte[]& hashValue)

   at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.AgentConfigurationFormatter.WriteSecureData(AgentConfigurationStream stream, XmlWriter writer, Guid agentId, Hashtable credentialAssociationList, Hashtable credentialList)

   at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.AgentConfigurationFormatter.WriteSnapshotState(AgentConfigurationStream stream, XmlWriter writer, AgentValidatedConfiguration validatedConfig)

   at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.AgentConfigurationFormatter.GetSnapshotConfigurationStream(AgentValidatedConfiguration validatedConfig, AgentConfigurationCookie oldCookie, AgentConfigurationCookie& newCookie)

   at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.AgentConfigurationBuilder.FormatConfig(ConfigurationRequestDescriptor requestDescriptor, IAgentConfiguration agentConfig)

   at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.AgentRequestProcessor.ProcessConfigurationRequest(ICollection`1 requestList, Int32& processedRequestsCount)

   at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.AgentRequestProcessor.Execute()

   at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.ThreadManager.ResponseThreadStart(Object state)

I can push out an agent to a server, and the push reports that it completed successfully (the agent does indeed install), but it sits in "Installation in Progress" forever.

I have checked everything I can find on the forums and now have no idea where to go apart from rebuilding a new management group from scratch (which would be a real pain after nearly a year of customisations).

I have checked:

- SPNs (all correct, and no Kerberos events in the event log)

- DNS (everything resolves and pings correctly)

- Moved the virtual hosts to another server (to rule out MAC address weirdness)

- Cleared the Health Service queue on both the management server and the agent servers (renamed the Health Service State folder, etc.)

- Installed a second new management server and tried to switch an agent to it, but same issue.

- In desperation I even set up a totally new installation from the DVD (no /repair used) using the same management group and server names, and then replaced the SQL DBs with a backup of the old ones - but that didn't work at all; even the console didn't connect.
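The DNS check above can be scripted and extended with a plain TCP reachability test. The following is a minimal Python sketch. It assumes TCP 5723 is the agent-to-management-server port, which is the usual default, but verify it in your environment. Since event 20070 says the connector did connect before the connection was closed, a passing result here mainly confirms that basic networking is not the problem, which points back toward authentication or configuration.

```python
import socket

# Assumption: default agent-to-management-server TCP port; adjust if changed.
SCOM_AGENT_PORT = 5723


def resolve(host):
    """Return the sorted, de-duplicated IP addresses that a host name resolves to."""
    infos = socket.getaddrinfo(host, None)
    return sorted({info[4][0] for info in infos})


def port_open(host, port=SCOM_AGENT_PORT, timeout=3.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and resolution failures.
        return False
```

Usage: run `resolve("xxx.yyy.com")` and `port_open("xxx.yyy.com")` from each agent, using the placeholder name from the events above.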

Anything else I should check?

Thanks in advance...

Ben



April 21st, 2013 10:07pm

Hi -

On any one of the agent servers in the same domain as the management server, look at the registry under HKLM\SYSTEM\CurrentControlSet\Services\HealthService. Check that the management server name and the management group name match your management server and management group. Confirm the same on your management server. You can also deploy a new agent and check whether it can communicate back.

On the management server, check that the DAS service is running and that its service account has login permissions on the SQL server - look for errors in the SQL logs too. Try flushing the management server configuration folder and restarting the service.

Also confirm that the pending approval setting is set to allow rather than reject; since this was a reinstall, it might have changed to reject.
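The registry check can be scripted per agent. The following is a hypothetical Python sketch using the standard `winreg` module, which is Windows only. The exact key layout below `HealthService` is an assumption based on the usual agent layout (`Parameters\Management Groups\<group>\Parent Health Services\0`, value `NetworkName`), so confirm it in regedit before relying on it. Note that `names_match` treats a short name and an FQDN as different names.

```python
# Assumption: usual agent layout under the HealthService key; verify in regedit.
MG_ROOT = r"SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Management Groups"


def parent_key(management_group):
    """Registry path (under HKLM) of the first parent health service for a group."""
    return MG_ROOT + "\\" + management_group + r"\Parent Health Services\0"


def configured_groups():
    """List the management group names this agent knows about (Windows only)."""
    import winreg  # only available on Windows
    groups = []
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, MG_ROOT) as key:
        index = 0
        while True:
            try:
                groups.append(winreg.EnumKey(key, index))
            except OSError:  # raised when there are no more subkeys
                break
            index += 1
    return groups


def parent_server(management_group):
    """Read the assumed NetworkName value of the parent server (Windows only)."""
    import winreg
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, parent_key(management_group)) as key:
        value, _value_type = winreg.QueryValueEx(key, "NetworkName")
    return value


def names_match(actual, expected):
    """Compare names case-insensitively, ignoring whitespace and a trailing dot."""
    return actual.strip().rstrip(".").lower() == expected.strip().rstrip(".").lower()
```

Usage on an agent: `names_match(parent_server("XXX"), "xxx.yyy.com")`, where both names are the placeholders used earlier in this thread.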
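"Flushing" here means the same thing as the rename described in the original post: move the state folder aside so the service rebuilds it on the next start. Below is a minimal Python sketch of that rename. The default path is an assumption based on a standard SCOM 2012 management server install, so adjust it to match your server. Stop the Health Service first (for example `net stop HealthService`) and start it again afterwards.

```python
from datetime import datetime
from pathlib import Path

# Assumption: default SCOM 2012 management server install path; adjust as needed.
DEFAULT_STATE_DIR = Path(
    r"C:\Program Files\Microsoft System Center 2012\Operations Manager\Server\Health Service State"
)


def set_aside(folder=DEFAULT_STATE_DIR, stamp=None):
    """Rename a state folder to '<name>.old-<stamp>' and return the new path.

    The service should be stopped first so no files are held open.
    """
    folder = Path(folder)
    if not folder.is_dir():
        raise FileNotFoundError(folder)
    stamp = stamp or datetime.now().strftime("%Y%m%d-%H%M%S")
    target = folder.with_name(f"{folder.name}.old-{stamp}")
    folder.rename(target)
    return target
```

Renaming rather than deleting keeps the old state around in case it is needed for further troubleshooting.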

-A

April 22nd, 2013 3:26am

Did you reconfigure the Run As accounts?

Roger

April 22nd, 2013 4:21am

Yes, I have re-entered the Run As credentials (username, password and domain) for each Run As account from the console. But since changes in the console don't make it to the health service on the management server or the other agents, I'm not sure how this works... Thanks
April 22nd, 2013 6:41am

Registry checked - all correct already. The DAS service is correct (domain account) and has db_owner on the SQL DB. Tried renaming the config state folder as well as the health state folder and restarting the DAS service - again nothing! Pending approvals are automatically approved. This is a tough one! Thanks
April 22nd, 2013 6:44am

Try to delete the computer account and approve it again

Roger

April 22nd, 2013 7:22am

Delete the management server computer account in AD? I have tried that, and I have also removed the server from the domain and rejoined it, which effectively does the same thing. :-(

After speaking with one of my techs, it looks like the original server did not crash. Instead, the management server application was uninstalled accidentally (it was mistaken for the web console server, and once the uninstall was in motion, cancel was not allowed - yeah, I know!!!). That left no management server (it was the only one), just the agent servers and the SQL database. The reinstall using the method above was tried both on the same server and on a new build (same name, etc.).

When the management server is uninstalled, does it do something to the database that effectively removes the management server? The management server is still visible in the Management Servers folder in the console...
April 22nd, 2013 7:29am

I have just restored the SQL DB from the backup taken the night before the management server was uninstalled, and the exact same issue still exists, so it can't be that the database gets changed during the uninstall... I think I may have to build a new Management Server from scratch and import the Management Packs from the broken Management Group... ;-(
April 22nd, 2013 11:16am

Could be the following issue with TLS:
https://geertbaeten.wordpress.com/2013/07/08/scom-agent-or-gateway-certificate-issue/

Best regards,

Geert

July 8th, 2013 8:01am

Hi, why use the /UseLocalSystemDASAccount switch if you have a dedicated domain account for the DAS?

Do the DAS account and the management server action account have local admin rights?

also check http://kobile.wordpress.com/2009/12/17/opsmgr-2007-r2-events-id-20070-21016-21023-after-upgrade/

July 8th, 2013 8:39am

Try this

1. Uninstall the SCOM agent

2. Open the Operations Console and delete the entity from Administration --> Device Management --> Agent Managed

3. Make sure the SCOM security setting is to manually approve newly installed agents

4. Reinstall the SCOM agent

5. Approve the SCOM agent

Roger

July 8th, 2013 8:00pm
