SCOM 2007 R2 not displaying new alerts following certificate problem
Hi, apologies for the long post, but I want to give some background.
I'm new to SCOM and have a problem I'm struggling to solve.
All was working fine until the certificate expired. I noticed that no errors were coming into the SCOM console and, on further investigation, found a lot of 20051 errors in the Operations Manager event log like this:
Event Type: Error
Event Source: OpsMgr Connector
Event Category: None
Event ID: 20051
Description:
The specified certificate could not be loaded because the certificate is not currently valid. Verify that the system time is correct and re-issue the certificate if necessary
Certificate Valid Start Time : 04 September 2008
Certificate Valid End Time : 04 September 2010
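For readers following along: event 20051 appears to come down to whether the current system time falls inside the certificate's validity window. A minimal Python sketch of that check, using the dates from the event above. The function name is made up for illustration and is not part of SCOM.

```python
from datetime import datetime


def certificate_is_valid(not_before, not_after, now=None):
    """Return True when `now` falls inside the certificate's validity window."""
    now = now or datetime.now()
    return not_before <= now <= not_after


# Validity window copied from the event 20051 description above.
valid_from = datetime(2008, 9, 4)
valid_to = datetime(2010, 9, 4)

# The date on which the later events in this thread were logged.
checked_on = datetime(2011, 4, 15)

print(certificate_is_valid(valid_from, valid_to, checked_on))  # False
```

If this prints False while the system clock is correct, the certificate has to be re-issued, as the event text suggests.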
I found the TechNet article (http://technet.microsoft.com/en-us/library/bb735413.aspx) about obtaining a new certificate from a CA, followed it, and imported the certificate successfully using MOMCertImport, but I am still not getting any alerts through into the SCOM console.
I now get plenty of 21042 information messages in the Operations Manager event log such as this:
Event Type: Information
Event Source: OpsMgr Connector
Event Category: None
Event ID: 21042
Date: 15/04/2011
Time: 11:17:57
Description:
Operations Manager has discarded 1 items in management group NFRS, which came from $$ROOT$$. These items have been discarded because no valid route exists at this time. This can happen when new devices are added to the topology but the complete topology has not been distributed yet. The discarded items will be regenerated.
And about every 12 minutes or so I get a 29106 warning message which I don't understand:
Event Type: Warning
Event Source: OpsMgr Config Service
Event Category: None
Event ID: 29106
Date: 15/04/2011
Time: 13:15:01
Description:
The request to synchronize state for OpsMgr Health Service identified by "32fe7aa5-ce6d-0e49-9497-75c42492d7a0" failed due to the following exception "Microsoft.EnterpriseManagement.Common.DataAccessLayerException: Invalid column name SizeNumeric_486ADDDB_2EB8_819A_FA24_8F6AB3E29543 for query MTV_SelectProperty_0b95ad7d-e73b-80cf-56cb-05d537682d2c.
at Microsoft.EnterpriseManagement.Mom.DataAccess.QueryDefinition.GetColumnDefinitionBySourceColumnName(String sourceColumnName, Int32 resultSetIndex)
at Microsoft.EnterpriseManagement.Mom.DataAccess.QueryDefinition.GetColumnDefinitionBySourceColumnName(String sourceColumnName)
at Microsoft.Mom.ConfigService.OpsMgrDataAccess.ConfigurationDataAccessor.QueryInstanceProperties(ReadOnlyCollection`1 instances)
at Microsoft.Mom.ConfigService.DataAccess.DatabaseAccessor.QueryInstanceProperties(ReadOnlyCollection`1 instances)
at Microsoft.Mom.ConfigService.Engine.ConfigurationEngine.CommunicationHelper.StateSyncRequestTask.ConfigurationItems.Instances.CollectPublicProperties(ReadOnlyCollection`1 identities, IConfigurationDataAccessor dataAccessor)
at Microsoft.Mom.ConfigService.Engine.ConfigurationEngine.CommunicationHelper.StateSyncRequestTask.ConfigurationItems.ConfigurationItemCollection`2.CollectPublicProperties(IConfigurationDataAccessor dataAccessor)
at Microsoft.Mom.ConfigService.Engine.ConfigurationEngine.CommunicationHelper.StateSyncRequestTask.ConfigurationItems..ctor(StateContext stateContext, IConfigurationDataAccessor dataAccessor)
at Microsoft.Mom.ConfigService.Engine.ConfigurationEngine.CommunicationHelper.StateSyncRequestTask.CreateResponse(Managers managers)
at Microsoft.Mom.ConfigService.Engine.ConfigurationEngine.Managers.Synchronize(OnDoSynchronizedWork onDoSynchronizedWork)
at Microsoft.Mom.ConfigService.Engine.ConfigurationEngine.CommunicationHelper.StateSyncRequestTask.Execute(Managers managers)
at Microsoft.Mom.ConfigService.Engine.ConfigurationEngine.CommunicationHelper.StateSyncRequestTask.Run(Guid source, String cookie, Managers managers, IConfigurationDataAccessor dataAccessor, Stream stream, IConnection connection)".
Can anyone please shed any light onto what is happening and what I need to do to get this fixed please?
Thanks in advance!
Phil
April 15th, 2011 8:27am
So, you checked the certificate on the client side. Did you also check the certificate on the RMS and all the MSs?
Christopher Keyaert - My OpsMgr / SCOM & Opalis blog: http://www.vnext.be
April 15th, 2011 9:46am
Hi Christopher, thanks for your response,
We only have one server hosting SCOM in my organisation, which I guess is the RMS; there are no other management servers involved. We have other servers acting as proxies for Exchange and VMware alerts, but they are not management servers.
All the certificate information mentioned in my first post is from the server that is the RMS.
Phil
April 15th, 2011 9:52am
As Christopher said, it is important to check the certificates that are loaded on both the agent and the RMS.
After that, you could reset the agent cache: stop the agent service, delete everything in C:\Program Files\System Center Operations Manager 2007\Health Service State\, start the agent again, and wait 5 minutes. Then check whether you see more of those events.
Bob Cornelissen - BICTT (My BICTT Blog)
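The cache reset described in this post could be scripted roughly as follows. This is only a sketch under a few assumptions: "HealthService" is assumed to be the short service name of the agent (confirm it with `sc query` first), the folder is renamed rather than emptied so the old state can be restored if needed, and the `run` parameter is a hypothetical hook that lets the steps be tried without touching real services.

```python
import subprocess
import time
from pathlib import Path

# Assumption: "HealthService" is the short name of the agent service.
SERVICE_NAME = "HealthService"

# Path taken from the post above; adjust if the agent is installed elsewhere.
STATE_DIR = Path(
    r"C:\Program Files\System Center Operations Manager 2007\Health Service State"
)


def reset_agent_cache(state_dir=STATE_DIR, service=SERVICE_NAME, run=subprocess.run):
    """Stop the agent, move the cache folder aside, and start the agent again.

    Returns the path of the renamed folder, or None if no cache folder existed.
    """
    state_dir = Path(state_dir)
    run(["net", "stop", service], check=True)
    backup = None
    try:
        if state_dir.exists():
            # Renaming (instead of deleting) keeps the old state as a backup.
            backup = state_dir.with_name(f"{state_dir.name}.old-{int(time.time())}")
            state_dir.rename(backup)
    finally:
        # Always try to bring the agent back, even if the rename failed.
        run(["net", "start", service], check=True)
    return backup
```

Calling `reset_agent_cache()` from an elevated prompt on the affected machine performs the three steps. As noted above, give the agent about 5 minutes before checking the event log again.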
April 18th, 2011 3:26am
Hi,
Please try the following and see if it will work:
1. Clear the HealthService queue on the problematic agent:
   1) Stop the System Center Management service.
   2) Go to C:\Program Files\System Center Operations Manager 2007\ and rename the “Health Service State” folder.
   3) Restart the System Center Management service.
2. Restart the following services:
   System Center Management service
   System Center Management Configuration service
   System Center Data Access service
3. Check the required ports:
   OpsMgr 2007: Port requirements for SCOM agents in a DMZ
   http://blogs.technet.com/b/operationsmgr/archive/2009/02/17/opsmgr-2007-port-requirements-for-scom-agents-in-a-dmz.aspx
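For the port check in step 3, a quick test from the agent machine might look like the sketch below. It assumes the default agent-to-management-server port, TCP 5723; the function name is illustrative, and the host passed in is whatever your management server is called.

```python
import socket

# TCP 5723 is the default port agents use to reach a management server.
AGENT_PORT = 5723


def port_is_reachable(host, port=AGENT_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A True result only shows that the port is open through any firewalls in between. It says nothing about whether certificate authentication succeeds afterwards.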
If the issue persists, please try obtaining the certificates again by following these posts:
Step by Step for using Certificates to communicate between agents and the OpsMgr 2007 server
http://blogs.technet.com/b/operationsmgr/archive/2009/09/10/step-by-step-for-using-certificates-to-communicate-between-agents-and-the-opsmgr-2007-server.aspx
Obtaining Certificates for Ops Mgr via Command Line or Script
http://blogs.technet.com/b/momteam/archive/2008/06/02/obtaining-certificates-for-ops-mgr.aspx
Hope this helps.
Thanks.
Nicholas Li - MSFT
April 18th, 2011 4:39am
Thanks to everyone, I have found the answer now!
It was actually a problem with some MPs after importing the certificate.
This Microsoft article showed me the issue: http://support.microsoft.com/kb/2017680
I then looked at the versions of my Windows Server MPs and found that the Server 2000 and Server 2003 ones were out of date.
I downloaded the new version from http://www.microsoft.com/downloads/en/details.aspx?FamilyID=3529d233-5e3e-4b51-8f66-5d6f27005ec3
I imported the two MPs and the alerts have started to flood through!
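For anyone repeating the version check, comparing management pack versions by eye is error-prone because dotted versions do not sort correctly as plain strings. A small sketch of a numeric comparison follows. The pack names and version numbers are hypothetical placeholders, not the real values from this environment.

```python
def parse_version(text):
    """Turn a dotted version string such as '1.0.0.10' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def outdated_packs(installed, required):
    """Return names of packs whose installed version is below the required one."""
    return sorted(
        name
        for name, minimum in required.items()
        if name in installed
        and parse_version(installed[name]) < parse_version(minimum)
    )


# Hypothetical names and version numbers, for illustration only.
installed = {"Pack.A": "1.0.0.9", "Pack.B": "1.0.0.10"}
required = {"Pack.A": "1.0.0.10", "Pack.B": "1.0.0.10"}

print(outdated_packs(installed, required))  # ['Pack.A']
```

Comparing tuples of integers matters here: as strings, "1.0.0.9" would sort after "1.0.0.10" and the outdated pack would be missed.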
Thanks once again for all your assistance.
Phil
April 18th, 2011 6:14am