New agents not working: Events 20000, 21016, 20070, etc.
I am currently running an evaluation of OpsMgr R2. At first, everything went pretty smoothly - I added about 9 machines (mixed workstations and servers) and have been monitoring them ever since. I decided to try to monitor a DMZ machine, so I set up a CA on a W2K3 Enterprise R2 x64 machine on the domain and created the OpsMgr template per the security guide. I was using this guide: http://blogs.technet.com/momteam/archive/2008/08/22/obtaining-certificates-for-non-domain-joined-agents-made-easy.aspx

Everything seemed to go smoothly with the certificate and agent deployment. I configured SCOM to require manual approval for manually installed agents. The DMZ machine agent appeared under "Pending", I approved it and it went to "Agent Managed", but it never said "Healthy". It is sitting there saying "Not Monitored". I thought maybe it was a problem with the certificate, so I went ahead and tried to deploy to additional domain machines. Now I can't deploy agents to them either. I get a recurring EventID 20000 - "A device which is not part of this management group has attempted to access this Health Service. Requesting Device Name: <agentmachine>"

These computers are domain machines and ports 5723, 445 and 135 have all been verified to be open. It appears that the agent installs properly but can't communicate. On all the client machines (both domain and non-domain), I get EventIDs 20070, 21016 and 21023 as follows:

Event Type: Error
Event Source: OpsMgr Connector
Event Category: None
Event ID: 20070
Date: 6/1/2009
Time: 1:32:01 PM
User: N/A
Computer: <agentmachine>
Description: The OpsMgr Connector connected to mabwscom1.medassurant.local, but the connection was closed immediately after authentication occurred. The most likely cause of this error is that the agent is not authorized to communicate with the server, or the server has not received configuration. Check the event log on the server for the presence of 20000 events, indicating that agents which are not approved are attempting to connect. For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
-------------------------------
Event Type: Error
Event Source: OpsMgr Connector
Event Category: None
Event ID: 21016
Date: 6/1/2009
Time: 1:32:04 PM
User: N/A
Computer: <agentmachine>
Description: OpsMgr was unable to set up a communications channel to scomserver.medassurant.local and there are no failover hosts. Communication will resume when scomserver.domain.local is available and communication from this computer is allowed. For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
-------------------------------
Event Type: Information
Event Source: OpsMgr Connector
Event Category: None
Event ID: 21023
Date: 6/1/2009
Time: 1:37:02 PM
User: N/A
Computer: <agentmachine>
Description: OpsMgr has no configuration for management group SCOM and is requesting new configuration from the Configuration Service. For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Currently, there are 6 machines saying "Not Monitored" and on the SCOM server, I'm getting 6 sequential 21042 errors, all identical, with this text:

Event Type: Information
Event Source: OpsMgr Connector
Event Category: None
Event ID: 21042
Date: 6/1/2009
Time: 12:17:20 PM
User: N/A
Computer: <scomserver>
Description: Operations Manager has discarded 1 items in management group SCOM, which came from $$ROOT$$. These items have been discarded because no valid route exists at this time. This can happen when new devices are added to the topology but the complete topology has not been distributed yet. The discarded items will be regenerated. For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Along with a single 21024 error with this text:

Event Type: Information
Event Source: OpsMgr Connector
Event Category: None
Event ID: 21024
Date: 6/1/2009
Time: 12:17:20 PM
User: N/A
Computer: <scomserver>
Description: OpsMgr's configuration may be out-of-date for management group SCOM, and has requested updated configuration from the Configuration Service. The current(out-of-date) state cookie is "94 9D DE 40 23 0A 45 F0 3F 2F 12 67 73 4D 0D 73 F7 89 57 65 " For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Thanks in advance for any assistance.
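For anyone trying to keep the event IDs in this post straight, here is a minimal Python sketch that groups them by where they were logged. The summaries are paraphrased from the event text quoted above; the `EVENTS` table and the `triage` function are illustrative names only and are not part of OpsMgr.

```python
from collections import defaultdict

# Event IDs and summaries paraphrased from the event text quoted in this post.
# The table layout and names are illustrative assumptions, not an OpsMgr API.
EVENTS = {
    20000: ("server", "device not in the management group tried to access the Health Service"),
    20070: ("agent", "connection closed immediately after authentication"),
    21016: ("agent", "no communications channel to the server and no failover hosts"),
    21023: ("agent", "no configuration for the management group; new configuration requested"),
    21042: ("server", "items discarded because no valid route exists yet"),
    21024: ("server", "configuration may be out of date; update requested"),
}


def triage(event_ids):
    """Group observed event IDs by the machine they were logged on."""
    grouped = defaultdict(list)
    for event_id in event_ids:
        where, meaning = EVENTS.get(
            event_id, ("unknown", "not described in this thread")
        )
        grouped[where].append((event_id, meaning))
    return dict(grouped)


if __name__ == "__main__":
    for where, items in triage([20070, 21016, 21023, 20000, 21042, 21024]).items():
        print(where)
        for event_id, meaning in items:
            print(f"  {event_id}: {meaning}")
```

Read together, the agent-side and server-side entries describe the same situation from both ends: the agent authenticates, but the server does not treat it as an approved, configured member of the management group.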
June 1st, 2009 9:06pm

I think I saw something like this last week on a machine. We deleted it from the console, ran the cleanup tool from the OpsMgr resource kit, then reinstalled, and it worked.
Anders Bengtsson | Microsoft MVP - Operations Manager | http://www.contoso.se
June 1st, 2009 9:23pm

Hello. I've been battling with certs for a long time and now I think I know the proper route:
1. Create certificates based on the template for the server and the managed client
2. Import them into the proper certificate store
3. Run the MomCertImport tool on both
4. Make sure you import your CA root cert into Trusted Root Certificates
5. Restart the agent
6. .... Should work :)
Ingrifo - We Do SCOM
June 1st, 2009 9:24pm

Number 4 should be number 1, so that you fix the root CA certificate first. Then I believe you need to install the agent before you run momcertimport.exe. Then restart the health service on both the management server and the agent. I did a blog post about gateway servers yesterday. If you remove the part with momgateway approval and replace the momgateway software with the momagent software, the guide should work for this scenario too: http://contoso.se/blog/?p=680
Anders Bengtsson | Microsoft MVP - Operations Manager | http://www.contoso.se
June 1st, 2009 9:28pm

The only thing is, the certs issue should not have any effect on my ability to deploy to domain machines, correct? I didn't need to install certs on them before. As for the full cleanup & redeploy solution, I am not able to deploy to brand new machines either - machines that have never seen Ops Mgr before. I created a brand new server from scratch and couldn't deploy the agent to it - I got the same errors as above.
June 1st, 2009 9:39pm

Hi
What errors are you getting in the Operations Manager event log on the Root Management Server? Especially with regard to authentication. What happens if you restart the Data Access service or the Management Configuration Service?
Cheers
Graham
June 1st, 2009 9:47pm

I have restarted the entire server and the services alone multiple times. I have only one OpsMgr server, so the errors above (20000, 21042, 21024) are from the RMS. I get a pair of events (26328/26329) every minute or so, but they didn't look bad. I think I read that those are the events related to the consoles connecting to the MS. When I tried the push install on a new domain server this morning, a bunch of "success" events showed up in the OpsMgr server's event log - the only questionable one was this:

Event Type: Information
Event Source: OpsMgr Config Service
Event Category: None
Event ID: 29102
Date: 6/1/2009
Time: 9:00:24 AM
User: N/A
Computer: <scomserver>
Description: Configuration state of OpsMgr Health Service "{deb5a561-5da5-3257-071a-de3b71f8c63e}" running on "newserver.domain.local" may be out of date. It should contact OpsMgr Config Service to synchronize its configuration state. For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

... and this was followed by:

Event Type: None
Event Source: Health Service Modules
Event Category: None
Event ID: 10616
Date: 6/1/2009
Time: 9:00:40 AM
User: N/A
Computer: <scomserver>
Description: The Operations Manager Server successfully completed the operation Agent Install on remote computer newserver.domain.local. Install account: DOMAIN\systemcenter Error Code: 0 Error Description: The operation completed successfully. For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

... so all looks well, but it just sits there in Agent Managed as "Not monitored"
June 1st, 2009 10:02pm

Hi
A couple of troubleshooting docs that might help:
http://www.systemcenterforum.org/news/opsmgr-2007-pki-and-gateway-scenarios-part-4-troubleshooting-mutual-authentication/
http://blogs.technet.com/operationsmgr/archive/2009/02/17/opsmgr-2007-port-requirements-for-scom-agents-in-a-dmz.aspx
Good Luck
Graham
June 1st, 2009 10:24pm

I have actually seen and followed both of those guides in my quest - trust me, I've done my homework before coming here. Either way, both of those documents refer to issues regarding the monitoring of DMZ/non-domain/untrusted domain servers/workstations.
I'm at the point now where I'm ready to forget the DMZ server and certificates for now. I can't deploy successfully to machines on the same domain as the RMS. There should be no authentication or certificate-related issues with those unless I somehow inadvertently changed something within SCOM. The machine I tried to deploy to this morning was temporarily given any/any network access for testing purposes to rule out any network-related issues. Other than port problems, what would cause an agent installed on the same subnet and same domain as the RMS to throw those errors?
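For readers who want to repeat the port check before ruling out the network, here is a minimal Python sketch. The port numbers (5723, 445, 135) are the ones listed in the original post; the function names are hypothetical, and a successful TCP connect only shows that the port is reachable, not that the agent is authorized.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_ports(host, ports=(5723, 445, 135), timeout=3.0):
    """Map each port (defaults are the ports named in the thread) to its result."""
    return {port: port_open(host, port, timeout) for port in ports}
```

Calling `check_ports("scomserver.domain.local")` from the agent machine (the host name here is the placeholder used in the thread) returns a dictionary such as `{5723: True, 445: True, 135: True}` when all three ports accept connections. If every port is reachable and events 20070/21016 still appear, the problem is more likely approval, configuration or authentication than the firewall.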
June 1st, 2009 10:35pm

Not sure there is much more I can add. You do ask "Other than port problems, what would cause an agent installed on the same subnet and same domain as the RMS to throw those errors?" and I would say authentication \ Kerberos problems could cause it. So (clutching at straws):
- Do you have any Service Principal Name alerts on the RMS?
- If you look at the Health Explorer on the RMS, are there any issues that don't show up under alerts (perhaps because someone else closed the alert without resolving the underlying problem)?
- Are there any errors in the system log that show possible Kerberos issues on the RMS? But if the agents you deployed earlier are still working then it suggests that there aren't any problems.
- What action account are you using for agents?
I've never used the utility to create certificates but have done them "long hand", so to speak, based on the TechNet guidance and not had any issues other than typos from me. Hope it isn't the wizard that has caused the problems.
It might be time to open a ticket?
Good Luck
Graham
June 1st, 2009 11:19pm

- Where would the SPN alerts show if they existed? I don't see anything related to that under the Operations Manager log.
- The Health Explorer for the "Not monitored" agents doesn't show anything because everything is an empty circle (no checks, x's or !'s), and the Health Explorer for the RMS is only complaining about an issue related to our Quest Cisco monitor (unrelated). On another, similar note, it seems not to want to monitor new Cisco devices either. It's collecting performance data and displays it under the counters in the Monitoring section, but still says "Not monitored" under the "state" view.
- I don't see any... do you know what EventIDs I might look for? I looked at every event since 9am this morning when I added the most recent server and didn't see anything.
- That's one of the weird things I noticed last week and should have mentioned earlier. I'm running R2 RC, and when I go to Administration -> Run As Configuration -> Profiles and double-click Default Action Account, it fires up the Run As Profile Wizard. I click Next -> Next and get a list of every server/workstation that is working, along with the account name. All of the machines that are "Not monitored" are missing from that list. The "Add..." button is greyed out at the top, so I can't add any more. Perhaps it's related to this? It's odd because even though the machines aren't showing in that list, this event appears on the one I tried to deploy to this morning:

Event Type: None
Event Source: HealthService
Event Category: Health Service
Event ID: 7019
Date: 6/1/2009
Time: 9:00:02 AM
User: N/A
Computer: <agentmachine>
Description: The Health Service has validated all RunAs accounts for management group SCOM. For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Thanks for your help, by the way.
June 1st, 2009 11:53pm

Wish I could help a bit more ... we don't seem to be getting far ;-(
SPN alerts would show up under the alerts in the console. If someone had accidentally closed them without resolving the underlying problem then they would still show as unhealthy in the Health Explorer of the RMS. So under the computer state view, if you right click the RMS and choose Health Explorer, is there anything unhealthy there?
Don't know specific events for Kerberos errors but I think the source would probably be Kerberos or Authentication or NetLogon. But if other servers are communicating fine then I doubt this is the issue.
Run As config - you might not be able to add them as they are still unmonitored .. likewise in the admin tab, under Agent Managed you can see the action account, but for the unmonitored servers I'm guessing you'll just see a blank.
To be honest, if this is the RC and you've only got a few agents out there then I'd blow it all away and install the R2 RTM Eval ... I guess it depends on how much work that is compared to the continued (and so far unproductive) attempts to resolve this issue.
Have fun
Graham
June 2nd, 2009 12:10am

Tim, believe me - I was feeling exactly like you some time ago. I never found a complete set of steps on the web for setting up this kind of mutual authentication, although the only step I was missing was putting the root CA certificate in the Trusted Root Certificates container on both the client and the servers. When I have some time, I'll try to write up a step-by-step with screenshots.
Ingrifo - We Do SCOM
June 2nd, 2009 8:17am

Hi Rem
Trouble is - this doesn't only affect the DMZ servers. It is affecting monitored servers in the same domain as the RMS \ databases. So it shouldn't be a certificate issue at all, as Kerberos should be being used. Unless certificates have also been deployed to the domain servers ... I think OpsMgr uses certificates first and "fails over" to Kerberos only if certificates aren't found. So it could be that inaccurate certificates have been deployed to the domain servers??
From: http://technet.microsoft.com/en-us/library/bb735408.aspx
"Communication among these Operations Manager components begins with mutual authentication. If certificates are present on both ends of the communications channel, then certificates will be used for mutual authentication; otherwise, the Kerberos version 5 protocol is used. If any two components are separated across an untrusted domain, mutual authentication must be performed using certificates."
Cheers
Graham
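The rule in the TechNet passage quoted above can be written out as a small decision function. This is only a sketch of the quoted wording, not OpsMgr code: the function name, parameter names and return values are assumptions, and it does not model whether a certificate that is present is actually valid.

```python
def choose_mutual_auth(cert_on_agent, cert_on_server, trusted_domain):
    """Model the quoted rule for picking a mutual authentication method.

    cert_on_agent / cert_on_server: whether a certificate is configured
    on each end of the channel (hypothetical inputs).
    trusted_domain: whether the two ends share a Kerberos trust.
    """
    if cert_on_agent and cert_on_server:
        # Certificates on both ends take precedence, even inside the domain.
        return "certificate"
    if trusted_domain:
        # Otherwise Kerberos version 5 is used.
        return "kerberos"
    # Across an untrusted boundary, certificates are required on both ends.
    return "unavailable"


if __name__ == "__main__":
    print(choose_mutual_auth(False, False, True))   # ordinary domain agent
    print(choose_mutual_auth(True, True, False))    # DMZ agent with certificates
    print(choose_mutual_auth(True, False, False))   # DMZ agent, server lacks a cert
```

The first branch is the one behind the suspicion above: if a certificate ended up configured on a domain machine as well as on the server, that pair would attempt certificate authentication rather than Kerberos, so a bad certificate could break a machine that previously worked.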
June 2nd, 2009 8:41am

The Kerberos/mutual/possible botched certs thing sounds promising, except the workstations I've tried to deploy to have never used certs before. Is it possible that I somehow put SCOM into some kind of "native"-ish mode where it will only talk to machines with certificates or something?
The problem with re-installing is that if this were a production environment and not a proof-of-concept, I'd be up the creek. I can't be re-installing this software all the time, so there has to be a way to fix it on-the-fly, so to speak. If this is a problem with the R2 "RC", then maybe a Microsoft rep can shed some light on this as a "known issue".
My job is to evaluate this software before we spend a whole bunch of money on it, and right now it's not putting on a very good show.
June 2nd, 2009 3:38pm

Hi Tim
If you have a MSFT technical account manager then it is probably best to contact them and ask for a bit of assistance. Depending on the number of servers you have, they may be able to be accommodating. You've gone through the obvious (and more) and troubleshooting further via forums is going to be difficult.
As for re-installing - I take your point, but if this were production then you wouldn't be using a release candidate. I haven't had issues with certificates before other than errors I have made (usually typos). It is pedantic but it does work. There is plenty within the management packs that causes far more pain than the certificates themselves.
This sounds very similar: http://www.eggheadcafe.com/conversation.aspx?messageid=30512767&threadid=30482638
Did you reinstall the Root Management Server at any time? Is it running on a virtual machine that is struggling with resource limitations? Is it just coincidence that these happened at the same time as you started trying to deploy certificates? What management packs do you have installed? Have you configured SNMP monitoring?
Do any of the machines now work (even the ones you had already deployed)? Can you run reports, and if you check performance data, is there any?
Questions .. questions ... not sure we'll get any solution though ... but if you are seriously evaluating then I would recommend doing the evaluation on RTM software rather than pre-release code. At least that way you do know it is the product.
Have fun
Graham
June 2nd, 2009 8:33pm

Graham, I don't think I have a Microsoft tech account manager, but I might be able to get that set up.
* I only installed the R2 RC because I wanted to see the Service Level Dashboard and Visio plug-in capabilities. Do you have any idea when the R2 'select' will come out that I could upgrade my R2 RC to, so I can avoid reinstallation?
* That situation looks similar - I do have 29106 errors in my SCOM event log, but all the Health Service IDs are machines that are currently working (not all of the working machines, though).
* I did not reinstall the RMS at any time. It is running on a physical box w/ 2x 2.0GHz Quads and 8GB of RAM. I actually had a few domain machines that were not working well before I started messing with certificates. I have installed many Microsoft and a handful of Quest management packs.
* I have several machines that were set up from day 1 that worked then and work now. I just can't add any new machines. That's what makes this so odd.
Thanks,
Tim
June 3rd, 2009 12:17am

Hello Tim,
R2 RTM Select media will be generally available (GA) on July 1st (to answer part of your question).
Thanks,
Justin Incarnato
This posting is provided "AS IS" with no warranties, and confers no rights. Use of attachments are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm
June 3rd, 2009 12:43am

As an extra thought - if you are installing agents manually, did you go to Administration, Settings and Security (Server) to enable manual agent installs? Not enabling that would stop communication.
Part of the problem could also be putting too many management packs on too quickly, especially 3rd party MPs. Best practice is to do one at a time and make sure everything settles down before doing the next, in case an MP does cause a problem. If you do too many at a time you can't determine what has caused the problem.
If you can, I'd remove the 3rd party MPs and ideally just get back to Windows OS and SQL ... see if everything picks up then.
Cheers
Graham
June 3rd, 2009 5:30pm

Hi Tim,
Don't know if this will help or not. Maybe I missed something in the previous discussion..
We set up a Cert Server as part of our domain and it caused the AD guys headaches..... So we dumped that idea and built a Cert Server in its own workgroup, just for SCOM certs.
Then used the following guide to monitor the DMZ servers: http://www.stranger.nl/files/DMZ_server_monitoring_with_SCOM_2007.pdf
Works fine,
John Bradshaw
June 4th, 2009 1:52am

Tim, I have exactly the same issue with an R2 RC edition. As far as I know, the configuration I am working on has had no changes to the RMS, and no DMZ monitoring with certificates has been implemented. It's hard to determine when the problems started because it has been a while since I added a new agent. Did you deploy any agents shortly before you started on the DMZ? And if so, were there no problems, only after the certificate setup? Because it could be that the DMZ monitoring you were working on isn't what is causing this issue. In my setup it isn't, anyway. My next step is going to be baselining the setup again as suggested by Graham, to be sure no MP is causing these issues. Will let you know my findings.
June 15th, 2009 3:24pm

Hi Oskar
I would strongly recommend that you upgrade (or reinstall) to R2 RTM to make sure it isn't an RC issue. Nobody with RTM seems to have this problem. Some ideas (which I suspect you have already tried):
- Are there any errors in the Operations Manager event logs on the RMS \ agents? Or are they the same as Tim initially posted?
- Are there any Kerberos \ authentication errors in the Windows system logs?
- Is the SDK \ Config account a domain admin? If not, does it have rights to read \ write to the Service Principal Name?
Cheers
Graham
June 16th, 2009 11:23am

Hi Graham,
Thanks for your response. It looks like the exact same issue as Tim's. I have checked the logs but there are no messages or events indicating something is wrong with Kerberos or authentication.
The SDK account is a domain admin, so no problems with access rights.
I agree it could very well be an RC issue which is causing the problems and which will be resolved when RTM is installed. Unfortunately I will have to wait until 1 July to be able to upgrade, because an upgrade from RC to RTM trial is not supported.
I am going to troubleshoot some more, but when I have no more options I will try removing management packs, just because I am curious whether this will resolve the issue or not. If it does, there was a wrongly designed MP; if not, I will reinstall the environment with the RTM trial and upgrade after 1 July.
Cheers
Oskar
June 16th, 2009 3:07pm

I worked through all of these steps and they yielded pretty much nothing in my situation. I didn't find any orphans with the SQL script, no Kerberos issues, etc. I had identical event log messages on the management server and agents: 21023, 21016, 20070, 21022.
Turns out the answer to my issue at least, which had similar if not identical symptoms, was here: http://anandpv.spaces.live.com/default.aspx?_c11_BlogPart_BlogPart=blogview&_c=BlogPart&partqs=amonth%3d12%26ayear%3d2009&sa=406469737
Out-of-date management packs following the update to SCOM 2007 R2 meant installing any new agents failed. I updated the management packs (as some are dependent on others, you may have to run the import 2 or 3 times to get all of them to import correctly). After maybe 5 minutes all my previously problematic agents showed up green and I could add new machines again. I hope this helps someone else out as well. Many thanks to Anand Venkatachalapathy for posting this on his blog.
Cheers,
Jim.
May 31st, 2010 10:20am

I'm not sure how out-of-date management packs would only affect gateway-managed machines and not all machines? These gateway-monitored agents appear to be functional in all other respects, but they continue to show a "Not Monitored" state. Yet performance collections, rules and monitor updates are being written into the DHW. The "Not Monitored" state started immediately after upgrading to R2 from SP1. New agent installs for machines that report to SCOM management servers work fine. I tried accessing the link above, but it seems to be broken. Can someone provide any additional suggestions for resolving this issue?
Derv
June 2nd, 2010 10:27pm

Use this guide: http://www.systemcentercentral.com/tabid/147/IndexId/77779/Default.aspx
You'll have to download it.
August 19th, 2010 11:07am

http://anandpv.spaces.live.com/blog/cns!AFCCA5892B178862!2798.entry
Try the link above.
Thanks,
Jim.
August 19th, 2010 12:03pm

Hi All,
Today I had the exact same problem. Tried all the above but no luck. Finally, I looked at the trust relationship between my two forests and it turned out to be an external trust instead of a forest trust. After changing the external trust into a forest trust, the Kerberos mutual authentication was no problem anymore and the server was listed in SCE!
Regards,
Denis Bais
dbais
February 18th, 2011 11:20pm

This topic is archived. No further replies will be accepted.
