Health Service Crashing - Event ID 4000 and
I am getting this error on our SCE installation:

Event Type: Error
Event Source: HealthService
Event Category: Health Service
Event ID: 4000
Description: A monitoring host is unresponsive or has crashed. The status code for the host failure was 2164195371.

Followed with this Warning:

Event Type: Warning
Event Source: HealthService
Event Category: Health Service
Event ID: 1103
Description: Summary: 1787 rule(s)/monitor(s) failed and got unloaded, 0 of them reached the failure limit that prevents automatic reload. Management group "RSM01_MG". This is summary only event, please see other events with descriptions of unloaded rule(s)/monitor(s).

and this one:

Event Type: Warning
Event Source: HealthService
Event Category: Health Service
Event ID: 1103
Description: Summary: 1 rule(s)/monitor(s) failed and got unloaded, 0 of them reached the failure limit that prevents automatic reload. Management group "RSM01_MG". This is summary only event, please see other events with descriptions of unloaded rule(s)/monitor(s).

--------------------------------------
I assumed that this was related to the SNMP network devices I added, so I applied this hotfix KB951526 (per Clive Eastwood) with no joy. Has anyone experienced this issue? It seems like I have to restart the health service for 'things' to return to normal.
August 4th, 2008 11:29pm
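The status code in the Event ID 4000 description is logged as an unsigned decimal number. Converting it to hexadecimal can make it easier to search for and to compare with other reports. A minimal Python sketch follows; the helper name is hypothetical and not part of SCE.

```python
def status_code_to_hex(code: int) -> str:
    """Format a 32-bit status code, as logged in decimal, as 0x-prefixed hex."""
    # Mask to 32 bits so a signed (negative) rendering of the same code
    # produces the same hexadecimal string.
    return f"0x{code & 0xFFFFFFFF:08X}"


# The status code quoted in the Event ID 4000 description above.
print(status_code_to_hex(2164195371))  # 0x80FF002B
```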
Hi Neale,

Did you import any third-party Management Packs before?

Please open the SCE console, navigate to the Administration space and choose "Management Packs". Review the dependencies on the SNMP library MP to check whether any third-party Management Packs depend on it. If there are any, try deleting these third-party MPs and check whether the problem still exists.

--------------------
Regards,
Eric Zhang
August 7th, 2008 5:21am
There was only one 3rd party MP that had a dependency on the SNMP Library MP. I have removed that and the issue still exists.
August 8th, 2008 12:20am
Hi Neale,

We need to turn on Watson reporting for your SCE server. Please run "regedit" on your SCE server and expand to:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HealthService\Parameters

Change the value of "Error Reports Enabled" from 0 to 1 to enable Watson. Once Watson reporting is enabled, there should be some other events logged giving the bucket ID of the crash report. Please post them in this thread.

--------------------
Regards,
Eric Zhang
August 11th, 2008 6:34am
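The registry change described above can also be made from a command prompt with the built-in `reg add` command instead of regedit. The Python sketch below only builds and prints that command. It assumes the value is a REG_DWORD (the thread only says to change it from 0 to 1), and the `--apply` switch is a hypothetical convenience of this sketch, not an SCE or Windows option.

```python
import subprocess
import sys

# Key and value name exactly as given in the post above.
KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\HealthService\Parameters"
VALUE_NAME = "Error Reports Enabled"


def build_enable_watson_command() -> list[str]:
    """Build a `reg add` command that sets "Error Reports Enabled" to 1."""
    # Assumption: the value type is REG_DWORD; the thread does not state it.
    return ["reg", "add", KEY, "/v", VALUE_NAME, "/t", "REG_DWORD", "/d", "1", "/f"]


if __name__ == "__main__":
    cmd = build_enable_watson_command()
    # Print the command so it can be reviewed before anything is changed.
    print(subprocess.list2cmdline(cmd))
    # Only touch the registry on Windows, and only when explicitly requested.
    if sys.platform == "win32" and "--apply" in sys.argv:
        subprocess.run(cmd, check=True)
```

Printing first and applying only on request keeps the sketch safe to run on a workstation; back up the key before changing it on a production server.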
There is a hotfix available for Event ID 4000, issued by Microsoft not too long ago. The page is here: http://support.microsoft.com/kb/951526 Not sure if this will help. I haven't applied it yet.
August 11th, 2008 11:47am
Eric, thank you for your help. We have opened a case with MS Support on the issue. When resolved, I will post the real problem/resolution on this thread... If I remember.
Neale
August 12th, 2008 9:08am
Hi Neale,

Have you solved this problem? I have the same one. I monitor six HP switches and it's OK. When I add one IBM SAN 16B switch, I immediately get this error and the Health Service crashes. I tried applying hotfix KB951526, but it's the same, so I have to remove this SAN switch. I use the Quest solution for VMware and Linux monitoring, and this MP depends on the SNMP library MP, so I need some solution.

Thanks,
Jan
August 26th, 2008 4:23am
Well, anyone? Quote: I will reply when I have the true solution? That was almost 4 months ago? Yeah, I could call MS, but why? To be further disillusioned with them, put my hands up and say @$#%^^*&* it? I did a bunch of SNMP additions (8 switches) and this *** hit the fan. So they screwed us with this so-called hotfix and now it hotstops? Surprised if you've been like this for the last 4 months? Bueller, Bueller???? Only if you know Ferris. Event ID 5300, The local health service is not healthy. Entity state change flow is stalled with pending acknowledgment blah blah blah.......
November 28th, 2008 9:50am
We are aware of an issue where Essentials is unable to correctly monitor network devices that report an interface speed of 2Gb/s or greater. One of the symptoms is that the Health Service will unexpectedly exit, logging Event ID 4000 in the Event Log. The cause of this issue is separate from the issue resolved with KB951526, which is why applying that KB doesn't resolve this issue.

There are 2 possible workarounds, which will not help in every situation:
1) If your network device does not have interface speeds > 2Gb/s, check for updated firmware from your device manufacturer.
2) Exclude network devices reporting interface speeds > 2Gb/s from monitoring.

We are working on a fix, although I don't have a release date yet. The fix was initiated after investigation of the case Neale opened with Microsoft Support. We haven't provided a fix for Neale yet, which is why he hasn't been able to post an update. I will post an update to this thread once the fix is available for download.
December 1st, 2008 6:01pm
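The second workaround above amounts to filtering the device list by reported interface speed. A minimal Python sketch of that filter follows. The inventory data and device names are hypothetical, and treating "2Gb/s" as 2,000,000,000 bit/s (inclusive) is an assumption; the interface speeds themselves would have to be collected separately, for example from each device's SNMP interface table.

```python
# Threshold from the post above ("2Gb/s or greater").
# Assumption: 2Gb/s is taken to mean 2,000,000,000 bit/s, inclusive.
SPEED_LIMIT_BPS = 2_000_000_000


def devices_to_exclude(interface_speeds: dict[str, list[int]]) -> list[str]:
    """Return names of devices with at least one interface at or above the limit."""
    return sorted(
        name
        for name, speeds in interface_speeds.items()
        if any(speed >= SPEED_LIMIT_BPS for speed in speeds)
    )


# Hypothetical inventory: device name -> reported interface speeds in bit/s.
inventory = {
    "edge-switch-1": [100_000_000, 1_000_000_000],
    "san-switch-1": [4_000_000_000, 4_000_000_000],
    "core-switch-1": [1_000_000_000, 10_000_000_000],
}

print(devices_to_exclude(inventory))  # ['core-switch-1', 'san-switch-1']
```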
Richard,

All I ask is for a little communication on these forums. You don't just leave a thread open-ended for 4 months, particularly with the teething issues the System Center products have been through; otherwise it just appears to be one annoyance after another. Neale did say 'If I remember', so it seems he didn't. Well, I went ahead and opened a case with Microsoft anyway, as I could not assume there has been no fix after 4 months and I need this product to work so I can move on. Who is putting up with these problems for this long? This is a monitoring system after all! A company could lose a lot of clients, or have to write a lot of refund cheques, particularly in the SMB market SCE is aimed at, when SLAs cannot be met because its monitoring system is not up to the task.

Now, the issue at hand. Hang on, greater than 2Gb? Your first line states it has problems with speeds greater than that. You then suggest firmware for devices that do not have above 2Gb. I don't get it? All our devices are less than 2Gb and all have the latest stable firmware. How is this a separate issue from KB951526? The errors are exact to the letter, though our issue does not involve any MPs, only device discovery and the default enabled SCE reporting from there.

thanks
December 1st, 2008 6:54pm
Hi Hittin,

We have seen that in some cases network devices will incorrectly report an interface speed >2Gb/s. Sometimes there is a firmware update that results in the devices reporting the correct interface speed. It isn't always an option, but has helped others. It won't help if you have devices with >2Gb/s interfaces because Essentials isn't correctly handling these faster devices.

The Event ID 4000 messages don't uniquely identify a single problem - just that the Health Service has crashed or stopped responding. I agree the symptoms are similar, but looking at the stack traces the issues are different and require different fixes.

Ask the support engineer responsible for your Microsoft Support case to contact me and if their research hasn't already found it, I can point them to an internal article that will help them confirm whether the issue you're seeing will be fixed by the hotfix we're working on.

Thanks.
December 2nd, 2008 11:31am
Request an escalation to second level if you haven't already done this... good luck with MS.
December 2nd, 2008 4:55pm
Neale,

While doing a trace and performing an SNMP discovery, the tech found the following in TracingGUIDsNative.log. Cause unknown at this stage.

0 00000000 1448.7528::12/04/2008-15:50:17.752 [HealthServiceCommon] Error EventLogUtil::LogEvent(EventLogUtil_cpp272)Logging error event with args 2164195371

Richard,

I gave the tech the link to this article; he said he would be in contact with you.

thanks
December 8th, 2008 8:38pm
Hi all,

I am having the same issue in my SCE SP1 environment. I have 25 SCE-managed servers and a couple of clients. As soon as I start installing the SCE agent on a client machine, I get Event ID 4000s and the HealthService starts using very high CPU. I have installed the MS hotfix but it didn't solve the problem.

http://support.microsoft.com/?kbid=951526

If any of you have an update on this thread, I would really appreciate it.

Thanks,
Shn
February 2nd, 2009 10:59am
Hi,

I'm also waiting for an update to this thread. Richard referred me here after I posted a thread with similar problems. At the moment I can only discover one of our SNMP devices without causing the Health Service to crash. If I can provide any information that might help, I'm happy to do so.

Regards,
Andy.
February 2nd, 2009 7:02pm
Thanks for responding, Andy.

I had a few network devices that were creating SNMP-related error messages in Event Viewer every time right before the HealthService crashed. I stopped monitoring those devices and rebooted the server, but it didn't fix the problem. Did you get SNMP error messages in Event Viewer before the Health Service crashed when you started experiencing the problem?

Thanks,
Shn
February 3rd, 2009 9:40am
Here is an update from my case. I stopped monitoring all the network devices in my SCE environment and rebooted the server. After the reboot, CPU usage is back to its normal state and the HealthService is pretty healthy :). Now I need to figure out which network device(s) cause this issue and keep those devices unmonitored until the next update/service pack/hotfix becomes available to fix the root of the problem.

Shn
February 3rd, 2009 11:15am
A short update - we have created a hotfix and are currently going through the release process to make it available to you as a download on Microsoft.com with an accompanying knowledge base article. I expect we'll have the update and KB article available this month (the exact timing is difficult to predict).

Thanks for your patience - I realize this fix is taking a long time to appear.

This posting is provided "AS IS" with no warranties, and confers no rights.
February 3rd, 2009 2:07pm
Hi,

Currently have a case open with Microsoft. After providing more crash dumps on Event ID 4000 on MonitoringHost.exe than I care to remember, they have provided me with an updated System Center Essentials 2007 Network Device Monitoring Library MP.msi to test. I've had it installed for the last 3 days and it has not generated Event ID 4000 and therefore does not unload all of my rules.

It does however flood my Ops Manager Event Log with events:

ID 11052
Module was unable to convert parameter to a double value
Original parameter: '$data/SnmpVarBinds/SnmpVarBind/Value$'
Parameter after $Data replacement: ''
Error: 0x80020005
Details: Type mismatch.
Instance name: 53.VLANxx

ID 21405
The process started at 9:10:36 AM failed to create System.PropertyBagData, no errors detected in the output. The process exited with 1
Command executed: "C:\WINDOWS\system32\cscript.exe" /nologo "UtilizationCalc.vbs" 0 0. 300 true
Working Directory: C:\Program Files\System Center Essentials 2007\Health Service State\Monitoring Host Temporary Files 21\615\
One or more workflows were affected by this.
Workflow name: Microsoft.SystemCenter.NetworkDevice.Interface.OutboundUtilizationPercentPerf
Instance name: 11.FastEthernet11

These have been reported back to Microsoft and I am awaiting a reply.
February 4th, 2009 8:16pm
And also heaps of Event ID 101 constantly. UtilizationCalc.vbs : Script received a speed less than or equal to zero, and can not calculate utilization
February 4th, 2009 8:24pm
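The Event ID 101 message is consistent with a utilization calculation that divides by the interface speed, which cannot work when the speed is zero or negative. The contents of UtilizationCalc.vbs are not shown in this thread, so the Python sketch below is only a generic version of the usual interface-utilization formula with the same kind of guard; the function name and parameters are hypothetical, and the meaning of the script arguments in the logged command line (`0 0. 300 true`) is not known.

```python
from typing import Optional


def utilization_percent(
    delta_octets: int, interval_seconds: float, speed_bps: int
) -> Optional[float]:
    """Percent of link capacity used during one polling interval.

    Returns None when utilization cannot be calculated, mirroring the
    "speed less than or equal to zero" condition in the event text.
    """
    if speed_bps <= 0 or interval_seconds <= 0:
        return None
    # Octets are bytes, so multiply by 8 to get bits before comparing
    # against the link capacity over the interval.
    return 100.0 * delta_octets * 8 / (interval_seconds * speed_bps)


# 375,000,000 octets in 300 seconds on a 100 Mb/s link.
print(utilization_percent(375_000_000, 300, 100_000_000))  # 10.0
# A reported speed of zero cannot be used as a divisor.
print(utilization_percent(0, 300, 0))  # None
```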
Thanks for the updates.

Shn, I don't get any SNMP errors in the event log before the health service crashes. I also find that the 4000 event ID appears almost immediately after discovering the device, if this helps you to work out which devices are causing the problem.

Regards,
Andy.
February 5th, 2009 5:19pm
Any update on how this hotfix is progressing?

Thanks,
Andy.
March 10th, 2009 6:02pm
Hi,

Is there any hotfix available by now? I am also observing the same problem on many agents. It would be great if you could help with this issue.

Thanks,
Obulobula
March 16th, 2009 7:55pm
The updated Network Device Monitoring Library Management Pack is available to download from:

http://www.microsoft.com/downloads/details.aspx?FamilyID=8200e405-f871-4f19-a991-0411285fcbe5&displaylang=en

The related KB article (KB960569) and a listing in the System Center Essentials Management Pack catalog will appear in the next week or so.

The new Management Pack is not upgrade compatible, which means that you will need to delete the existing Network Device Monitoring Library Management Pack before importing the new version. Information on how to delete the Management Pack is listed at the bottom of the download page.

I did delay the release of this Management Pack because, after installing it, Hittin was still seeing errors and I wanted to know whether they were due to the changes made in the Management Pack or something else. It looks like the errors fall into the "something else" category, and that is still being investigated.

The new Management Pack does stop the HealthService crashing with an Error 4000 event if network devices are returning interface speeds >2Gb/s.

Thanks
Richard

This posting is provided "AS IS" with no warranties, and confers no rights.
March 24th, 2009 11:18am
Hi,

So now my Custom MP's seem to be an issue, crashing the MonitoringHost.exe processes. Related to OLEDB and Web Applications custom monitoring via Authoring/System Center Templates/.

One DUMP appears to be looking for a Certificate, even though there is no certificate involved between the RMS Essentials server (watcher node) and the SQL Servers OLE connection strings because they are all in the same AD domain. They fail at this point. However yes, at the same time, the RMS is monitoring Workgroup Servers for which it is a CA, and distributed certificates to these servers for Mutual Authentication.

Also, Web Application Templates freeze as well. No certificates apart from the local domain CA generated and that info comes through fine to the console from the installed agent. There's no SSL involved to complicate matters.

Cause unknown, answer appears to be an X-File. Seriously, Mulder and Scully couldn't solve this one, not that they've solved much prior to this.
April 3rd, 2009 10:05am
The problem was that having User Dump Process and SCE tracing enabled together caused MonitoringHost.exe to crash dump. There was nothing wrong with the Custom MPs, OLEDB or Web Applications monitoring.
July 21st, 2009 11:56pm
I had an error event, DHCP event 1003. I ran the above fix, which I extracted, but received no response. Should I go to the extraction and redo it?

Charlene
July 5th, 2012 2:36am