Monitoring DELL Blades and CMCs - R2
Hi,

Yesterday one of our blades had a meltdown. It was running ESX, and a number of virtual servers were affected by the hardware problem.

The problem was that OpsMgr didn't send any hardware alerts. Even the ESX logs showed nothing, and there was nothing in any of the Windows event logs either. We also use the nWorks MP, but due to a series of budget cuts this particular host was not monitored.

Monitoring of the CMCs is occurring, but under Health Explorer for any CMC all I can see is a green tick in the Availability section. All other subsets of the Entity Health are open circles.

Monitoring of DELL servers is occurring via the DELL 4 MP. The State view for the servers is fully populated with nice green ticks in all the columns (Sensors, Power Supplies, Processors, etc.). The only one with an empty circle is BIOS Config Instance, and I can't figure out how to turn that on.

My questions are:
Can the DELL MP pick up all instances of DELL hardware if no Windows O/S is operating?
Is this a case for discovering hardware components via SNMP?

Thank you,
John Bradshaw
February 12th, 2010 3:22am
This is really a better question for Dell support.

The Dell MP that I have worked with uses Dell OMSA, installed on the agent-based OS, to communicate with SCOM.

If you want to monitor hardware health on ESX servers, you should consider a third-party solution that allows you to do this. The last Dell MP I worked with didn't support it. I believe solutions from Veeam and Bridgeways take this into consideration.
February 14th, 2010 10:39pm
Thx Kevin. JB
February 15th, 2010 2:01pm
A couple of the Dell MPs use the DRAC or iDRAC interfaces for basic systems management (they won't detect anything at the OS level, just hardware). They're detected as network devices through SNMP. We're still having trouble getting them to work with M1000e enclosures and M600 blades, and I'm trying to talk to Dell support. So far it has only detected the CMC, not the iDRACs on each blade. While SCOM is showing the devices present (the CMCs), I'm getting zero information back: everything registers as healthy, even when I remove a blade.

There are SNMP OIDs for status reports as well as power (watts, voltage, amps, draw from each PSU, etc.), so it's possible to pull all that by SNMP, but it's a bear to set up and configure in your management system of choice. I'm trying to use the Dell MP to make it work, and it's definitely an uphill battle.

There is a version of OMSA you can install on ESX and ESXi. It's the same interface and management system as its Windows counterpart, and when you need it, it's very handy to have. We're running Server 2008 on all our blades, and all have OMSA installed. It's still not helping. :(

Everything I read from Dell says it's possible to monitor the blades and CMCs from SCOM, but I can't find any info about how to make it work. There seems to be some dependence on the BMC Management Utility (a client for Windows). If I hear anything helpful back from Support I'll add to the thread. If you had any luck I'd appreciate hearing about it. Cheers.
March 24th, 2010 12:12am
Hi Edge, I can monitor CMCs and DELL DRACs. Once discovered, they are OK. It took a bit of back and forth with the network guys, but once they opened the right stuff on the routers, SCOM was OK. I can't get to the iDRACs at the moment though. I simply can't discover them in OpsMgr, even though they can be pinged. Funny thing is NAGIOS can, with no trouble. I think this may be a bug in OpsMgr, as nWorks (an MP in SCOM) discovers all the ESX host stuff we have on DELL blades just fine, including the underlying hardware status. Cheers, John Bradshaw
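A device that answers ping is not necessarily answering SNMP, since ping uses ICMP while SNMP discovery needs UDP port 161 open end to end and a matching community string. One way to separate a network problem from a discovery problem is to send a single SNMPv1 GET for the standard sysDescr.0 OID (1.3.6.1.2.1.1.1.0) from the management server and see whether anything comes back. The following is a minimal sketch using only the Python standard library; the function names, the "public" community string and the 2 second timeout are assumptions for illustration, and none of this is part of the Dell MP or OpsMgr.

```python
import socket

SYS_DESCR_OID = "1.3.6.1.2.1.1.1.0"  # standard MIB-II sysDescr.0


def ber_length(n: int) -> bytes:
    """Encode a BER length (short form below 128, long form otherwise)."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag: int, payload: bytes) -> bytes:
    """Wrap a payload in a BER tag-length-value triple."""
    return bytes([tag]) + ber_length(len(payload)) + payload


def encode_int(n: int) -> bytes:
    """Encode a non-negative integer as a BER INTEGER."""
    return tlv(0x02, n.to_bytes(max(1, (n.bit_length() + 8) // 8), "big"))


def encode_oid(oid: str) -> bytes:
    """Encode a dotted OID string as a BER OBJECT IDENTIFIER."""
    parts = [int(p) for p in oid.split(".")]
    body = bytearray([parts[0] * 40 + parts[1]])
    for arc in parts[2:]:
        chunk = [arc & 0x7F]
        arc >>= 7
        while arc:
            chunk.append((arc & 0x7F) | 0x80)
            arc >>= 7
        body.extend(reversed(chunk))
    return tlv(0x06, bytes(body))


def build_get_request(community: str, oid: str, request_id: int = 1) -> bytes:
    """Build an SNMPv1 GetRequest message for a single OID."""
    varbind = tlv(0x30, encode_oid(oid) + tlv(0x05, b""))  # OID + NULL value
    pdu = tlv(0xA0, encode_int(request_id) + encode_int(0) + encode_int(0)
              + tlv(0x30, varbind))
    return tlv(0x30, encode_int(0) + tlv(0x04, community.encode("ascii")) + pdu)


def snmp_responds(host: str, community: str = "public",
                  port: int = 161, timeout: float = 2.0) -> bool:
    """Return True if the host sends back anything that looks like SNMP."""
    packet = build_get_request(community, SYS_DESCR_OID)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (host, port))
            data, _ = sock.recvfrom(4096)
        except OSError:  # covers timeouts and ICMP "port unreachable"
            return False
    return len(data) > 0 and data[0] == 0x30
```

Calling `snmp_responds("<iDRAC address>", community="<your community>")` from the OpsMgr server itself gives a quick yes/no. A False result for a host that pings usually points at a firewall or router rule, SNMP being disabled on the device, or a wrong community string, rather than at the management pack.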
March 25th, 2010 5:25am
John - We're seeing the same problem. DRACs from other Dell servers (2950s and R710s mostly) turn up fine, but the iDRACs are invisible: Ops Manager doesn't even register them as devices during a scan. The CMCs seem fine, and I think it detected the OMSA addresses, but we're not using SCOM to monitor OMSA right now. All in all, it's a bit flaky, and Dell doesn't seem to offer any support outside their own (rather lacking) products, even for their MP. So with your CMCs, are you getting comprehensive status reporting? Can you get usage stats, power monitoring, blade status/health, etc.? Or does it just tell you up/down? Cheers -
March 27th, 2010 12:29am
I agree that this problem is something to direct to Dell support, but I think we are probably not the only ones, and maybe someone reading this has found a fix.

Half a year ago we bought two blade enclosures from Dell. We use SCOM, so we added the MP to a SCOM test server. After some testing it looked like the enclosure, at least, alerted us. Since then we have delved deeper into the mysteries of Dell monitoring. Recently we decided to go live with Dell monitoring and added it to our R2 SCOM production server. Since then we get nothing.

So we checked whether any traps were actually reaching the SCOM server by installing an SNMP listener. In addition, we activated email alerting on the various Dell components as a second way to check whether Dell is sending anything. During testing we discovered the lack of a testing mechanism provided by Dell. You can sometimes send a test event, but since it is a test event it might be ignored by SCOM.

The results were surprising:

The OMSA agent can't send a test alert. It fails to react to the removal of a disk from a mirror set or the disconnection of a network cable. Sometimes it sends a mail via the iDRAC board. It doesn't trap at all; the only trap it seems to send is a test trap.
The enclosure does react to the removal of components: it sends a mail and it traps.
The switches in the enclosure send neither traps nor mail.
The EqualLogic disk cabinets send mail perfectly but trap haphazardly (for instance, they fail to trap if the backup controller is removed).
The traps are ignored by SCOM.

As things stand now, the entire trapping mechanism simply doesn't work in SCOM, partly because no traps are sent by the Dell components and partly because the traps that are sent are ignored. The only thing that does work is the email. This is of course probably something that Dell should address, so I am reporting it here accordingly.
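For anyone wanting to repeat the "are traps actually arriving" check without installing a separate tool, a bare UDP listener is enough to show whether datagrams reach the server and what community string they carry. The sketch below is an assumption-laden illustration, not the listener used above: the function names are made up, it only decodes the outer SNMP v1/v2c header (version, community, PDU type), and it does not decode the trap contents.

```python
import socket

PDU_NAMES = {0xA4: "Trap (v1)", 0xA6: "InformRequest", 0xA7: "Trap (v2c)"}
VERSIONS = {0: "v1", 1: "v2c"}


def read_tlv(data: bytes, pos: int):
    """Read one BER tag-length-value; return (tag, value, next_position)."""
    tag = data[pos]
    length = data[pos + 1]
    pos += 2
    if length & 0x80:  # long form: low 7 bits give the number of length bytes
        count = length & 0x7F
        length = int.from_bytes(data[pos:pos + count], "big")
        pos += count
    return tag, data[pos:pos + length], pos + length


def describe_snmp_message(data: bytes) -> dict:
    """Decode only the outer header of an SNMP v1/v2c message."""
    tag, message, _ = read_tlv(data, 0)
    if tag != 0x30:
        raise ValueError("not a BER SEQUENCE, probably not SNMP")
    tag, version_raw, pos = read_tlv(message, 0)
    if tag != 0x02:
        raise ValueError("missing SNMP version field")
    version = int.from_bytes(version_raw, "big")
    if version not in VERSIONS:  # e.g. SNMPv3, which has no community field
        return {"version": f"unknown({version})", "community": None, "pdu": None}
    _, community, pos = read_tlv(message, pos)
    pdu_tag = message[pos] if pos < len(message) else None
    return {
        "version": VERSIONS[version],
        "community": community.decode("ascii", "replace"),
        "pdu": PDU_NAMES.get(pdu_tag, "other"),
    }


def listen(host: str = "0.0.0.0", port: int = 162, max_packets=None) -> None:
    """Print one line per datagram received on the trap port.

    Binding to port 162 normally needs elevated rights and fails if another
    trap receiver on the same machine already holds the port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        seen = 0
        while max_packets is None or seen < max_packets:
            data, (source, _) = sock.recvfrom(65535)
            seen += 1
            try:
                info = describe_snmp_message(data)
            except (ValueError, IndexError) as exc:
                info = {"error": str(exc)}
            print(source, len(data), "bytes", info)
```

Running `listen()` on the management server while pulling a component from the enclosure shows whether a trap arrives at all. If a trap shows up here but nothing appears in SCOM, the community string printed by the listener is the first thing to compare against what the monitoring side expects.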
January 7th, 2011 9:12am