SCOM Linux Logical Disk Health monitor not working?

Our Linux admin is asking us for a way to have SCOM alert if a mount becomes unmounted. I checked around in SCOM and it appears that the "Logical Disk Health" monitor should do the trick. The monitor's Product knowledge states specifically under "Cause" that: "An unhealthy state indicates that a file system has gone offline. This may be caused by a disk being unmounted."

It also says it checks the health by "inspecting the mount table to identify permanent, mounted file systems. If a mounted file system identified in a previous iteration is not included in the current enumeration, it is considered unhealthy."

However, to test this, our Linux admin unmounted a mount, and SCOM still shows it as mounted and healthy. It has been unmounted for a day now and nothing has changed. I've tried recalculating the health and resetting the health, and it still comes back green. I can even still get performance data for this mount that no longer exists, probably because it's reporting the space of the parent file system, the same way running "df -h /mountname" on the server does for a path whose disk is unmounted.

Can someone explain why this monitor isn't working, and suggest a way to get SCOM to alert when a previously mounted file system is no longer mounted?

Thanks


April 30th, 2015 7:55pm

Hi Steve,

I'd like to try that, but I still have the issue from one of my previous posts that we discussed, where I cannot run winrm queries from the SCOM server because it keeps throwing the following error, even though the configuration seems to work as it should, and no one seems to understand why it happens.

WSManFault
    Message = The WinRM client cannot process the request. The authentication mechanism requested by the client is not supported by the server or unencrypted traffic is disabled in the service configuration. Verify the unencrypted traffic setting in the service configuration or specify one of the authentication mechanisms supported by the server.  To use Kerberos, specify the computer name as the remote destination. Also verify that the client computer and the destination computer are joined to a domain. To use Basic, specify the computer name as the remote destination, specify Basic authentication and provide user name and password.


It's the "...The authentication mechanism requested by the client is not supported by the server.." part and then the "...or specify one of the authentication mechanisms supported by the server..." that makes me think there's some configuration that needs to be done on the Linux end. Should this just "work" right out of the box so to speak? Or is there indeed configuration that needs to happen on the Linux side to make WinRM communication possible? The SCOM monitors that all use WinRM are able to work, so something seems odd with this communication directly.
May 1st, 2015 5:07pm

When running the winrm command, what user are you running as? If it is not the same user SCOM is running as, try it as that user.

If nothing else, see if your UNIX/Linux admin can run it from the agent:

    source /opt/microsoft/scx/bin/tools/setup.sh
    omicli wql root/scx "Select Name,IsOnline from scx_filesystem where Name='/opt'"

    instance of SCX_FileSystem
    {
        [Key] Name=/opt
        IsOnline=true
    }
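
If the query needs to be scripted (for example, to poll it while the file system is being unmounted), the IsOnline value can be pulled out of that output with a small shell function. This is only a sketch: scx_is_online is a made-up helper name, not something shipped with the agent, and it assumes the output looks like the sample above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SCX agent).
# Reads omicli instance output on stdin and prints the value of each
# "IsOnline=..." property it finds, one per line.
scx_is_online() {
    # -n suppresses default output; the trailing "p" prints only lines
    # where the substitution matched, reduced to the value after "=".
    sed -n 's/^[[:space:]]*IsOnline=\([^[:space:]]*\).*$/\1/p'
}

# Possible usage on the agent host, after sourcing setup.sh:
#   omicli wql root/scx "Select Name,IsOnline from scx_filesystem where Name='/opt'" | scx_is_online
```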

Regards,

-Steve


May 1st, 2015 6:03pm

Not sure what you mean by "how the filesystem got discovered." The mount points were discovered when the server was added to SCOM. I tried uninstalling the agent and then reinstalling, and when the server was discovered again, the missing mount was no longer present in SCOM. However, we tried mounting it again, and assumed SCOM would find the mount eventually and begin monitoring it again. However, it has been over an hour and SCOM still doesn't see the new mount. Does SCOM simply not have the capability to discover new mounts? Or if it does, how long should it take for it to discover it? 

EDIT: As an update, it took 2 hours and 48 minutes for it to finally discover the new mount. Now we're going to unmount it again and see how long it takes, if it happens at all, for SCOM to see it as missing.

May 6th, 2015 3:31pm

Discovery only runs once or twice a day in SCOM, so it could take up to 12 hours for a new mount to be discovered. You can always force discovery by restarting the SCOM health service [net stop healthservice & net start healthservice].

Try the following and see if this shows the file system offline.

- Verify the file system is online on the UNIX/Linux system and shows healthy in SCOM.

- Take the file system offline on the UNIX/Linux system.

- Restart the agent service on the UNIX/Linux system [scxadmin -restart].

- Wait up to 5 minutes and see if the file system now shows offline.

If this works, then there is possibly a bug in the agent. What version of the agent are you running [scxadmin -v]? Which UNIX/Linux OS is this happening on?
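
As a cross-check that does not depend on SCOM at all, the "previous enumeration vs. current enumeration" comparison described in the monitor's product knowledge can be approximated on the Linux box by comparing two snapshots of the mount table. This is a rough sketch, assuming a Linux system where /proc/mounts is available; missing_mounts is a hypothetical helper name, and this is not a claim about how the agent itself is implemented.

```shell
#!/bin/sh
# Hypothetical helper (not part of SCOM or the SCX agent).
# missing_mounts PREV CURR
#   PREV and CURR are mount table files in /proc/mounts format
#   (device, mount point, type, options, ...).
#   Prints every mount point that appears in PREV but not in CURR.
missing_mounts() {
    # awk reads CURR first and records its mount points (field 2),
    # then reads PREV and prints any mount point it has not seen.
    # FILENAME == ARGV[1] is used instead of NR == FNR so that an
    # empty CURR file is still handled correctly.
    awk 'FILENAME == ARGV[1] { seen[$2] = 1; next }
         !($2 in seen)       { print $2 }' "$2" "$1"
}

# Possible usage around the unmount test:
#   cp /proc/mounts /tmp/mounts.before    # snapshot while the file system is mounted
#   ... unmount the file system ...
#   missing_mounts /tmp/mounts.before /proc/mounts
```

If the unmounted file system shows up in that list but the agent still reports IsOnline=true for it, that would point at the agent rather than at the SCOM side.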

Regards,

-Steve 


May 7th, 2015 2:14pm

