Linux Hosts not heartbeating

Hi Steve,

I hope you are well.

I am in the process of deploying SCOM Agents on a number of Linux Servers in our Private Cloud Environment.

A few servers are not heartbeating to SCOM. I installed the scx package on the Linux hosts, had ports 22, 1270 and 5723 opened through the firewall, generated the scx certificates, had them signed by the SCOM Server and copied the signed certificates to /etc/opt/microsoft/scx/ssl. I enabled verbose logging on the Linux hosts, and the SCOM Server is able to communicate with the SCX CIM Provider on the Linux hosts. When I run vmstat from the SCOM Server, on one occasion I get the swap, free, buffer and cache memory details, which triggers the SCXUserCoreProviderModule. However, vmstat does not run every 3 seconds, and it throws the following error:

WSManFault
WinRM cannot complete the operation. Verify that the specified computer name is valid, that the computer is accessible over the network, and that a firewall exception for the WinRM service is enabled and allows access from this computer. By default, the WinRM firewall exception for public profiles limits access to remote computers within the same local subnet. 

Can you please provide some pointers on what might be incorrect in the configuration and is preventing the hosts from being monitored by SCOM 2012?

Regards

Harry

July 21st, 2015 5:56pm

Harry,

My first thought: what does the DNS server that the SCOM server uses return for the FQDN of the Linux systems in your private cloud? SCOM does both a forward and a reverse DNS lookup of the Linux system, and if the result does not match the name in the certificate, it will fail to communicate with the agent.
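A minimal Python sketch of that forward/reverse check, assuming Python is available on a machine that uses the same DNS servers as the SCOM server. The function name dns_matches and the example host name are hypothetical. It only compares the forward and reverse results with each other (IPv4 only); the name in the certificate still has to be compared separately.

```python
import socket


def dns_matches(fqdn, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve fqdn to an IPv4 address, resolve that address back to a name,
    and report whether the name that comes back is the same FQDN.

    Returns (ok, ip, reverse_name). The resolver functions are parameters
    only so the check can be exercised without a real DNS server.
    """
    ip = forward(fqdn)                # forward lookup: name -> address
    reverse_name = reverse(ip)[0]     # reverse lookup: address -> primary name
    # DNS names are case-insensitive and may carry a trailing dot.
    ok = reverse_name.rstrip(".").lower() == fqdn.rstrip(".").lower()
    return ok, ip, reverse_name


# Example call (hypothetical host name, needs working DNS):
#   print(dns_matches("linux01.example.com"))
```

If ok comes back False, the reverse_name value shows which name the reverse zone actually hands out for that address.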

How did you generate the certificates on the Linux agents?

FYI - unless you are monitoring Windows systems in your private cloud with SCOM, port 5723 does not need to be open for Linux agents.

Regards,

-Steve

July 21st, 2015 6:29pm

Hi Steve,

Thank you for your reply.

The DNS Server returns the IP address for the FQDN, and both forward and reverse lookups are working fine.

The certificates were initially generated when the package was installed with 'rpm -ivh <rpm package>' and were then regenerated with the hostname and domain name by running 'scxsslconfig -f -h <hostname> -d <domainname>'.

The scx.pem and scx-host-<hostname>.pem files were copied over to the SCOM Server, signed by SCOM using scxcertconfig.exe and copied back to the Linux hosts, and the SCX CIM Server was restarted.

The agent was then discovered and installed by the SCOM Server discovery method.

The SCXCoreProviderModule is getting started but not the SCXUserCoreProviderModule.

The winrm Enumeration does return all Server details within a few seconds (3 to 4 seconds).

Hope this helps.

Regards

Harry

July 21st, 2015 11:23pm

Did you set up the 'Run As Configuration --> Profiles' correctly? There are 3 profiles that need to be set up for UNIX/Linux monitoring.

Did you verify that the username/password being used to connect to the agent is correct? Run the following winrm command with the username/password you have set up in the 'UNIX/Linux Action Account' profile:

winrm enumerate http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_Agent?__cimnamespace=root/scx -username:<UNIX/Linux user> -password:<UNIX/Linux password> -r:https://<UNIX/Linux system>:1270/wsman -auth:basic -encoding:utf-8

Are you using SUDO elevation at all?

Do you have multiple management servers in a Resource Pool that manage the UNIX/Linux agents? If so, are their certificates shared across all MS in the Resource Pool?

Do you have your distribution security option set to 'More Secure' for your Run As Accounts? If so, are all Management Servers added to the list?

-Steve


July 22nd, 2015 2:08pm

Hi Steve,

Yes, there are 3 Run As Profiles created:

  1. Linux Agent Maintenance Account
  2. Linux Monitoring Account
  3. Linux Priv Account to monitor Privileged Access

The user name and password used to connect to the agent appear to be correct. The user account has been added on the Linux servers and has been given the following entries in the sudoers file: 'Defaults:<user account> !requiretty' and '<user account> ALL = (ALL) NOPASSWD: ALL'.

The winrm enumeration returns the following:

SCX_Agent
    Architecture = x64
    BuildDate = 2014-03-21T00:00:00Z
    BuildNumber = 308
    Caption = SCX Agent meta-information
    Description = Labeled_Build - 20140321
    ElementName = null
    HealthState = null
    Hostname = Linux Hostname
    InstallDate = 2015-07-22T08:32:27Z
    KitVersionString = 1.4.1-308
    LogicalProcessors = 2
    MachineType = Virtual
    MajorVersion = 1
    MinActiveLogSeverityThreshold = TRACE
    MinorVersion = 4
    Name = scx
    OSAlias = RHEL
    OSName = Red Hat Enterprise Linux Server
    OSType = Linux
    OSVersion = 6.6
    OperationalStatus = null
    PhysicalProcessors = 1
    RevisionNumber = 1
    Status = null
    StatusDescriptions = null
    UnameArchitecture = x86_64
    VersionString = 1.4.1-308

We have 2 MS, and certificates are shared across both MS in the resource pool.

Distribution security has been set to 'More Secure' and both MS are added to the list. My guess at this stage is that something is randomly timing out the WinRM connections. I have disabled the IPv4 and IPv6 firewalls on the Linux hosts.

Running vmstat from the SCOM Server sporadically returns swap and free memory information, but at other times gives a WSManFault stating that WinRM cannot complete the operation.

Regards

Harry

July 22nd, 2015 11:00pm

Seems like you have everything set up correctly in SCOM. You haven't adjusted any of the Secure Channel protocols [TLS or SSL] on the SCOM server by any chance?

I'd start looking at the network and see if there are any issues there. Can you put an analyzer on it and see what is going on? You can always open a support ticket with MS and they can enable trace logging in SCOM and verify everything is working as expected.
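Before setting up a full capture, a repeated TCP connect test against port 1270, run from each management server in the resource pool, can show whether the failures follow one particular server. This is a rough sketch, assuming Python is installed on the management servers; probe_port and the host name are made-up names. A successful TCP connect says nothing about TLS or authentication, so treat it as a first filter only.

```python
import socket
import time


def probe_port(host, port=1270, attempts=5, timeout=3.0, delay=1.0,
               connect=socket.create_connection):
    """Try to open a TCP connection to host:port several times and
    return how many of the attempts failed.

    connect is a parameter only so the loop can be tested without a network.
    """
    failures = 0
    for i in range(attempts):
        try:
            conn = connect((host, port), timeout=timeout)
            conn.close()
        except OSError:
            # Covers refused connections, timeouts and unreachable hosts.
            failures += 1
        if i < attempts - 1:
            time.sleep(delay)  # pause between attempts
    return failures


# Example call (hypothetical host name), run on each management server:
#   print(probe_port("linux01.example.com", attempts=20))
```

If one management server reports failures and the other does not, the firewall rules between that server and the Linux hosts are worth a look first.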

Regards,

-Steve


July 23rd, 2015 1:43pm

Hi Steve,

Thank you for your update. Yes, I will look at the network side to see if anything is causing the WinRM connections to time out.

Alternatively, I am looking at changing the password on the SCOM Monitoring Account. Would that require uninstalling and reinstalling the SCOM 2012 agents on the Linux boxes?

Regards

Harry

July 29th, 2015 9:12pm

No, a password change would only require you to update the Run As accounts in SCOM for the UNIX/Linux agents. Reinstalling the agents is not necessary. Of course, you will also need to update the password on each Linux box for the Linux user you set in the Run As accounts.

Regards,

-Steve

July 30th, 2015 8:11am

Hi Steve,

Thank you for your reply and sorry for the delay in response.

I finally managed to get this to work. The problem was that the second MS in the resource pool was trying to manage those Linux hosts; the MS certificates were cross-shared, but the SCX certs were not.

I cross-shared the SCX certs by exporting them from each MS and importing them into the other, had ports 1270 and 22 opened between the other MS and the failed Linux hosts, and redeployed the SCX agent. That worked.
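With more than two management servers in a pool, the export/import pairs are easy to get wrong. The Python sketch below only lists which commands to run on which server. It assumes the -export and -import switches of scxcertconfig.exe as I understand them from Microsoft's documentation on UNIX/Linux certificates; the server names and .cer file names are placeholders, and copying the exported files between the servers is not shown.

```python
def cross_share_plan(servers):
    """Map each management server name to the scxcertconfig.exe commands
    to run on it: export its own certificate, then import the exported
    certificate of every other server in the pool.

    File names (<server>.cer) are placeholders for this sketch.
    """
    plan = {}
    for target in servers:
        commands = [f"scxcertconfig.exe -export {target}.cer"]
        for source in servers:
            if source != target:
                commands.append(f"scxcertconfig.exe -import {source}.cer")
        plan[target] = commands
    return plan


# Print the plan for a two-server pool (placeholder names).
for server, commands in cross_share_plan(["MS1", "MS2"]).items():
    print(server)
    for command in commands:
        print("   ", command)
```

All exports have to be done, and the files copied over, before any of the imports can run.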

Thank you for your assistance.

Regards

Harry

August 16th, 2015 9:05pm

This topic is archived. No further replies will be accepted.
