Microsoft SQL Server 2014 (Monitoring) - Cannot log into database error for log ship destination DBs

I have a SQL 2014 instance that is a log shipping destination.  At least one log shipped DB (possibly more, but the alert is unclear due to suppression from 'repeat count'), gets "Cannot login to database" many times each day, which raises an alert.  The agent is running as SYSTEM and the SYSTEM account is sysadmin on the instance.

I'm assuming it's having trouble because when log shipped DBs are in restore they cannot be logged into?  This is a guess. 

1. Is this normal behavior for the MP?  I thought the MP was log-ship aware and so would avoid alerts like this.

2. If it's not normal behavior, is this a bug or did I do something wrong with the configuration?  Any guidance on how to fix?

3. If is normal behavior, how do I disable normal DB monitoring on log ship destination DBs?  An optimal solution would be one using, say, a dynamic group that can automatically add future log ship destination DBs when they are created.

Management Pack: Microsoft SQL Server 2014 (Monitoring), Version: 6.5.4.0

MSSQL 2014: Workflow failed to connect to the target system

Management Group: OpsMgr-Main. Script: GetSQL2014DBFileGroupFreeSpace.vbs : Cannot login to database [DB4.domain.com][INSTANCE2:App_Reports]

Repeat count (since 3/14/2015): 304


  • Edited by Timothy Carroll Monday, March 23, 2015 3:45 PM Added MP version
March 23rd, 2015 3:43pm

Thanks for the links.  I checked them and they are not applicable to our situation.  Overriding the monitor/rule is not a fix, and in this case is actually really bad.  I found the rules in question and they target the SQL instance and not the SQL database, so if the monitor is overridden, no databases on the instance will be monitored for space usage.  The failures are intermittent.  Our DBA is investigating and we have a lead.  I decompiled the management pack and am going to check our hypothesis to see if it makes sense.  I'll reply to this thread later with our discoveries.

Free Windows Admin Tool Kit Click here and download it now
March 24th, 2015 3:32pm

We believe we have discovered the issue and think it might be a bug in the management pack.  Here is our finding.

We are log shipping a very large database.  During the night, there are SQL jobs that process a large amount of data and cause huge transaction log files to build up (> 30GB).  When the log shipping target processes these files, it takes longer than typical to catch up.  Applying the logs can take hours.

We tracked the state changes for log shipping and it looks like this...

1.    online - multi_user/is_read-only/is_in_standby
2.    online - single_user/is_read-only/is_in_standby
3.    restoring - single_user
4.    restoring - multi_user
5.    recovering - multi-user
6.    online - multi-user/is_in_standby
7.    recovering - multi-user/is_read-only/is_in_standby
8.    online - multi_user/is_read-only/is_in_standby

Step 1 is the "idle" state of the log ship destination.  It is online, with multi user access, is read only, and in standby.  Steps 2 to 7 are part of the restoring logs to the log ship destination.  More on step 6 in a moment.  Step 8 is reverting back to the idle state.

Now let's look at the management pack code for the rule in question.

One of the first things it does is define the query for detecting the databases on the instance that are available for running the rule on.

Dim query : query = "SELECT name, database_id FROM sys.databases " & _
                        " WHERE (state = ? OR state = ?) " & _
                        "   AND source_database_id IS NULL AND is_read_only = 0" & _
                        "   AND name not in ('master', 'model', 'msdb', 'tempdb')"

This query is run with the line

Set listDatabases = dbMasterConnection.ExecuteQueryWithParams(query, Array(DB_STATE_NORMAL, DB_STATE_STANDBY))

Based on this, we can see that it will find databases with the correct state that are NOT read-only.  Curiously, the "state = ? OR state = ?" portion uses the constants DB_STATE_NORMAL and DB_STATE_STANDBY. 

Const DB_STATE_EMERGENCYMODE = 32768
Const DB_STATE_LOADING = 22
Const DB_STATE_NORMAL = 0
Const DB_STATE_OFFLINE = 512
Const DB_STATE_RECOVERING = 192
Const DB_STATE_STANDBY = 1024
Const DB_STATE_SUSPECT = 256

But according to MSDN documentation (and we confirmed experimentally), those states are partly wrong.  Link here: https://msdn.microsoft.com/en-us/library/ms178534.aspx

According to documentation, the constants should be...

0 = ONLINE
1 = RESTORING
2 = RECOVERING
3 = RECOVERY_PENDING
4 = SUSPECT
5 = EMERGENCY
6 = OFFLINE
7 = COPYING
10 = OFFLINE_SECONDARY

As you can see, "0" for online is correct, but 1024 is not standby.  In fact, there is no standby state.  Further, we looked at the schema and this value is a tinyint, so anything above 255 is invalid, so 1024 would never be a value.

To detect "standby" there is a separate flag called is_in_standby (see the above link).  You cannot use the "state" field to detect standby.  But even if that part of the code was correct, the script would have problems accessing the database (hence why they do not attempt to run the rule against DBs that are read only).

OK, so now we are getting into the meat and potatoes of what's happening here.  Most of the time when you are in standby, you are also in read_only, so the is_read_only flag will evaluate to true, the database would be excluded from the check, and there would be no error during script execution.  However, if you look at step 6 that I outlined at the top, the is_read_only flag is NOT set, but the database is also NOT accessible (confirmed experimentally), which I suspect leads to the error I am getting.  Most people probably do not run into this situation because they are log shipping small enough databases with small enough changes that that state does not last long enough to be an issue.

So why does an error actually get generated from all this?  This is because after finding the databases available to be looked at in the last query, it will then attempt to connect to the target database to query about its file size.  However, the database is not accessible, so the script generates an error.

Do While Not listFiles.EOF Dim ContainerName : ContainerName = listFiles("ContainerName").Value Dim ContainerId : ContainerId = listFiles("ContainerId").Value Dim ContainerPath : ContainerPath = listFiles("ContainerPath").Value Dim AllocatedSizeMB : AllocatedSizeMB = 0 Dim UsedSizeMB : UsedSizeMB = 0 Dim NumberOfFiles : NumberOfFiles = 0 query = "SET NOCOUNT ON " & vbCrLf & _ " select " & vbCrLf & _ " convert (float,SUM(chf.file_size_in_bytes))/convert (float,(1024*1024)) as AllocatedSizeMB, " & vbCrLf & _ " convert (float,SUM(chf.file_size_used_in_bytes))/convert (float,(1024*1024)) as UsedSizeMB, " & vbCrLf & _ " Sum(CASE WHEN chf.state = 2 THEN 1 Else 0 End) as NumberOfFiles " & vbCrLf & _ " from sys.dm_db_xtp_checkpoint_files as chf" & vbCrLf & _ " WHERE chf.container_id = ? "

Dim listSpace : Set listSpace = dbConnection.ExecuteQueryWithParams(query, ContainerId)

It is probably failing at that very last query execution because it's trying to connect to a standby database that is in a state that does not allow access (step 6 from above).

The corrected query to ignore databases in this state should look like this

Dim query : query = "SELECT name, database_id FROM sys.databases " & _
                        " WHERE state = ? " & _
                        "   AND source_database_id IS NULL AND is_read_only = 0 AND is_in_standby = 0" & _
                        "   AND name not in ('master', 'model', 'msdb', 'tempdb')"


and then it should be called like this

Set listDatabases = dbMasterConnection.ExecuteQueryWithParams(query, Array(DB_STATE_NORMAL))

This I think should fix the issue.

This code is repeated throughout the management pack so there are multiple places it has to be corrected to resolve this issue completely.

How do I submit my bug report to the management pack authors?







March 24th, 2015 7:09pm

Xin - that link isn't working for me.  I get "Page Not Found."  But I did find a way to submit bugs to System Center from the welcome page.  Thanks for pointing me in the right direction.
Free Windows Admin Tool Kit Click here and download it now
March 25th, 2015 10:18am

For reference, I tracked down (I think) all the rules and discoveries that are affected by the bug.  I have done some experimentation with disabling them or changing the timing (synctime) where available, and so far I'm able to work around the issue.  I look forward to an actual bug fix, though.


DISCOVERIES

Microsoft SQL Server 2008 (Discovery)
  - Discover Files
  - Discover File Groups

Microsoft SQL Server 2012 (Discovery)
  - Discover Files
  - Discover File Groups

Microsoft SQL Server 2014 (Discovery)
  - MSSQL2014: Discover Data Files
  - MSSQL2014: Discover File Groups


RULES

Microsoft SQL Server 2008 (Monitoring)
- MSSQL 2008: Collect DB File Free Space (%)
- MSSQL 2008: Collect DB File Free Space (MB)
- MSSQL 2008: Collect DB File Allocated Space Left (MB)
- MSSQL 2008: Collect DB File Allocated Space Left (%)
- MSSQL 2008: Collect DB File Group Free Space (%)
- MSSQL 2008: Collect DB File Group Free Space (MB)
- MSSQL 2008: Collect DB File Group Allocated Space Left (MB)
- MSSQL 2008: Collect DB File Group Allocated Space Left (%)
- MSSQL 2008: Collect DB Log File Free Space (%)
- MSSQL 2008: Collect DB Log File Free Space (MB)
- MSSQL 2008: Collect DB Log File Allocated Space Left (MB)
- MSSQL 2008: Collect DB Log File Allocated Space Left (%)

Microsoft SQL Server 2012 (Monitoring)
- MSSQL 2012: Collect DB File Free Space (%)
- MSSQL 2012: Collect DB File Free Space (MB)
- MSSQL 2012: Collect DB File Allocated Space Left (MB)
- MSSQL 2012: Collect DB File Allocated Space Left (%)
- MSSQL 2012: Collect DB File Group Free Space (%)
- MSSQL 2012: Collect DB File Group Free Space (MB)
- MSSQL 2012: Collect DB File Group Allocated Space Left (MB)
- MSSQL 2012: Collect DB File Group Allocated Space Left (%)
- MSSQL 2012: Collect DB Log File Free Space (%)
- MSSQL 2012: Collect DB Log File Free Space (MB)
- MSSQL 2012: Collect DB Log File Allocated Space Left (MB)
- MSSQL 2012: Collect DB Log File Allocated Space Left (%)

Microsoft SQL Server 2014 (Monitoring)
- MSSQL 2014: DB File Available Space Total (%)
- MSSQL 2014: DB File Available Space Total (MB)
- MSSQL 2014: DB File Allocated Space Unused (MB)
- MSSQL 2014: DB File Allocated Space Unused (%)
- MSSQL 2014: DB File Group Available Space Total (%)
- MSSQL 2014: DB File Group Available Space Total (MB)
- MSSQL 2014: DB File Group Allocated Space Unused (MB)
- MSSQL 2014: DB File Group Allocated Space Unused (%)
- MSSQL 2014: DB Log File Available Space Total (%)
- MSSQL 2014: DB Log File Available Space Total (MB)
- MSSQL 2014: DB Log File Allocated Space Unused (MB)
- MSSQL 2014: DB Log File Allocated Space Unused (%)


March 26th, 2015 1:27pm

OK, so I still had issues even after all the overrides above from yesterday, so I figured I'd attack it from a different angle.  Rather than stopping the rule from running, I'd just ignore the alerts.  So I modified Microsoft.SQLServer.20xx.Script_Failed_Database_Login in 2008, 2012, and 2014 flavors so that it won't raise an alert if it sees FreeSpace.vbs or DiscoverSQL20xxFileGroups.js along with "Cannot login to database."  This of course will mask some legitimate failed logons, but should eliminate the noise of this alert, which I was previously getting 100s per day!

I figure since this only affects the SQL file classes, it's an acceptable risk to miss those legitimate failures, since if your DB is inaccessible for another reason, a different part of the MP will catch it (such as your RunAs account not having access, or the DB being offline or something).

I rolled up the modified rules into a couple MPs that you can import.  It will disable the original rule and then use its own which have those filters above.  This is not well tested, use at your own risk, and was designed for 6.5.4.0.  Hopefully future SQL MPs will fix this bug.  The 2012 MP will also work for 2008.  I don't have a 2005 server, so I didn't build a 2005 MP -- I'm not even sure it suffers from the same bug.

Unsealed MPs:

SQL2012FileMonitoringLogonAlertHotfix_For_6.5.4.0.xml: http://d-h.st/Ddab
SQL2014FileMonitoringLogonAlertHotfix_For_6.5.4.0.xml: http://d-h.st/hD8A

Sealed MPs:

SQL2012FileMonitoringLogonAlertHotfix_For_6.5.4.0.mp: http://d-h.st/HZRs
SQL2014FileMonitoringLogonAlertHotfix_For_6.5.4.0.mp: http://d-h.st/6Xdo

** Update 3/31/2015: I missed one of the discoveries and updated the MP to block alerts from it too.


  • Marked as answer by Timothy Carroll Friday, March 27, 2015 7:04 PM
  • Edited by Timothy Carroll Tuesday, March 31, 2015 8:17 PM Updated links with new MP version
Free Windows Admin Tool Kit Click here and download it now
March 27th, 2015 7:02pm

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics