We believe we have discovered the issue and think it might be a bug in the management pack. Here is our finding.
We are log shipping a very large database. During the night, there are SQL jobs that process a large amount of data and cause huge transaction log files to build up (> 30GB). When the log shipping target processes these files, it takes longer
than typical to catch up. Applying the logs can take hours.
We tracked the state changes for log shipping and it looks like this...
1. online - multi_user/is_read-only/is_in_standby
2. online - single_user/is_read-only/is_in_standby
3. restoring - single_user
4. restoring - multi_user
5. recovering - multi-user
6. online - multi-user/is_in_standby
7. recovering - multi-user/is_read-only/is_in_standby
8. online - multi_user/is_read-only/is_in_standby
Step 1 is the "idle" state of the log ship destination. It is online, with multi user access, is read only, and in standby. Steps 2 to 7 are part of the restoring logs to the log ship destination. More on step 6 in a moment.
Step 8 is reverting back to the idle state.
Now let's look at the management pack code for the rule in question.
One of the first things it does is define the query for detecting the databases on the instance that are available for running the rule on.
Dim query : query = "SELECT name, database_id FROM sys.databases " & _
" WHERE (state = ? OR state = ?) " & _
" AND source_database_id IS NULL AND is_read_only = 0" & _
" AND name not in ('master', 'model', 'msdb', 'tempdb')"
This query is run with the line
Set listDatabases = dbMasterConnection.ExecuteQueryWithParams(query, Array(DB_STATE_NORMAL, DB_STATE_STANDBY))
Based on this, we can see that it will find databases with the correct state that are NOT read-only. Curiously, the "state = ? OR state = ?" portion uses the constants DB_STATE_NORMAL and DB_STATE_STANDBY.
Const DB_STATE_EMERGENCYMODE = 32768
Const DB_STATE_LOADING = 22
Const DB_STATE_NORMAL = 0
Const DB_STATE_OFFLINE = 512
Const DB_STATE_RECOVERING = 192
Const DB_STATE_STANDBY = 1024
Const DB_STATE_SUSPECT = 256
But according to MSDN documentation (and we confirmed experimentally), those states are partly wrong. Link here:
https://msdn.microsoft.com/en-us/library/ms178534.aspx
According to documentation, the constants should be...
0 = ONLINE
1 = RESTORING
2 = RECOVERING
3 = RECOVERY_PENDING
4 = SUSPECT
5 = EMERGENCY
6 = OFFLINE
7 = COPYING
10 = OFFLINE_SECONDARY
As you can see, "0" for online is correct, but 1024 is not standby. In fact, there is no standby state. Further, we looked at the schema and this value is a tinyint, so anything above 255 is invalid, so 1024 would never be a value.
To detect "standby" there is a separate flag called is_in_standby (see the above link). You cannot use the "state" field to detect standby. But even if that part of the code was correct, the script would have problems accessing
the database (hence why they do not attempt to run the rule against DBs that are read only).
OK, so now we are getting into the meat and potatoes of what's happening here. Most of the time when you are in standby, you are also in read_only, so the is_read_only flag will evaluate to true, the database would be excluded from the check, and there
would be no error during script execution. However, if you look at step 6 that I outlined at the top, the is_read_only flag is NOT set, but the database is also NOT accessible (confirmed experimentally), which I suspect leads to the error I am getting.
Most people probably do not run into this situation because they are log shipping small enough databases with small enough changes that that state does not last long enough to be an issue.
So why does an error actually get generated from all this? This is because after finding the databases available to be looked at in the last query, it will then attempt to connect to the target database to query about its file size. However,
the database is not accessible, so the script generates an error.
Do While Not listFiles.EOF
Dim ContainerName : ContainerName = listFiles("ContainerName").Value
Dim ContainerId : ContainerId = listFiles("ContainerId").Value
Dim ContainerPath : ContainerPath = listFiles("ContainerPath").Value
Dim AllocatedSizeMB : AllocatedSizeMB = 0
Dim UsedSizeMB : UsedSizeMB = 0
Dim NumberOfFiles : NumberOfFiles = 0
query = "SET NOCOUNT ON " & vbCrLf & _
" select " & vbCrLf & _
" convert (float,SUM(chf.file_size_in_bytes))/convert (float,(1024*1024)) as AllocatedSizeMB, " & vbCrLf & _
" convert (float,SUM(chf.file_size_used_in_bytes))/convert (float,(1024*1024)) as UsedSizeMB, " & vbCrLf & _
" Sum(CASE WHEN chf.state = 2 THEN 1 Else 0 End) as NumberOfFiles " & vbCrLf & _
" from sys.dm_db_xtp_checkpoint_files as chf" & vbCrLf & _
" WHERE chf.container_id = ? "
Dim listSpace : Set listSpace = dbConnection.ExecuteQueryWithParams(query, ContainerId)
It is probably failing at that very last query execution because it's trying to connect to a standby database that is in a state that does not allow access (step 6 from above).
The corrected query to ignore databases in this state should look like this
Dim query : query = "SELECT name, database_id FROM sys.databases " & _
" WHERE state = ? " & _
" AND source_database_id IS NULL AND is_read_only = 0 AND is_in_standby = 0" & _
" AND name not in ('master', 'model', 'msdb', 'tempdb')"
and then it should be called like this
Set listDatabases = dbMasterConnection.ExecuteQueryWithParams(query, Array(DB_STATE_NORMAL))
This I think should fix the issue.
This code is repeated throughout the management pack so there are multiple places it has to be corrected to resolve this issue completely.
How do I submit my bug report to the management pack authors?
-
Edited by
Timothy Carroll
Tuesday, March 24, 2015 7:22 PM
-
Marked as answer by
Xin GuoMicrosoft contingent staff, Moderator
Wednesday, March 25, 2015 1:37 AM