Is there any reason for difference in order of columns between index of constraint and its statistics definition?

Hi.

Thanks for your input and expertise.

I am really puzzled by an apparent difference between table index key column order and its statistics order.

I was under understanding that index statistics mirror index definition. However, in my db 2470 index ordinal definitions match statistics definition but 66 do not.

I also can reproduce such discrepancy in 2008 R2, 2012 and 2014.

As per definition,

stats_column_id

int

1-based ordinal within set of stats columns

This script duplicates this for me.

BEGIN TRAN
GO
use tempdb
GO
    CREATE TABLE [dbo].[ItemProperties](
	    [itmID] [int] NOT NULL,
	    [cpID] [smallint] NOT NULL,
	    [ipuID] [tinyint] NOT NULL,
     CONSTRAINT [PK_ItemProperties_itmID_ipuID_cpID] PRIMARY KEY CLUSTERED 
    (
	    [itmID] ASC,
	    [ipuID] ASC,
	    [cpID] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]
GO
    SELECT 
        s.object_id
        ,s.name as stats_name
        ,(
            SELECT
                c.name + ', ' as [data()]
            FROM sys.stats_columns as sc
            JOIN sys.columns as c on c.object_id = sc.object_id and c.column_id = sc.column_id
            WHERE s.stats_id = sc.stats_id and s.object_id = sc.object_id
            ORDER BY sc.stats_column_id ASC
            FOR XML PATH('')
        ) as stats_column_list
    FROM sys.stats as s
    WHERE s.object_id = object_id('[dbo].[ItemProperties]')

    SELECT 
        i.object_id
        ,i.name as index_name
        ,(
            SELECT
                c.name + ', ' as [data()]
            FROM sys.index_columns ic
            JOIN sys.columns c ON c.object_id = ic.object_id AND c.column_id = ic.column_id
            WHERE ic.object_id = i.object_id
            AND ic.index_id = i.index_id
            AND ic.is_included_column = 0
            ORDER BY ic.key_ordinal ASC
            FOR XML PATH('')
        ) as index_column_list
    FROM sys.indexes i
    WHERE i.object_id = object_id('[dbo].[ItemProperties]')
GO
EXEC sp_helpstats '[dbo].[ItemProperties]', 'ALL'
GO
EXEC sp_helpindex '[dbo].[ItemProperties]'
GO
--DBCC SHOW_STATISTICS ("dbo.ItemProperties", PK_ItemProperties_itmID_ipuID_cpID)
GO
ROLLBACK

The result I get is this:

object_id       stats_name                                      stats_column_list
1525580473 PK_ItemProperties_itmID_ipuID_cpID itmID,  cpID,  ipuID,

and

object_id      index_name                                      index_column_list
1525580473 PK_ItemProperties_itmID_ipuID_cpID itmID,  ipuID,  cpID,

Also a query I used to discover this in my db is:

WITH stat AS
(
    SELECT 
        s.object_id
        ,s.name as stats_name
        ,(
            SELECT
                c.name + ', ' as [data()]
            FROM sys.stats_columns as sc
            JOIN sys.columns as c on c.object_id = sc.object_id and c.column_id = sc.column_id
            WHERE s.stats_id = sc.stats_id and s.object_id = sc.object_id
            ORDER BY sc.stats_column_id
            FOR XML PATH('')
        ) as stats_column_list
    FROM sys.stats as s
    WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
), ix AS
(
    SELECT 
        i.object_id
        ,i.name as index_name
        ,(
            SELECT
                c.name + ', ' as [data()]
            FROM sys.index_columns ic
            JOIN sys.columns c ON c.object_id = ic.object_id AND c.column_id = ic.column_id
            WHERE ic.object_id = i.object_id
            AND ic.index_id = i.index_id
            AND ic.is_included_column = 0
            ORDER BY ic.key_ordinal
            FOR XML PATH('')
        ) as index_column_list
    FROM sys.indexes i
    WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
)
SELECT 
    OBJECT_SCHEMA_NAME(s.object_id) as TblSchema
    ,OBJECT_NAME(s.object_id) as TblName
    ,s.stats_name
    ,s.stats_column_list
    ,i.index_column_list
FROM stat s 
JOIN ix i ON i.object_id = s.object_id AND i.index_name = s.stats_name
WHERE i.index_column_list != s.stats_column_list
ORDER BY s.stats_name

Thanks again for clarification.

Vladimir

September 5th, 2015 1:22pm

As per definition,

stats_column_id

int

1-based ordinal within set of stats columns

I think you need to join to sys.index_columns to get the index column ordinal, and use key_ordinal instead of the stats_column_id in the ORDER BY clause of the first query:

    SELECT
        s.object_id
        ,s.name as stats_name
        ,(
            SELECT
                c.name + ', ' as [data()]
            FROM sys.stats_columns as sc
			JOIN sys.index_columns as ic ON
				ic.object_id = sc.object_id and ic.column_id = sc.column_id
            JOIN sys.columns as c on c.object_id = sc.object_id and c.column_id = sc.column_id
            WHERE s.stats_id = sc.stats_id and s.object_id = sc.object_id
            ORDER BY ic.key_ordinal ASC
            FOR XML PATH('')
        ) as stats_column_list
    FROM sys.stats as s
    WHERE s.object_id = object_id('[dbo].[ItemProperties]');

Free Windows Admin Tool Kit Click here and download it now
September 5th, 2015 1:51pm

Why? ordinal position in stats is defined by stats_column_id. if you let create it, then use GUI, statistics node, properties, general tab, which shows the same order, i.e. itmID, cpID, ipuID.

If you capture that query, it uses this stats_column_id in order by

exec sp_executesql N'SELECT
sic.stats_column_id AS [ID],
COL_NAME(sic.object_id, sic.column_id) AS [Name]
FROM
sys.tables AS tbl
INNER JOIN sys.stats st ON st.object_id=tbl.object_id
INNER JOIN sys.stats_columns sic ON sic.stats_id=st.stats_id AND sic.object_id=st.object_id
WHERE
(st.name=@_msparam_0)and((tbl.name=@_msparam_1 and SCHEMA_NAME(tbl.schema_id)=@_msparam_2))
ORDER BY
[ID] ASC',N'@_msparam_0 nvarchar(4000),@_msparam_1 nvarchar(4000),@_msparam_2 nvarchar(4000)',@_msparam_0=N'PK_ItemProperties_itmID_ipuID_cpID',@_msparam_1=N'ItemProperties',@_msparam_2=N'dbo'
Why should I not use it?

September 5th, 2015 2:04pm

If you insert data into your sample table, update stats, and execute DBCC SHOW_STATISTICS, you will see the actual column order in the density vector result set reflects the index key column order rather than the stats_column_id.  This command shows the actual statistics stored.  However, if you manually create stats on columns in an order that is different than the index keys (e.g. itmID, cpID, and ipuID), the density vector will show the order of the stats_column_id (same order as CREATE STATISTICS).

The query below will return the columns in the actual order of the density vector for both index stats and user statistics, matching DBCC SHOW_STATISTICS.  Don't trust the GUI :-)

    SELECT
        s.object_id
        ,s.name as stats_name
        ,(
            SELECT
                c.name + ', ' as [data()]
            FROM sys.stats_columns as sc
			JOIN sys.index_columns as ic ON
				ic.object_id = sc.object_id and ic.column_id = sc.column_id
            JOIN sys.columns as c on c.object_id = sc.object_id and c.column_id = sc.column_id
            WHERE s.stats_id = sc.stats_id and s.object_id = sc.object_id
            ORDER BY CASE WHEN s.auto_created = 1 OR s.user_created = 1 THEN sc.stats_column_id ELSE ic.key_ordinal END
            FOR XML PATH('')
        ) as stats_column_list
    FROM sys.stats as s
    WHERE s.object_id = object_id('[dbo].[ItemProperties]');

Free Windows Admin Tool Kit Click here and download it now
September 5th, 2015 2:30pm

Dan, would you not say that this then is a bug in a way stats_column_id gets computed?

I see that vectors show proper sequence(as created) but I would expect stats_column_id be right too. It can be changed with GUI, up and down. And it misrepresents actual order...?

Also, your additional JOIN is not quite correct.

    SELECT 
        s.object_id
        ,s.name as stats_name
        ,(
            SELECT
                c.name + ', ' as [data()]
            FROM sys.stats_columns as sc
                LEFT JOIN 
                (
                    sys.indexes i
		            JOIN sys.index_columns as ic ON ic.object_id = i.object_id AND i.index_id = ic.index_id
                ) ON i.object_id = sc.object_id AND i.name = s.name  AND ic.column_id = sc.column_id
            JOIN sys.columns as c on c.object_id = sc.object_id and c.column_id = sc.column_id
            WHERE s.stats_id = sc.stats_id and s.object_id = sc.object_id
            ORDER BY 
                --sc.stats_column_id
                CASE WHEN s.auto_created = 1 OR s.user_created = 1 THEN sc.stats_column_id ELSE ic.key_ordinal END
            FOR XML PATH('')
        ) as stats_column_list
    FROM sys.stats as s
    WHERE s.object_id = object_id('[dbo].[ItemProperties]');

Thinking about it I wonder how SQL knows of the fact that statistics derived from/related to an index besides name match.

Thanks for your explanation

September 5th, 2015 5:59pm

found it!

Dan, query adjustment you provided is workaround to a bug reported at https://connect.microsoft.com/SQLServer/Feedback/Details/1163126

stats_column_id is simply not populated properly.

Thanks

Free Windows Admin Tool Kit Click here and download it now
September 6th, 2015 9:23am

found it!

Dan, query adjustment you provided is workaround to a bug reported at https://connect.microsoft.com/SQLServer/Feedback/Details/1163126

Thanks for posting the link.  I upvoted.  Note that this could be a documentation issue rather than a bug in the engine.  That being said, it would be more intuitive if it behaved as you want.  

September 6th, 2015 10:04am

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics