Columnstore Indexes

Dear Pros

I am designing a large DWH and I am thinking of using Columnstore indexes. Kindly I have few questions:

1- Does a Columnstore index consume more space than ordinary indexes?

2- Does a Columnstore index affect the performance of data loading (specially if the number of columns included is large)? If yes, how to optimize the performance of data loading?

3- Which is better to use, Clustered Columnstore index, or non-clustered columnstore index? What are the implications on data loading in both types?

4- Can a columnstore index be partitioned using same partition scheme of the table? If yes can I call it aligned index, although it has additional columns besides the partitioning column?

5- If above is Yes, can I compress table partition (to provide space) without affecting the associated columnstore index partition?

6- Finally, can I compress a column store index partition?

Sorry for asking too many questions at the same time, but as you can see they are related, in which I want to develop an understanding of designing a high-performance data warehouse :)

PS: I would really Appreciate it, if I can get direct answers, instead of links to other pages. Sorry for the laziness, but I am in race with time XD

Regards

April 28th, 2015 8:23am

Dear khidir,

do you plan to use a column store index as it is avaiable since SQL 2012 or do you plan to use clustered columnstore index, which is available since SQL 2014?

My answers would vary depending on Version :)

Best,
Mark


  • Edited by Mkuschel Tuesday, April 28, 2015 9:23 AM typo
Free Windows Admin Tool Kit Click here and download it now
April 28th, 2015 9:22am

In SQL 2014 there are two different kinds of Columnstore indexes.  Nonclustered Columnstore Indexes (NCCI), which are a secondary index that (in SQL 2014) make the table read-only.  And Clustered Columnstore Indexes (CCI) which are an updatable Columnstore that is an alternative the heap or clustered index for the table data.

1. Less, for the same number of columns.

2. Yes.  CCIs and NCCIs spend time in loading to achieve compression and enable fast scanning.

3. CCI.  Nor NCCI you must drop the index before loading and rebuild after (at least for a partition).

4. For NCCI, yes and yes.

5. Yes.

6. It already is compressed. 

In general the CCI provides the best overall performance and compression for large data warehouse tables.

April 28th, 2015 10:39am

Dear MKuschel

I am using SQL Server 2012.

Although David has made points, still I am interested in your response :)

I appreciate your support.

Free Windows Admin Tool Kit Click here and download it now
April 28th, 2015 10:50am

Thanks a lot Dear David, I will mark the answer after I receive the other response :) Just to make the topic hot :)

Thanks again :)

April 28th, 2015 10:51am

Dear David

Just one question. in your response to point (3), do you mean that I cannot use a Non-clustered columnstore index on a live table (that has incremental loads over time)? In other words, do I have to drop the index whenever I want to insert data? or you made this point as a best practice ?

Thanks

Free Windows Admin Tool Kit Click here and download it now
April 28th, 2015 10:55am

In SQL 2012 you have to know, that Columnstore Indexes are read-only.
Because of that I like to answer a bit in another way:

1- Does a Columnstore index consume more space than ordinary indexes?
Yes, the columnstore index itself, but since you have to add the index, you have to calculate additional disk space. The column store index in 2012 is not meant to be used as a replacement for existing indexes

2- Does a Columnstore index affect the performance of data loading (specially if the number of columns included is large)? If yes, how to optimize the performance of data loading?
Since a columnstore index is readonly, only the second part of the question can be answered. Inserting new data into fact tables can be done, be switching in partitions. In Case you know data of last 2 months has been modified, you have to Switch out and reload both partitions (in Case table is partitioned by month).

3- Which is better to use, Clustered Columnstore index, or non-clustered columnstore index? What are the implications on data loading in both types?
Clusted columnstore is only available in SQL 2014. In Case you are considering an upgrade: Clusted columnstore can index be modified! And so update and delete Statements are possible (very important vor dimensions). If you have a lot of movement in your table, handling might be tough, since deleteing data does not really delete it - it is only flagged and still there. Also the index gets separeted into several data stores, which is only integrated by reaching a value (does not have it in mind actuually).

4- Can a columnstore index be partitioned using same partition scheme of the table? If yes can I call it aligned index, although it has additional columns besides the partitioning column?
Same answer, as david you gave :)

5- If above is Yes, can I compress table partition (to provide space) without affecting the associated columnstore index partition?
Also here, yes

6- Finally, can I compress a column store index Partition?
And yes, it is compressed by default

  • Marked as answer by khidir Tuesday, April 28, 2015 1:09 PM
April 28th, 2015 11:04am

Dear Mkuschel

Wow, that was really informative. Thanks very much :D :D

I have further couple of questions: in point (1) what do you mean by: The column store index in 2012 is not meant to be used as a replacement for existing indexes.

and in point (5) I understand that I can compress the table partition after I already created or rebuild the column store index for that partition. Can I do the opposite, meaning compress the partition and then create the column store index, or firstly decompress my partition??

I really appreciate you continuous support :)

Free Windows Admin Tool Kit Click here and download it now
April 28th, 2015 12:07pm

Dear khidir,

you are welcome :)

Since columnstore index is organized by columns, instead of rows, it very good at operations, which use whole table or Partition - but useless when you like to SELECT specific rows.

E.g. If your users query like this:
SELECT * FROM myTablename WHERE ID = 12345
==> 1 row result
Columnstore index will not be used, here your traditional clusted index is far better.

Also columnstore index is bad at nvarchar fields, which have a lof different strings inside - since its algorithm is relying on deduplication. Traditional nonclustered Indexes may be better
e.g.
SELECT * FROM myTablename WHERE Comment LIKE '%foo%'

But if you have something like
SELECT SUM(SalesAmount) FROM FactInternetSales WHERE ProductID = 123
Columnstore index is great and will boost Performance!


According storage:
Yes, this is possible - since the columnstore index, compression of the table does not affect columnstore index.
So you can have a non-compressed table with a columnstore index

April 28th, 2015 12:19pm

Appreciate it very much, Mkuschel :D
Free Windows Admin Tool Kit Click here and download it now
April 28th, 2015 1:09pm

Here are some good pointers on Clustered Column store Indexes -

http://sqlserverdw.blogspot.com/2015/02/sql-server-clustered-column-store-index.html

May 5th, 2015 4:55pm

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics