Cluster and Non cluster index

i am not very good in database. when we create cluster index i hard that when we insert data then every time data is sorted based on cluster index and stored. suppose there is 100,000 data in table and when a new data inserted then again 100,000 data is sorted and saved in db. if this is true for cluster index then it would be very expensive .

and same way this sorting not done for non cluster index. is it true?

if my concept is wrong then please some one tell me the whole story in details for Cluster and Non cluster index with easy example. thanks

July 29th, 2015 2:59pm

That is true, if you have a clustered index on a table every time you insert data into the table you would take a slight performance hit since the table is sorted on a clustered index and inserted data needs to be inserted in its proper place withing a clustered index.

The same applies to non-clustered indexes as well since they are sorted as well, the hit is generally smaller since it would only pertain to columns within the non-clustered index.

In general indexes surve the purpose to streamline data retrival/manipulation not data inserts. If you are not planning on returning data from a table to an application then there is no need to create indexes on the table. but generally on OLTP you read more than you write and the hit on writing is more than compensated by the performance gain and application responce time on reads.

Let me know if this helps

Free Windows Admin Tool Kit Click here and download it now
July 29th, 2015 3:34pm

Clustered index is not effective while scanning a huge amount of rows.

The thing is that if SQL Server could theoretically use a particular non-clustered index but the Optimiser estimates there will be 'too many' bookmark-lookups then said index will be ignored. However, if all selected columns are accessible directly from the index, there'll be no need for a bookmark-lookup.

In your case the fact that you're accessing the data via "clustered index seek" is very promising. It will be very hard to improve on its performance. A non-clustered index including all selected columns may be slightly faster, but only because the raw data is a little less. (But don't forget the cost of increased insert/update time.)

However, you should check the detail...

If you're using a composite key and the seek is actually only on the beginning of the key, you might not be so lucky. You may find the seek is only narrowing down to 500,000 rows and is then searching that based on other criteria. In this case experiment with some non-clustered indexes.
The clustered index seek itself may be fine; but if it is being done 100,000 times in your query because some other aspect is inefficiently returning too many rows - then you won't gain much by improving the performance of the clustered index seek.
Finally, to elaborate on davek's comment: "cost is relative". Just because the clustered is 77% of your query cost doesn't mean there's a problem. It is possible to write a trivial 1 table query that returns a sinlge row and clustered index seek cost at 100%. (But of course, being the only 'work' done, it will be 100% of the work... and 100% of instant is still instant.

Have a look at the following articles:

Effective Clustered Indexes

14 SQL Server Indexing Questions You Were Too Shy To Ask

July 29th, 2015 3:55pm

Clustered index is not effective while scanning a huge amount of rows.

The thing is that if SQL Server could theoretically use a particular non-clustered index but the Optimiser estimates there will be 'too many' bookmark-lookups then said index will be ignored. However, if all selected columns are accessible directly from the index, there'll be no need for a bookmark-lookup.

In your case the fact that you're accessing the data via "clustered index seek" is very promising. It will be very hard to improve on its performance. A non-clustered index including all selected columns may be slightly faster, but only because the raw data is a little less. (But don't forget the cost of increased insert/update time.)

However, you should check the detail...

If you're using a composite key and the seek is actually only on the beginning of the key, you might not be so lucky. You may find the seek is only narrowing down to 500,000 rows and is then searching that based on other criteria. In this case experiment with some non-clustered indexes.
The clustered index seek itself may be fine; but if it is being done 100,000 times in your query because some other aspect is inefficiently returning too many rows - then you won't gain much by improving the performance of the clustered index seek.
Finally, to elaborate on davek's comment: "cost is relative". Just because the clustered is 77% of your query cost doesn't mean there's a problem. It is possible to write a trivial 1 table query that returns a sinlge row and clustered index seek cost at 100%. (But of course, being the only 'work' done, it will be 100% of the work... and 100% of instant is still instant.

Have a look at the following articles:

Effective Clustered Indexes

14 SQL Server Indexing Questions You Were Too Shy To Ask

  • Marked as answer by Mou_kolkata Thursday, July 30, 2015 11:27 AM
Free Windows Admin Tool Kit Click here and download it now
July 29th, 2015 7:53pm

>>>>100,000 data in table and when a new data inserted then again 100,000 data is sorted and saved in db

If you have a CI on the date column for example and insert the data into which may 'break' the sequence

20150101

20150110

.........

Now you insert  say 20150104, so SQL Server has to split a page and move some of the data into a new allocated pages, that is costly process and force a fragmentation that affects performance

The main difference between NCI and CI is that leaf level of the CI is actual data but in NCI leaf leave is a pointer to the CI key

July 30th, 2015 1:37am

Yes you are correct. 

clustered Index: it stores the data by sorting. That means the data stored in the clustered index is in order

where as Non-clustered index data stored is not in order.

Free Windows Admin Tool Kit Click here and download it now
July 30th, 2015 1:46am

Yes you are correct. 

clustered Index: it stores the data by sorting. That means the data stored in the clustered index is in order

where as Non-clustered index data stored is not in order.

  • Marked as answer by Mou_kolkata Thursday, July 30, 2015 11:26 AM
July 30th, 2015 5:44am

thanks for reply but this you said "The same applies to non-clustered indexes as well since they are sorted as well, the hit is generally smaller since it would only pertain to columns within the non-clustered index."  which is not clear to me.

please discuss it with more details if possible. thanks

Free Windows Admin Tool Kit Click here and download it now
July 30th, 2015 8:42am

OK so first of all, Both Clustered index and non-clustered indexes are btrees so they are both sorted. the only difference is that CI is a physical sort order of the actual table where non-clustered indexes are logical and are stored seperately from the table itself

Here is the scenario for which i was talking about where NC is smaller hit:

you have a table with 20 columns, CI on ID(bigint) column and a NCI on the same ID column ( i know its a dup)

Sorting of either index is the same both sorted on ID, but whenever you go and insert a row into the table.  CI index would write an entire 20 columns ( since it is the table) to disk, while NCI would only have to write information about the ID column.

Does that help?

July 30th, 2015 3:11pm

You may see the difference between them

  • A clustered index determines the order in which the rows of the table will be stored on disk and it actually stores row level data in the leaf nodes of the index itself. A non-clustered index has no effect on which the order of the rows will be stored.
  • Using a clustered index is an advantage when groups of data that can be clustered are frequently accessed by some queries. This speeds up retrieval because the data lives close to each other on disk. Also, if data is accessed in the same order as the clustered index, the retrieval will be much faster because the physical data stored on disk is sorted in the same order as the index.
  • A clustered index can be a disadvantage because any time a change is made to a value of an indexed column, the subsequent possibility of re-sorting rows to maintain order is a definite performance hit.
  • A table can have multiple non-clustered indexes. But, a table can have only one clustered index.
  • Non clustered indexes store both a value and a pointer to the actual row that holds that value. Clustered indexes dont need to store a pointer to the actual row because of the fact that the rows in the table are stored on disk in the same exact order as the clustered index and the clustered index actually stores the row-level data in its leaf nodes. Ref

Free Windows Admin Tool Kit Click here and download it now
August 2nd, 2015 11:39pm

You should also become familiar with the concept of FILL FACTOR.

By default the FILL FACTOR on the index will be 0 (equivalent to 100) which means all the index pages are completely filled with no free space remaining.

As Uri has mentioned, if you insert a new record that is not at the end of the index then SQL Server has to split that page in 2 and physically move half the records o disk to a new page.

However, if you set your FILL FACTOR to be e.g. 90, then this means that 10% of space on the page is reserved for new records meaning that it is a longer time before page splits are needed.

The trade-off of this is that when doing a large range scan you will have to read a few more pages to get all the records that you need.

You need to find the balance for your system between many page splits and more reads for an index scan - or look at redesigning your clustered index definition so that all new records are added to the end.

August 3rd, 2015 12:34am

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics