Large table Primary Key clustered vs non-clustered - MERGE statement concerns

I have a very large table that is used as a staging table of sorts. It contains about 50 million rows. It has a 6 column composite key which includes 2 INT columns, 3 varchar(up to 24) columns, and a DECIMAL(24,0) columns for a unique constraint. It may not be the best, but that is what it is. Someone removed the PRIMARY KEY, but that is what it was. One of the key columns is an integer equivalent of a DATE called DayKey, stored like 20150430.

This table is populated daily by an import table, which has about 50K-100K rows in it. The tables are about 40 columns. The daily imports come in with an incremented DayKey, so tomorrow's would be 20150501. These rows later will be brought into a dimension table.

So, the import table, truncated daily, has that day's rows in it, with the appropriate DayKey. The STAGE table appends these on a daily basis, with appropriate changes or additions.

My question is this: I am now using a MERGE statement to populate this and it is timing out. I think it is because someone removed the PK. So, in putting one back in, do I add a CLUSTERED PRIMARY KEY, or will that take forever and/or bomb out? I don't want to lose the data. Or do I make it a NON-CLUSTERED?

Since I am inserting thousands of rows every day, it would seem like it would have to rebuild the entire clustered index every time, but I need something because I am MERGING into  it every day and we will need to SELECT from it to populate the DIMENSION table later.

April 30th, 2015 12:36pm

"My question is this: I am now using a MERGE statement to populate this and it is timing out. I think it is because someone removed the PK. So, in putting one back in, do I add a CLUSTERED PRIMARY KEY, or will that take forever and/or bomb out? I don't want to lose the data. Or do I make it a NON-CLUSTERED?"

There's no general answer here.  You need to test, keeping in mind not just this MERGE, but the other operations that touch the table too.

David

Free Windows Admin Tool Kit Click here and download it now
April 30th, 2015 2:07pm

Duan,

My question is this: I am now using a MERGE statement to populate this and it is timing out. I think it is because someone removed the PK. So, in putting one back in, do I add a CLUSTERED PRIMARY KEY, or will that take forever and/or bomb out? I don't want to lose the data. Or do I make it a NON-CLUSTERED?


Primary Key in SQL Server defaults to Clustered index only and you can have only one clustered index per table, may be that is the reason composite primary key is created in the first place

If your choice is to have primary key and you have no choice than having clustered index.


April 30th, 2015 2:17pm

Thank you for the responses. I am just wondering if I can make it a non-clustered, unique key. What would be nice is if I can create a surrogate key column, but I am not sure how I could create a key that I could reference with the daily truncated import table. It would have new records every day while the stage table has all the records from every day.

This is probably a subject for a new thread...

Free Windows Admin Tool Kit Click here and download it now
April 30th, 2015 2:39pm

Without being able to touch and feel your database, whatever I am going to say is based on the limited information that you provided, and so it may not be right.

My guess would be that your queries are slower because you don't have the primary key (assuming there are no other indexes). The reason for my thinking is that, for each row in your input table, SQL Server has to examine every other row in your table to see if it needs to update an existing row or insert it as an existing row.  But, if you have a primary key, that task becomes easier for SQL Server.

So my recommendation would be to create CLUSTERED primary key.  How much time and resources that would take? I don't know. Best if you can try it out in a staging/dev environment.  Even if it takes a painful amount of time and resources, I would lean towards doing that.

When you do create the primary key, and assuming the date column is the first column in your composite key, the merge statements should run much faster.  But I have to qualify that because I am basing that on the assumption that each day's data is mostly new data or data for recent days. 


  • Edited by JamesKJ 12 hours 14 minutes ago
April 30th, 2015 2:51pm

Venugopal, I hope I am misinterpreting what you are saying. You are not saying that if you want to have a primary key, it has to be clustered, are you?  You can have a non-clustered primary key.
Free Windows Admin Tool Kit Click here and download it now
April 30th, 2015 2:53pm

A quick general question, just a "order of magnitude" question: I know there are many, many variables. But if I add a 6 column primary key to this huge table (heap) with 40 columns and 50 million rows, roughly how long will it take? I assume it would have to rebuild the whole table. On SQL Server 2008 R2 Enterprise edition, would you be talking seconds, minutes, hours, days, or just bomb outright?
April 30th, 2015 3:17pm

No what I meant is having unique column combination clustered is the best choice in his case. Having said that I agree this can be better advised only by understanding the whole scenario. 
Free Windows Admin Tool Kit Click here and download it now
April 30th, 2015 4:22pm

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics