How to alter partition function in Data Vault where there is a 4 table dependency (350 mil rows each table)
Hi all,
 I'm currently stuck with a table that has 350 mil records. Querying this table is insanely slow so I had a better look at existing yearly partitioning. I already managed to partition on a month level which increased the performance/querrying a lot.
I did this on the staging table where I used an alter statement to split the 2015 partition by 12 months.

However, in our project we used Data Vault.
This means that we have 4 tables (hub, sathub, link, satlink), all carrying 350 mil records. The problem is that altering the partition function does not work. The server cannot handle this action. I was wondering what the best way is to do this, without having to drop/reload all tables.

Thanks for your feedback!
August 28th, 2015 2:03pm

I think a partition function can be used for multiple tables. So you should be able to split the partition which is being used for four tables. Are you using a file group for those four tables or one file group is for one table?
Free Windows Admin Tool Kit Click here and download it now
August 28th, 2015 2:17pm

Hi Guoxiong, 

Yes, it is possible to use an alter partition function on four dependent tables, but it is insanely slow. (Alter takes places on 4 tables of 350 mil rows each). As an example, the transaction log gets full quite fast.

I am using 1 filegroup for each year. Each filegroup consists of 12 files (for each month).

I used these filegroups as partition function that I use on all 4 tables



August 28th, 2015 2:20pm

So you have to do the partition updates online?
Free Windows Admin Tool Kit Click here and download it now
August 28th, 2015 3:30pm

Hi Guoxiong, 

Yes, it is possible to use an alter partition function on four dependent tables, but it is insanely slow. (Alter takes places on 4 tables of 350 mil rows each). As an example, the transaction log gets full quite fast.

I am using 1 filegroup for each year. Each filegroup consists of 12 files (for each month).

I used these filegroups as partition function that I use on all 4 tables



  • Edited by Yvanlathem Friday, August 28, 2015 6:33 PM
August 28th, 2015 6:19pm

Hi Guoxiong, 

Yes, it is possible to use an alter partition function on four dependent tables, but it is insanely slow. (Alter takes places on 4 tables of 350 mil rows each). As an example, the transaction log gets full quite fast.

I am using 1 filegroup for each year. Each filegroup consists of 12 files (for each month).

I used these filegroups as partition function that I use on all 4 tables



  • Edited by Yvanlathem Friday, August 28, 2015 6:33 PM
Free Windows Admin Tool Kit Click here and download it now
August 28th, 2015 6:19pm

Yea, the partition is there on a year basis on the tables. But I want to update/alter the function to split and add the months. Problem is that this alt function never completes
August 29th, 2015 4:38am

Yea, the partition is there on a year basis on the tables. But I want to update/alter the function to split and add the months. Problem is that this alt function never completes

Splitting existing non-empty partitions is a very expensive operation; about 4 times logging is needed compared to normal DML. 

A more efficient approach is to create a new partition function and partition scheme with the new boundaries.  Then use CREATE INDEX...WITH(DROP_EXISTING = ON) for each partitioned index, specifying the new partition scheme.

Plan for the future such than only empty partitions are split going forward.  For example, if you create a full set of monthly partitions once per calendar year, create an additional boundary for the first month of following year.  For example:

CREATE PARTITION FUNCTION PF_Monthly(date) AS RANGE RIGHT FOR VALUES();
CREATE PARTITION SCHEME PS_Monthly AS PARTITION PF_Monthly ALL TO ([PRIMARY]);

DECLARE @BoundaryDate date = '20150101';
WHILE @BoundaryDate <= '20151201'
BEGIN
	ALTER PARTITION SCHEME PS_Monthly NEXT USED [FG_2015];
	ALTER PARTITION FUNCTION PF_Monthly() SPLIT RANGE(@BoundaryDate);
	SET @BoundaryDate = DATEADD(month, 1, @BoundaryDate);
END;

ALTER PARTITION SCHEME PS_Monthly NEXT USED [FG_2016];
ALTER PARTITION FUNCTION PF_Monthly() SPLIT RANGE('20160101');
GO

While the last partition is still empty (before the start of the next year), create the partitions for the next 12 months, specifying the desired filegroups.

Free Windows Admin Tool Kit Click here and download it now
August 29th, 2015 11:08am

Hi Dan,

Thanks for your detailed explanation. There is still something I don't understand.

So to be clear, the best way is to create a new partition sheme and function and than alter the index. But how do I change the partitions on the existing tables? How do I change the partitions using the old partition function to the new function?

Or does is doo that automatically when U alter the index to the new scheme/function?

Thanks in advance!

Also, to get into the point of Guoxiong, if I have a function that I use for 4 tables (all 350 mil records each). Is it optimal to use 1 filegroup per year for all tables, or should I use a filegroup for each table?

August 30th, 2015 4:38am

So to be clear, the best way is to create a new partition sheme and function and than alter the index. But how do I change the partitions on the existing tables? How do I change the partitions using the old partition function to the new function?

Or does is doo that automatically when U alter the index to the new scheme/function?

This is done automatically when you rebuild an index specifying the new partition scheme; the index will be repartitioned using the specified scheme and underlying function.  Once you do this for each partitioned index, all of the partitions will be changed to the new function boundaries.

Also, to get into the point of Guoxiong, if I have a function that I use for 4 tables (all 350 mil records each). Is it optimal to use 1 filegroup per year for all tables, or should I use a filegroup for each table?

Filegroups depend on a number of factors.  If you join these tables, it may be advantageous to use one filegroup per table (doesn't necessarily need to be by year too).  This can improve performance of large scans by partition as long the underlying physical storage is on spinning media and isolated.  In contrast, I wouldn't expect much of a performance difference if the underlying physical storage is shared (same spindles) and/or on solid state storage, or if you don't do many large scans.  But it seems scans are common in your workload based on the performance improvement you observed with partitioning.

Besides performance, different filegroups can provide more options for a piecemeal restore.  You could restore more important tables first in a recovery scenario, or restore the more current years first, as long has the application can tolerate partial data availability. 

My general recommendation is to introduce filegroups only as needed to meet performance and RTO SLAs.  Even if a filegroup pattern is commonly used, it may not be appropriate for the specific characteristics of your environment and workload.

Free Windows Admin Tool Kit Click here and download it now
August 30th, 2015 8:51am

Thanks Dan, 

I didn't know that rebuilding the index will automatically change the existing partitioned tables to the new partition function.

Because once in our datamart, we join the link, sat link and hub tables with eachother so I would say that this would improve the initial load of our fact table a lot by having difference filegroups.

Filegroup 2013 HUB with 12 files for each month

Filegroup 2013 LINK with 12 files for each month

Etc ...

August 30th, 2015 9:26am

Because once in our datamart, we join the link, sat link and hub tables with eachother so I would say that this would improve the initial load of our fact table a lot by having difference filegroups.

Filegroup 2013 HUB with 12 files for each month

Filegroup 2013 LINK with 12 files for each month

Etc ...

Be aware that you can directly control data placement only at the filegroup level, not files within a filegroup.  For example, if you have FG_2013_HUB with 12 files, each file will be proportionally filled with data from all months regardless of how the files individual might be named.  You would need a separate filegroup for each table and month (e.g. FG_2013_JAN_HUB, FG_2013_FEB_HUB, etc.), each with at least one file, in order to provide granularity to the table/month level for placement on isolated storage.  My guess is a one filegroup and file per table/year is all you really need.

Free Windows Admin Tool Kit Click here and download it now
August 30th, 2015 9:46am

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics