Steps to implement a daily incremental load using SSIS
Hi, I have a source temporary database which I have to load into the target staging database daily. The source table has no date columns. In the target table I have to add one extra date column that will contain the date on which the data is loaded; for example, if the data is loaded today, that column will contain today's date. All of this loading has to happen incrementally every day. To explain the scenario: today I load the data from the source, which has 100 rows, into the target tables, and all 100 rows are inserted with today's date in the date column. The next day, when I load from the source, which now has 120 rows, only the 20 new rows should be inserted into the target, with the next day's date in the date column. What is the best approach to implement this using SSIS, keeping performance in mind as well? Thanks
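For illustration, a rough sketch of the kind of target table being described; the names (dbo.StagingTable, Id, SomeValue, LoadDate) are hypothetical, and the default constraint is just one way to stamp the load date (a Derived Column in the data flow would work as well):

    -- Hypothetical staging table: source columns plus an extra LoadDate column
    CREATE TABLE dbo.StagingTable
    (
        Id        INT          NOT NULL PRIMARY KEY,   -- same key as the source table
        SomeValue NVARCHAR(50) NULL,                    -- remaining source columns
        LoadDate  DATE         NOT NULL
            CONSTRAINT DF_StagingTable_LoadDate DEFAULT (CAST(GETDATE() AS DATE))  -- date the row was loaded
    );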
December 11th, 2010 3:28am

You can use a Lookup transform to find out whether the rows in the source table exist in the destination or not, and connect the no-match output to the destination. BUT THIS IS NOT THE BEST APPROACH: it checks every row against the destination table, which takes time, and all the lookup rows are loaded into memory, which consumes a lot of resources. I prefer this approach: add an ImportDate column to the SOURCE table, and in the source component use a query like SELECT * FROM sourcetable WHERE ImportDate IS NULL. This requires that you write the import date back into that column when you import the data to the destination, so place a Multicast transformation before the destination, connect a copy of the data stream to an OLE DB Command, and write an update statement there that sets the ImportDate column to the current date. http://www.rad.pasfu.com
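For illustration, a minimal sketch of the two statements behind this approach; the table name sourcetable is from the post above, while the key column Id is an assumption:

    -- OLE DB Source: pull only the rows that have not been imported yet
    SELECT *
    FROM sourcetable
    WHERE ImportDate IS NULL;

    -- OLE DB Command (fed by the Multicast copy of the data stream):
    -- the ? is the OLE DB Command parameter, mapped to the row's key column (assumed here to be Id)
    UPDATE sourcetable
    SET ImportDate = GETDATE()
    WHERE Id = ?;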
December 11th, 2010 3:55am

Are there any key columns in the source that can identify whether the source data is already present in your staging data? If your source has a primary key, you might be able to quickly determine which keys to load and which to ignore, and then select the data from your source with a NOT EXISTS clause.
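A minimal sketch of such a query, assuming both tables share a primary key column Id and using the hypothetical names dbo.SourceTable and dbo.StagingTable:

    -- Select only the source rows whose keys are not yet present in staging
    SELECT s.*
    FROM dbo.SourceTable AS s
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.StagingTable AS t
                      WHERE t.Id = s.Id);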
December 11th, 2010 4:29pm

"Today I am loading the data from the source which has 100 rows into the target tables. All the rows will be inserted with the date column of today's date for the 100 rows. Next day when I load from the source which has now 120 rows only 20 rows will be inserted into the target with the date column having the next day's date."

Do you have a datetime column to identify the new/old records, as Ayush asked? If yes, this can be solved simply with a T-SQL query (best practice). You might track the last modified datetime value or the last identifier of the rows that you have pulled; then, the next time you extract from the same source system, you start where you left off rather than extracting the entire source data set.

If you don't have such a column, you can try to add one to the source table:

    ALTER TABLE Sales.Customer ADD CustomerRowVersion rowversion NOT NULL

If you cannot add such a column to the source table, you can also implement CDC (SQL Server 2008 and later only) on your source table: http://msdn.microsoft.com/en-us/library/bb895315.aspx

Otherwise, you may need to use a MERGE statement or a Lookup transform to find the new/changed rows:
http://technet.microsoft.com/en-us/library/cc280522.aspx
http://social.msdn.microsoft.com/forums/en-US/sqlintegrationservices/thread/131dbe38-95a9-4b64-8434-60ba3cd6de00/

Generally, compared to using a MERGE statement or a Lookup transform, it is fastest to just reload the target table. A rule of thumb is that if the target table has changed by more than 10%, it is often faster to simply reload it than to perform the delta-detection logic.
http://sqlcat.com/top10lists/archive/2008/10/01/top-10-sql-server-integration-services-best-practices.aspx

Hope this helps,
Raymond

Raymond Li - MSFT
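As a rough sketch of the "start where you left off" idea using the rowversion column above; the watermark table dbo.LoadWatermark and its columns are hypothetical helpers that the package would maintain between runs:

    -- Read the high-water mark recorded by the previous run (NULL on the first run)
    DECLARE @LastVersion BINARY(8);

    SELECT @LastVersion = LastRowVersion
    FROM dbo.LoadWatermark
    WHERE TableName = 'Sales.Customer';

    -- Extract only the rows added or changed since the previous run
    SELECT *
    FROM Sales.Customer
    WHERE @LastVersion IS NULL OR CustomerRowVersion > @LastVersion;

    -- After a successful load, remember the highest version seen so the next run resumes from there
    UPDATE dbo.LoadWatermark
    SET LastRowVersion = (SELECT TOP (1) CustomerRowVersion
                          FROM Sales.Customer
                          ORDER BY CustomerRowVersion DESC)
    WHERE TableName = 'Sales.Customer';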
December 14th, 2010 2:23am
