Adding Dimension entries via an ETL Process
Hi Folks, I am very new to SSIS and ETL, and am trying to use it to build processes for some daily data processing tasks. I have started off by going through the basic msdn tutorial: http://msdn.microsoft.com/en-us/library/ms169917.aspx and I have found that this tutorial exhibits most of the features I want in my real-world transform, so its a good reference point. However I have hit a stumbling block with something and I hope someone out there will be able to put me on the right path. I am using SQL Server 2008 R2. To explain my requirement I would like to extend the example in the tutorial as context is very close to what I want to do in my real project and is something that many of you will be familiar with. Problem Basically, I want to extend the tutorial example so that instead of handling and reporting CurrencyID's that dont exist in DimCurrency, I want to be able to create a new DimCurrency entry on the fly, and then populate the Fact Table with the new CurrencyKey generated by this process. Approaches So Far I noticed at first that the 'Lookup Currency' transform in the tutorial uses a 'select' statement and caches the ref data, so I started by changing this to be a table reference and using no caching (because entries in the DimCurrency may be added to during the ETL process so the reference set needs to be dynamic). I then tried to add a OLE DB Command and redirected 'no match' rows from this lookup to the OLEDB transform and used an 'insert' statement in the OLE to add the new DimCurrency. This seems to work, but I don't then know how to either retry the lookup or continue processing down the main DataFlow thread. I also got a duplicate key error, because the no match seemed to repeat despite the new currency being added and despite have 'no caching' set in the lookup. My second idea was to try to use the SCD transform, but whilst trying to set this up in the Wizard, it appeared that to use this Transform I already need to know the DimCurrency CurrencyKey for the new currency, which I don't because the text flie has no knowledge of the internal CurrencyKey (because the text files we are reading don't know about CurrencyKey's in the dB). My final idea was to start the Data Flow off by using an OLE DB transform (i.e. input from text flis going into the OLE DB node first) that calls a stored proc using the CurrencyID from the text file. The stored proc colud then encapsulate the CurrencyKey lookup, and cloud then handle new Currencies if required. The rest of the transfom would then look like the rest of the dalaflow in lesson 5 (i.e. from Lookup Date Key onwards). It struck me at this point that I may well be approaching this in totally the wrong way and that i may need to be thinking differently about these nodes and data flows so I thought maybe it would be a good idea to try and ask more experienced ETLer's out there. Once I can handle new currencies, I will most likely also want to do something similar for the Lookup Date Key step (because I may read in currency data for a NEW date that needs to be added to the DimTime table). I am presuming the approach will be the same for both. Any pointers and advice would be greatly appreciated. .... oh and apologies if this has been answered elsewhere (I did try and look around first ;-). Thanks, BBD.
October 5th, 2011 10:27am

If you load the flat file into a staging table, one that has the same columns as the file, you can then query that table against your dimension to see which currency codes are missing from your dimension. These new codes can then be inserted into your dimension. So, as one possible design... Data Flow Task to load flat file into your staging table Data Flow Task to load new currency codes into the dimension Data Flow Task to load your fact table Dividing the code into separate Data Flows or packages encapsulates and simplifies the tasks. There might be times when you need to pile all of the logic into one Data Flow, but simplification is better as long as performance is not affected drastically.
Free Windows Admin Tool Kit Click here and download it now
October 5th, 2011 11:30am

If you load the flat file into a staging table, one that has the same columns as the file, you can then query that table against your dimension to see which currency codes are missing from your dimension. These new codes can then be inserted into your dimension. So, as one possible design... Data Flow Task to load flat file into your staging table Data Flow Task to load new currency codes into the dimension Data Flow Task to load your fact table Dividing the code into separate Data Flows or packages encapsulates and simplifies the tasks. There might be times when you need to pile all of the logic into one Data Flow, but simplification is better as long as performance is not affected drastically.
October 5th, 2011 6:26pm

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics