Dimension question
Hi All. I am working on creating a customer dimension in our DW(the DW is in the initial stage of develpoment). I have a couple of general quesitons and I'd appreciate if you guys can help. 1. I am not sure if I should use SSIS for this because all the columns in the dimension are going to be of Type1(no history). Should I just create a stored procedure and run it nightly? 2. I am plannning to truncate the table before loading it everynight. Thoughts? 3. This table also stores vendors and there is a flag(IsCustomer) that determines whther its a customor or vendor. Should I create a separate dimension for vendors or have it in the same dimension? I know it depends, but I want to know what it depends on. 4. If you guys suggest using Slowly Changing Dimension wizard, I'll go with that, but I read that its not going to be good if I have a lot of data in my dimension. Thanks in advance for any help on these items.
June 24th, 2011 2:33pm

1 + 4) Use SSIS for that. Don't use the standard SCD, but use one of these alternatives because they are a lot faster! http://microsoft-ssis.blogspot.com/2011/01/slowly-changing-dimension-alternatives.html 2) The fact table(s) or also the dimension tables? Truncating each time could cause a long etl process... Most of the time I only delete and reload the last couple of months in the facttable (in case the source records are updated). Please mark the post as answered if it answers your question | My SSIS Blog: http://microsoft-ssis.blogspot.com
Free Windows Admin Tool Kit Click here and download it now
June 24th, 2011 2:57pm

1. Depends on your constraints and abilities. If you're more comfortable with T-SQL, that might work better for you. If you're wanting maintainability, or if you see more complex transforms in the future, I'd say SSIS (but that's opinion too). 2. Bad idea. Even on a type 1 dimension, there are TWO reasons for never truncating a dimension table: First, you have surrogate keys to think about - you'll be destroying them and likely not recreating them the same way. Second, what happens when your source system decides to delete a row? Your fact table(s) will still refer to that row, won't they? But it won't be in your dimension anymore. 3. You should probably have two dimensions. In the future, you'll have some fact tables that will only refer to vendors, and some only to customers. Unless you have business requirements that request you to "link" customers to vendors that are the "same thing", separate them. 4. Correct. Use a "roll your own" style (lookups, conditional splits, etc...), use T-SQL MERGE, or the Dimension Merge SCD. Talk to me now on
June 24th, 2011 2:59pm

Thank you so much for your thoughts! If I use SCD, what will it do if the source system deletes a row? While getting data from the source system into the staging tables, I should truncate the table in staging and reload it correct? Or thats a bad idea as well? Please advise.
Free Windows Admin Tool Kit Click here and download it now
June 24th, 2011 3:45pm

I can't answer that question! Only you or your business can. What happens if the source system deletes a customer? My first response is that the Data Warehouse should never delete it - but perhaps it should mark it as "expired". But then... isn't that an SCD2 dimension? Yep. IMO - there are no such things as "pure SCD1" dimensions... except a Date dimension :) It's perfectly OK to truncate and reload staging tables. In fact, you should design your ETL such that it will be able to completely recreate the staging area from scratch. Any "staging" tables, etc... should be "volatile". You shouldn't need to back up the staging area at all. As Kimball compared Data Warehousing to a "restaurant" - the ETL/staging area is the kitchen. The raw data is the unprepared food. As long as you keep the "machinery" there (knives, counters, mixers, ovens), you should be able to hose down the kitchen completely for the next day's service. Now, whether you consider staging tables (without data) to be "machinery" (bowls) or part of the transient "food", just be aware of which category you put it in. Take care of (back up) your machinery - but don't worry about the food, you know it's going to "spoil" anyway. Talk to me now on
June 24th, 2011 6:18pm

Thank you once again for your valuable thoughts, highly appreciated! I agree that if a customer was deleted in the source system, we should have it the DW. But my question was, would the SCD in SSIS handle this case? Wouldn't it delete the record if it doesn't find it in the source system?
Free Windows Admin Tool Kit Click here and download it now
June 24th, 2011 7:58pm

First, the SCD Wizard only compares what you feed it. So if you don't feed it a row, it doesn't know that it exists in the dimension table. (I think that's a problem.) Second, the SCD Wizard in SSIS doesn't actually "do" anything. It simply alters the contents of the rows being passed through it (which are in memory - not in your table), and then those changes get persisted to the database by OLE DB Commands and OLE DB Destinations. So for two reasons - no, it won't delete anything (because it doesn't actually do anything, and because it doesn't delete, period). Talk to me now on
June 27th, 2011 7:57pm

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics