Trying to Develop my first ETL
So Im trying to learn a bit about BI and Data Warehousing. I have an operational database that has information that I use on a daily basis. And I am trying to create an ETL that goes from that DB to an ODS DB. I have all my dimension and fact tables set up and I plan on using SSIS to write the ETL. But I dont know how to transform the data into what I need. for example.... I have a row in my DB with a timestamp. I have a table in my ODS called DimTime with seconds through years as columns. How in the world do I convert that timestamp to a row in that table?
November 29th, 2010 9:56pm

Skaz, You can either use sql expressions or SSIS expressions to decompose the datetime values into different columns., but I think you may have a bigger problem that goes beyond of SSIS. It is not recommended to have a date/time dimension a such low level of detail. why? because that is a monster-dimension with way too many rows - you need 31,536,000 for every year worth of rows - If you are using dimensional modeling, you may want to check out some of the kimball's books. For getting your hands around SSIS and data transformation you may want to start reading on the tutorials available in books on line http://technet.microsoft.com/en-us/library/ms169917.aspxHelp Others! Don't forget to mark your thread as answered
Free Windows Admin Tool Kit Click here and download it now
November 29th, 2010 10:22pm

I have a row in my DB with a timestamp. I have a table in my ODS called DimTime with seconds through years as columns. How in the world do I convert that timestamp to a row in that table? As suggested above, you need to use less granularity in time. Example: Day, Month, Quarter, Year. If you have a datetime value, you can map it like: CONVERT(DATE, DatetimeCol) DAY(DatetimeCol) MONTH(DatetimeCol) DATEPART(QQ, DatetimeCol) YEAR(DatetimeCol) Example from AdventureWorksDW: /****** Script for SelectTopNRows command from SSMS ******/ SELECT TOP 1000 [TimeKey] ,[FullDateAlternateKey] ,[DayNumberOfWeek] ,[EnglishDayNameOfWeek] ,[SpanishDayNameOfWeek] ,[FrenchDayNameOfWeek] ,[DayNumberOfMonth] ,[DayNumberOfYear] ,[WeekNumberOfYear] ,[EnglishMonthName] ,[SpanishMonthName] ,[FrenchMonthName] ,[MonthNumberOfYear] ,[CalendarQuarter] ,[CalendarYear] ,[CalendarSemester] ,[FiscalQuarter] ,[FiscalYear] ,[FiscalSemester] FROM [AdventureWorksDW].[dbo].[DimTime] /* TimeKey FullDateAlternateKey DayNumberOfWeek EnglishDayNameOfWeek SpanishDayNameOfWeek FrenchDayNameOfWeek DayNumberOfMonth DayNumberOfYear WeekNumberOfYear EnglishMonthName SpanishMonthName FrenchMonthName MonthNumberOfYear CalendarQuarter CalendarYear CalendarSemester FiscalQuarter FiscalYear FiscalSemester 1 2001-07-01 00:00:00.000 1 Sunday Domingo Dimanche 1 182 27 July Julio Juillet 7 3 2001 2 1 2002 1 2 2001-07-02 00:00:00.000 2 Monday Lunes Lundi 2 183 27 July Julio Juillet 7 3 2001 2 1 2002 1 3 2001-07-03 00:00:00.000 3 Tuesday Martes Mardi 3 184 27 July Julio Juillet 7 3 2001 2 1 2002 1 */ Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
November 30th, 2010 12:59am

Skaz, You can either use sql expressions or SSIS expressions to decompose the datetime values into different columns., but I think you may have a bigger problem that goes beyond of SSIS. It is not recommended to have a date/time dimension a such low level of detail. why? because that is a monster-dimension with way too many rows - you need 31,536,000 for every year worth of rows - If you are using dimensional modeling, you may want to check out some of the kimball's books. For getting your hands around SSIS and data transformation you may want to start reading on the tutorials available in books on line http://technet.microsoft.com/en-us/library/ms169917.aspx Help Others! Don't forget to mark your thread as answered True about the grain. I am just "playing" aroung with SSIS and ETL. I made a relational DB in SQL 2008 and made an ODS DB with 3 dimension tables and one fact table. This is after I tried the tutorial that you provided. That tutorial is why Im asking this question. I have no idea when the lookup and transforation happened, how the other columns were filled in, in the DimTime table.So I made my own DBs and been playing with it but I cant seem to make sense of it all. I have read through a bit of Kimball and Ross's "The Data Warehouse toolkit" so I understand some of the modeling and the general idea of what an ODS needs to be. So with all that being said, creating a simple ETL on my own data is my goal. First target is getting the lookups to work correctly and getting the time piece working....
Free Windows Admin Tool Kit Click here and download it now
November 30th, 2010 1:44am

As suggested above, you need to use less granularity in time. Example: Day, Month, Quarter, Year. If you have a datetime value, you can map it like: CONVERT(DATE, DatetimeCol) DAY(DatetimeCol) MONTH(DatetimeCol) DATEPART(QQ, DatetimeCol) YEAR(DatetimeCol) Example from AdventureWorksDW Ok that helped a lot. So I see that there are built in functions to pull that kind of data from the datetime field. So i would have to emplement that in my ETL somewhere I assume?
November 30th, 2010 2:12am

The following example shows how you can map fact data to dimension table (SQL Server 2008 code): -- Create mapping to dimension table SELECT SalesOrderID, DateKeyID =TimeKey FROM AdventureWorksDW.dbo.DimTime dt INNER JOIN AdventureWorks.Sales.SalesOrderHeader ON CONVERT(DATE, OrderDate)=dt.FullDateAlternateKey /* SalesOrderID DateKeyID .... 75114 1127 75115 1127 75116 1127 75117 1127 ....*/ Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
Free Windows Admin Tool Kit Click here and download it now
November 30th, 2010 2:38am

Thanks for that bit of info. I have another question, where in my ETL would I put the code to transform the date into the table? I have a stupis simple etl right now that takes the tables out of one DB and puts them into another DB minus a whole bunch of useless columns. so its just a series of ole db sources and destinations. Do i need to do it in a lookup? or outside the dataflow?
December 5th, 2010 10:22pm

Thanks for that bit of info. I have another question, where in my ETL would I put the code to transform the date into the table? I have a stupis simple etl right now that takes the tables out of one DB and puts them into another DB minus a whole bunch of useless columns. so its just a series of ole db sources and destinations. Do i need to do it in a lookup? or outside the dataflow?
Free Windows Admin Tool Kit Click here and download it now
December 5th, 2010 10:22pm

ok, I created a trigger to do that and my ETL just imputs the date into the primary key column. First question "Is that a viable way to do it?" second, After i got some conditioning to work with nulls and such, I added a lookp to the destanation table and basically did a " if it is already there, ignore it, otherwise add the row" that seems to work nicely but here is the issue. I am working this ETL on a table that has 10 million rows and grows everyday. If I run my ETL on that, at about 2 million rows I start getting out of memory issues. Is there a way that i can tell it to only process batches of rows at a time?
December 6th, 2010 12:58pm

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics