Trying to Develop my first ETL
So Im trying to learn a bit about BI and Data Warehousing. I have an operational database that has information that I use on a daily basis. And I am trying to create an ETL that goes from that DB to an ODS DB. I have all my dimension and fact tables set
up and I plan on using SSIS to write the ETL. But I dont know how to transform the data into what I need. for example....
I have a row in my DB with a timestamp. I have a table in my ODS called DimTime with seconds through years as columns. How in the world do I convert that timestamp to a row in that table?
November 29th, 2010 9:56pm
Skaz,
You can either use sql expressions or SSIS expressions to decompose the datetime values into different columns., but I think you may have a bigger problem that goes beyond of SSIS. It is not recommended to have a date/time dimension a such low level
of detail. why? because that is a monster-dimension with way too many rows - you need 31,536,000 for every year worth of rows -
If you are using dimensional modeling, you may want to check out some of the kimball's books.
For getting your hands around SSIS and data transformation you may want to start reading on the tutorials available in books on line
http://technet.microsoft.com/en-us/library/ms169917.aspxHelp Others! Don't forget to mark your thread as answered
Free Windows Admin Tool Kit Click here and download it now
November 29th, 2010 10:22pm
I have a row in my DB with a timestamp. I have a table in my ODS called DimTime with seconds through years as columns. How in the world do I convert that timestamp to a row in that table?
As suggested above, you need to use less granularity in time. Example: Day, Month, Quarter, Year.
If you have a datetime value, you can map it like:
CONVERT(DATE, DatetimeCol)
DAY(DatetimeCol)
MONTH(DatetimeCol)
DATEPART(QQ, DatetimeCol)
YEAR(DatetimeCol)
Example from AdventureWorksDW:
/****** Script for SelectTopNRows command from SSMS ******/
SELECT TOP 1000 [TimeKey]
,[FullDateAlternateKey]
,[DayNumberOfWeek]
,[EnglishDayNameOfWeek]
,[SpanishDayNameOfWeek]
,[FrenchDayNameOfWeek]
,[DayNumberOfMonth]
,[DayNumberOfYear]
,[WeekNumberOfYear]
,[EnglishMonthName]
,[SpanishMonthName]
,[FrenchMonthName]
,[MonthNumberOfYear]
,[CalendarQuarter]
,[CalendarYear]
,[CalendarSemester]
,[FiscalQuarter]
,[FiscalYear]
,[FiscalSemester]
FROM [AdventureWorksDW].[dbo].[DimTime]
/*
TimeKey FullDateAlternateKey DayNumberOfWeek EnglishDayNameOfWeek SpanishDayNameOfWeek FrenchDayNameOfWeek DayNumberOfMonth DayNumberOfYear WeekNumberOfYear EnglishMonthName SpanishMonthName FrenchMonthName MonthNumberOfYear CalendarQuarter CalendarYear CalendarSemester FiscalQuarter FiscalYear FiscalSemester
1 2001-07-01 00:00:00.000 1 Sunday Domingo Dimanche 1 182 27 July Julio Juillet 7 3 2001 2 1 2002 1
2 2001-07-02 00:00:00.000 2 Monday Lunes Lundi 2 183 27 July Julio Juillet 7 3 2001 2 1 2002 1
3 2001-07-03 00:00:00.000 3 Tuesday Martes Mardi 3 184 27 July Julio Juillet 7 3 2001 2 1 2002 1
*/
Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
November 30th, 2010 12:59am
Skaz,
You can either use sql expressions or SSIS expressions to decompose the datetime values into different columns., but I think you may have a bigger problem that goes beyond of SSIS. It is not recommended to have a date/time dimension a such low level
of detail. why? because that is a monster-dimension with way too many rows - you need 31,536,000 for every year worth of rows -
If you are using dimensional modeling, you may want to check out some of the kimball's books.
For getting your hands around SSIS and data transformation you may want to start reading on the tutorials available in books on line
http://technet.microsoft.com/en-us/library/ms169917.aspx
Help Others! Don't forget to mark your thread as answered
True about the grain. I am just "playing" aroung with SSIS and ETL. I made a relational DB in SQL 2008 and made an ODS DB with 3 dimension tables and one fact table. This is after I tried the tutorial that you provided. That tutorial is why Im asking this question.
I have no idea when the lookup and transforation happened, how the other columns were filled in, in the DimTime table.So I made my own DBs and been playing with it but I cant seem to make sense of it all.
I have read through a bit of Kimball and Ross's "The Data Warehouse toolkit" so I understand some of the modeling and the general idea of what an ODS needs to be.
So with all that being said, creating a simple ETL on my own data is my goal. First target is getting the lookups to work correctly and getting the time piece working....
Free Windows Admin Tool Kit Click here and download it now
November 30th, 2010 1:44am
As suggested above, you need to use less granularity in time. Example: Day, Month, Quarter, Year.
If you have a datetime value, you can map it like:
CONVERT(DATE, DatetimeCol)
DAY(DatetimeCol)
MONTH(DatetimeCol)
DATEPART(QQ, DatetimeCol)
YEAR(DatetimeCol)
Example from AdventureWorksDW
Ok that helped a lot. So I see that there are built in functions to pull that kind of data from the datetime field. So i would have to emplement that in my ETL somewhere I assume?
November 30th, 2010 2:12am
The following example shows how you can map fact data to dimension table (SQL Server 2008 code):
-- Create mapping to dimension table
SELECT SalesOrderID, DateKeyID =TimeKey
FROM AdventureWorksDW.dbo.DimTime dt
INNER JOIN AdventureWorks.Sales.SalesOrderHeader
ON CONVERT(DATE, OrderDate)=dt.FullDateAlternateKey
/*
SalesOrderID DateKeyID
....
75114 1127
75115 1127
75116 1127
75117 1127
....*/
Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
Free Windows Admin Tool Kit Click here and download it now
November 30th, 2010 2:38am
Thanks for that bit of info. I have another question, where in my ETL would I put the code to transform the date into the table? I have a stupis simple etl right now that takes the tables out of one DB and puts them into another DB minus a whole bunch
of useless columns. so its just a series of ole db sources and destinations. Do i need to do it in a lookup? or outside the dataflow?
December 5th, 2010 10:22pm
Thanks for that bit of info. I have another question, where in my ETL would I put the code to transform the date into the table? I have a stupis simple etl right now that takes the tables out of one DB and puts them into another DB minus a whole bunch
of useless columns. so its just a series of ole db sources and destinations. Do i need to do it in a lookup? or outside the dataflow?
Free Windows Admin Tool Kit Click here and download it now
December 5th, 2010 10:22pm
ok, I created a trigger to do that and my ETL just imputs the date into the primary key column. First question "Is that a viable way to do it?"
second, After i got some conditioning to work with nulls and such, I added a lookp to the destanation table and basically did a " if it is already there, ignore it, otherwise add the row" that seems to work nicely but here is the issue. I am working this
ETL on a table that has 10 million rows and grows everyday. If I run my ETL on that, at about 2 million rows I start getting out of memory issues. Is there a way that i can tell it to only process batches of rows at a time?
December 6th, 2010 12:58pm