ERP Data refresh in Data Warehouse
Guys, Need some expert suggestions on this...we have an ERP system running which is basically used for Project Management and accounting purpose. We have build a data warehouse and a SSAS cube for analysis purpose using this ERP data. The ERP system is very busy during the day time and at present the total transactional DB size is about 12 GB (SQL Server 2008). We need to develop a mechanism to get the incremental and updated data every hour from Transactional DB and refresh the data warehouse and cube. Please let me know what would be the best possible approach to complete this requierement. Thanks for your help on this. Thanks, Gaurav http://gauravsqlserver.blogspot.com
April 11th, 2011 4:44am

You can achieve this by using CDC feature in SQL 2008 Change Data Capture is a new feature in SQL Server 2008 that records insert, update and delete activity in SQL Server tables. A good example of how this feature can be used is in performing periodic updates to a data warehouse. The requirement for the extract, transform, and load (ETL) process is to update the data warehouse with any data that has changed in the source systems since the last time the ETL process was run. Before CDC we might simply query a last updated DATETIME column in our source system tables to determine what rows have changed. While this is simple and pretty effective, it is of no use in determining any rows that were physically deleted. In addition we can't determine what was changed when; we can only access the current state of a row that has changed. CDC provides a configurable solution that addresses these requirements and more. for more help please visit here http://www.mssqltips.com/tip.asp?tip=1474http://uk.linkedin.com/in/ramjaddu
Free Windows Admin Tool Kit Click here and download it now
April 11th, 2011 4:54am

Hi Gaurav, for your requirement, you have to use Lookup transformation on your data warehouse for identifying new records...(best try to use SQL query in the lookup component) and while you are getting the data from your transactional DB, use OLEDB source with SQL Query for filtering your data (if you have time column in your transactional table) Let us TRY this | My Blog :: http://quest4gen.blogspot.com/
April 11th, 2011 4:56am

First of all you need to have some mechanism in the source to identify the increment (aka the newly inserted records, the updated records and possibly the deleted records). The easiest solution is CDC, as RamJaddu mentioned. If that is not an option, you can use audit columns (creation_time and update_time. These are the time columns ETL vs ETL mentioned). Once you've identified the records the records belonging to the most recent increment, you can read them into the SSIS dataflow with an OLE DB Source. If possible, you can do some joining, formatting and sorting in the SQL query. Then you must identify which record is an update or an insert. If you have CDC, this is easy, as the system will identify that for you. If not, then you must do it yourself using either joins or the LOOKUP component. Be aware that the lookup component isn't always that efficient. Insert the new records into the destination table using the OLE DB Destination with the Fast Load option. Write updates to a staging table, so that you can do a set-based update with an Execute SQL Task. Optionally, you can check if an update is a "real" update, meaning that some field has actually changed value. You can use a checksum for that. If you have questions about this approach, don't hesitate to ask!MCTS, MCITP - Please mark posts as answered where appropriate.
Free Windows Admin Tool Kit Click here and download it now
April 11th, 2011 8:24am

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics