REFRESH DATA WAREHOUSE
Immaking warehouse for our HMIS (healthcare management information system)by using SSIS. Im facing some problems now, could you please help me to solve my problem. Brief idea about my Warehouse: Source: oracle 9i Destination: Sql server 2005 ETL tool: SSIS Problems: How to refresh or load the current data to data warehouse.(now i'm using truncate sql task for deleting old/entire data for each packages, i really dont want to use in the production). For example: The patient admissions data is adding everyday so i want to load the current data into my warehouse. Could you pls suggest me good solution for this? Refresh Cycle timings: is there any task available in SSIS? current status: First Time load completed, i set one Execute Sql statement ctrl flow task for Truncate the existing loaded data in the sql server 2005. and then again i process one data flow task for loading the data from oracle to sql server.
May 21st, 2007 3:27pm

There are many different tools and techniques in SSIS that can be applied to this problem, but the real question to ask is how do you identify new and changed data in the source system? If you can reliably identify new records in the HMIS system, I would recommend an approach similar to this: For each package execution, store the ID of the "greatest" (most recent) record being processed from HMIS. You can store it in a flat file, in the staging database (my personal preference) or wherever, as long as it is reliable. Update your source queries to only SELECT records "greater" (more recent) than the last processed record. This will ensure that whatever data you're pulling from HMIS is data that you actually need, and you're not pulling any unneeded data. Assuming that you cannot reliably identify new records, you can use the Lookup transform in your data flow, to see if a given record already exists, and only insert it into the target table if it does not. This keeps you from needing to truncate your target tables each time, but still incurrs the overhead of pulling all data from the source system. What neither of these techniques addresses is modified data - what do you do if a record in the source system changes? This is a much larger issue in general, and the answer generally relies as much on the business context of the data as it does on the ETL toolset - what should you do, as opposed how do you do it? Also, I'm not sure what you mean by "Refresh Cycle timings: is there any task available in SSIS?" Can you please rephrase or elaborate on this question? Thanks!
Free Windows Admin Tool Kit Click here and download it now
May 21st, 2007 4:14pm

Thank you very much Mr. Roche. i'm going to try lookup transformation task for this problem. Refreshing cycle i mean the data warehouse load sheduding. i'll inform you once i solve this problem. regards, abi i'm attaching one input from Mr.Andy, he sent it to my official id: Hi Abi, Incremental loads are the same regardless of which database platform or ETL toolyou use. I can best demonstrate it with T-SQL: 1. Assume a source named tblSource with the columns ColID, ColA, ColB, and ColC. Assume also ColID is a primary unique key. 2. Assume a Destination table named tblDest with the columns ColID, ColA, ColB, ColC. 3. An incremental load would take place in a couple steps - assuming no rows are deleted at the source; rows from the source only change or are added. (Make sense?) 4. New rows can be loaded using: SELECT s.ColID, s.ColA, s.ColB, s.ColC FROM tblSource s LEFT JOIN tblDest d ON d.ColID = s.ColID WHERE d.ColID IS NULL 5. There are many ways by which people try to isolate changed rows. The only sure-fire way to accomplish it is to compare each field: UPDATE d SET d.ColA = s.ColA ,d.ColB = s.ColB ,d.ColC = s.ColC FROM tblDest d INNER JOIN tblSource s ON s.ColID = d.ColID WHERE ( (d.ColA != s.ColA) OR (d.ColB != s.ColB) OR (d.ColC != s.ColC) ) You can accomplish this in SSIS using the Merge Join or the Lookup (for the join functionality)combined with the Conditional Split (for the WHERE clause conditions) transformations. I'm not sure what you mean by "refreshing cycle". My best guess is you want to schedule the load to run automatically every day (or night). You accomplish that best with a SQL Agent job. Hope this helps, Andy
May 22nd, 2007 7:38am

Abi Antony wrote: Thank you very much Mr. Roche. i'm going to try lookup transformation task for this problem. Refreshing cycle i mean the data warehouse load sheduding. i'll inform you once i solve this problem. regards, abi Andy is exactly right - SSIS doesn't have any built-in scheduling tool. The most common way to schedule the execution of your SSIS packages is to use SQL Server Agent, but any scheduling utility should do the trick.
Free Windows Admin Tool Kit Click here and download it now
May 22nd, 2007 3:05pm

Hi Abi, Thanks for posting my email response! For those interested, I've (finally) posted about Incremental Loads in SSIS. I coverNew and Changedrow detection, not Deleted rows.Istart with Transact-SQL as an example. I include a link to the T-SQL and SSIS projects at the end of the article, although free registration at VSTeamSystemCentral.com is required. Hope this helps,Andy
May 22nd, 2007 4:24pm

Thanks Andy, I've come across that article in your SQL deep dives book, however, just wondering if there's any similar resources out there that will cover deleted rows from the source and then apply the delete to the destination? Thanks again.
Free Windows Admin Tool Kit Click here and download it now
October 26th, 2010 10:08pm

What I've currently Done is just created a new data flow, based on the same logic, yet just change the destination to lookup the source for any deleted rows, and then create an OLEDB command to delete the rows in the destination table (that were deleted from source). Is that the only way to achieve this ?
October 27th, 2010 12:00am

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics