LookUp ETL Design (Network Steve Forum)

LookUp ETL Design

Hi, I have a sql query which contains sourceid, consumerfirstname, consumerlastname, emailaddress, addressline1, addressline2, city, state, zipcode I need to populate AddressInstance table upon looking up key values in two different tables ( consumer and address ). The addressinstance table does contain foriegnkeys referencing the consumer and address tables. I have already populated the consumer table and the address table . I would want to know how can I better design my package to populate the addressinstance table . I tried with the following approach but it is NOT efficient at all. Takes very long time: 1. I have my source query as oledb source in the data flow task . The source query contains ( sourceid, consumerfirstname, consumerlastname,emailaddress, addressline1, addressline2, city, state, zipcode ) 2. I used the Multicast transformation to use 2 look ups on the same query: The first lookup to get the consumerkey from the consumer table based on the consumerfirstname, consumerlastname, sourceid, emailaddress. The second lookup to get the address key from the address table based on the addressline1, addressline2, city, state, zipcode match. 3. In this way the 2 lookups work in parallel. I am not sure if this is the right approach as it took a massive amount of time for 130000 records. Please suggest a better approach to design the package so that I can populate the addressintance table ( consumerkey and addresskey - looking up key values in consumer and address table. Thanks EVA05

August 14th, 2012 10:29am

Hi, You can put them in sequence and then see if you get any better performance. The records you are getting from source are not too many. Which mode of Lookup you are using ( full cache, partial etc) how many records are in each of the lookup table? What is memory size (RAM) of machine where you are running the pacakge? If Source data and lookup tables are on the same server then you can use join instead of using lookup. Join all three tables , use that query in oledbd source and insert into destination. So you will be handling everything on DAtabase Engine side. Thank youhttp://sqlage.blogspot.com/

Free Windows Admin Tool Kit Click here and download it now

August 14th, 2012 10:41am

There is no silver bullet for your scenario. Performance of a pattern varies from scenario to scenario. Lookup transform is considered to perform far better than other component (Refer here) but in some cases the Merge-Join, Conditional Split pattern is found to perform better (Refer here). So you need to first check if the Merge-Join, CSPL pattern works better for you. Also make sure you are looking at the performance timings of the lookup component and not the whole data flow component as there maybe different causes of the slowness experienced.http://btsbee.wordpress.com/

August 14th, 2012 10:49am

hi Eva05, I agree with btsbee, there is no best solution, but there is always a better solution depends on your case. Aamir also pointed out good points like RAM, Number of LookUps and Type of LookUp etc. which you should consider for the performance. Let me ask you once thing, all the tables you mentioned ... are these on different servers or just on 1 server ?? If on 1 server then best is to write a T-SQL. Do all your joins to find out your ids and create a final result set. If not then you have to consider all these factors like RAM, LookUps, MergeJoins etc. One thing i noticed, why do you need to use Multicast. you can use 1 LookUp and then another LookUp on the top of it and continue the good flow upto destination. Multicast will consume more RAM + your LookUps too. You can also check few SSIS Performance Tutorials: http://technet.microsoft.com/en-us/library/cc966529.aspx http://sqlcat.com/sqlcat/b/presentations/archive/2009/05/02/designing-and-tuning-for-performance-your-ssis-packages-in-the-enterprise-sql-video-series.aspx Thanks, Khilit http://www.bigator.com

Free Windows Admin Tool Kit Click here and download it now

August 14th, 2012 11:14am

Hi, I am facing a unique problem with using Lookup with full cache. Although my reference table has the data to match - the lookup fails to match to all the reference fields. I have over 130000 records. Out of there there are 500 records which do not match to the values in the reference table- despite the value being present in the source query. I verified this by re-directing no match rows to text file and comparing the matching fields to ones present in the look up reference table. When I used lookup with partial cache and no cache there were NO problems. all the rows matched. I cannot use partial cache or NO cache as the performance of my package is severely affected. Please guide as to what checks I need to make in order to ensure all the required data is matched from source query to reference table. Again the data is present just dont know why it is failing to identify in full cache mode ? Please advice ... ThanksEVA05

August 14th, 2012 3:27pm

The lookup is case sensitive, is that the issue?Arthur My Blog

Free Windows Admin Tool Kit Click here and download it now

August 14th, 2012 4:10pm

yes. case sensitive was the problem.EVA05

August 14th, 2012 4:47pm

This topic is archived. No further replies will be accepted.