Lookup Perfromance issue
Hi All, I am facing the perfromance issue in the data flow task and looking for your help on this. I am using SSIS package (2008) to move the transactional data from one server to another (source is sql server 2005 and destination is 2008). I am using the dataflow task and in this I am using a Lookup to identify the new records. Average about the 1 million records we are pushing from source to destination everyday. The data flow task is consuming a lot of time (average 10-15 mins) to do this tranformation. I believe this is because this is a OLEDB connectivity and it is a cursor based operation. I was wondering if there is a better way to do this in order to improve the perfromance. Please let me know. Thanks, Gaurav http://gauravsqlserver.blogspot.com
July 29th, 2011 1:10am

I would suggest you to create index on date column (which used to filter data) and pull data filtering yesterday complete data don't use lookups... some thing like this :- where createdon between '2011-07-28 00:00:00:000' and '2011-07-28 23:59:59:999' (pull only complete yesterday data this case we don't have to check lookups) http://uk.linkedin.com/in/ramjaddu
Free Windows Admin Tool Kit Click here and download it now
July 29th, 2011 1:35am

Try to limit the number of columns in your lookup table (don't select the table, but use a query). And here is an interesting arcticle about lookup cache modes: http://blogs.msdn.com/b/mattm/archive/2008/10/18/lookup-cache-modes.aspx A good alternative for large (lookup-)datasets is the merge join (instead of the lookup) Please mark the post as answered if it answers your question | My SSIS Blog: http://microsoft-ssis.blogspot.com
July 29th, 2011 2:02am

Hi SSisJoost, As you mentioned I am not using the table and using t-sql query. I am going to try out the merge join and see this improves the performance. Thanks, Gaurav http://gauravsqlserver.blogspot.com
Free Windows Admin Tool Kit Click here and download it now
July 29th, 2011 10:15am

In addition to that I would like to mention, I am using the lookup in the Full Catche mode. The SSIS package is running on a server which has 4 GB ram. Do you think this is enough ram to run the many lookups in Full Catch mode?Thanks, Gaurav http://gauravsqlserver.blogspot.com
July 29th, 2011 10:22am

In addition to that I would like to mention, I am using the lookup in the Full Catche mode. The SSIS package is running on a server which has 4 GB ram. Do you think this is enough ram to run the many lookups in Full Catch mode? Thanks, Gaurav http://gauravsqlserver.blogspot.com It all depends on the size of the lookup datasets and the use of other heavy components like sort and aggregate. It's a little bit 'trial and error'. If the lookup is too slow try to switch to partial cache or change the lookup in a merge join. In addition to use less columns: - also try to limit (where possible) the number of records with a WHERE-clause. Less is more... - and make sure the result is unique, otherwise SSIS cannot cache the query - and make sure the rest of your package also meets the best practices. Here are a couple of best practice lists: http://microsoft-ssis.blogspot.com/2010/12/performance-best-practices.html http://consultingblogs.emc.com/jamiethomson/archive/2006/01/05/ssis_3a00_-suggested-best-practices-and-naming-conventions.aspx Let me know the result of your merge join trail... Please mark the post as answered if it answers your question | My SSIS Blog: http://microsoft-ssis.blogspot.com
Free Windows Admin Tool Kit Click here and download it now
July 29th, 2011 12:51pm

Hi SSISJoost, I tried the merge join and do't see any perfromance improvement. Tried the partial cache as well and was taking more time. For now I have decided to keep in the same way (Using Lookups) and lookups are fast when I am doing the increamental load. The perfromace issue I am seeing only when I am doing the full load (larger data set). Thanks for looking into this. Thanks, Gaurav http://gauravsqlserver.blogspot.com
July 30th, 2011 11:54pm

Are you also updating existing (but changed) records? In that case you can split the lookup into two. 1) only checking if the business key exists 2) and the second for checking changes More info about that: http://microsoft-ssis.blogspot.com/2011/01/slowly-changing-dimension-alternatives.html You could also first load the new data to a staging table on the same server as the destination table and then use a TSQL Merge Statement. All roads lead to Rome... Please mark the post as answered if it answers your question | My SSIS Blog: http://microsoft-ssis.blogspot.com
Free Windows Admin Tool Kit Click here and download it now
July 31st, 2011 4:58am

Hi SSISJoost, I am not doing any update in the historical data and using the lookup only for the inserting the new records. You could also first load the new data to a staging table on the same server as the destination table and then use a TSQL Merge Statement. I am doing the same, moving the data from source table to staging table (another server) using the SSIS. This is just table to table transformation, and no any business logic between this. Thanks, Gaurav http://gauravsqlserver.blogspot.com
July 31st, 2011 10:21am

Why dont you try, to run the things via a SQL statement wherein you just do a left outer join with the staging table and insert only new values, this should be done via a SP , and then it confirms is it you need to apply some indexes or you need to adjust some buffer settings on SSISAbhinav
Free Windows Admin Tool Kit Click here and download it now
July 31st, 2011 2:01pm

Hi AB82...I would have done that if the source and staging are in same database. But this is not the case, source and staging are two different sql servers.Thanks, Gaurav http://gauravsqlserver.blogspot.com
July 31st, 2011 2:05pm

Isn't Linked sever a option? Just need to confirm this is SSIS issues rather then your network issue. overall when i faced this issues it was SSIS, could get perormance improvements via the Merge joins, and i had around 2 million rows. Also we indexed the staging tables columns, and the indexes was on the refrenced columns of lookup. I think you also need to check the indexes on destination table, are you using bulk insert or row by row insert ? Maybe use a SQL destination instead of OLEDB if your package will reside on the same server as destination Abhinav
Free Windows Admin Tool Kit Click here and download it now
July 31st, 2011 2:18pm

Hi AB82...this is certainly a SSIS issue, even I tried the SSIS merge join and did't help either. I know I could have done this with in secs if I was doing this in sql, somehow did't lile the OLEDB which is cursor based. I can not use the SQL destination since the source and staging are two different servers. I am kind of surprised, in so many years SSIS still do't have any set based connectivity in place of cursor based. I am going to just play around with the index and see how it works otherwise I might go with the link server approach which is the last thing I wanted to use. btw..r u based in banglore?Thanks, Gaurav http://gauravsqlserver.blogspot.com
July 31st, 2011 2:38pm

Ok thats good, that means you r using row by row insert and not Bulk Insert, this might indicate some problem with you destination DB, wherein it takes time to insert, have you already had a look where does the package stops pre Execute task( this would imply problem with Look up), or somewhere else and yes i was based in Bangalore :) but not these days, how bt you Abhinav
Free Windows Admin Tool Kit Click here and download it now
July 31st, 2011 2:58pm

I am going to play around with the indexes and this might improve some perfromance. I will let you know the results. I am based in hyderabad but in US these days on a short visit.Thanks, Gaurav http://gauravsqlserver.blogspot.com
July 31st, 2011 3:05pm

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics