Lookup Perfromance issue
Hi All,
I am facing the perfromance issue in the data flow task and looking for your help on this. I am using SSIS package (2008) to move the transactional data from one server to another (source is sql server 2005 and destination is 2008). I am using the dataflow
task and in this I am using a Lookup to identify the new records. Average about the 1 million records we are pushing from source to destination everyday. The data flow task is consuming a lot of time (average 10-15 mins) to do this tranformation.
I believe this is because this is a OLEDB connectivity and it is a cursor based operation.
I was wondering if there is a better way to do this in order to improve the perfromance.
Please let me know.
Thanks, Gaurav
http://gauravsqlserver.blogspot.com
July 29th, 2011 1:10am
I would suggest you to create index on date column (which used to filter data) and pull data filtering yesterday complete data don't use lookups...
some thing like this :-
where createdon between '2011-07-28 00:00:00:000' and '2011-07-28 23:59:59:999' (pull only complete yesterday data this case we don't have to check lookups)
http://uk.linkedin.com/in/ramjaddu
Free Windows Admin Tool Kit Click here and download it now
July 29th, 2011 1:35am
Try to limit the number of columns in your lookup table (don't select the table, but use a query).
And here is an interesting arcticle about lookup cache modes:
http://blogs.msdn.com/b/mattm/archive/2008/10/18/lookup-cache-modes.aspx
A good alternative for large (lookup-)datasets is the merge join (instead of the lookup)
Please mark the post as answered if it answers your question | My SSIS Blog:
http://microsoft-ssis.blogspot.com
July 29th, 2011 2:02am
Hi SSisJoost,
As you mentioned I am not using the table and using t-sql query. I am going to try out the merge join and see this improves the performance.
Thanks, Gaurav
http://gauravsqlserver.blogspot.com
Free Windows Admin Tool Kit Click here and download it now
July 29th, 2011 10:15am
In addition to that I would like to mention, I am using the lookup in the Full Catche mode. The SSIS package is running on a server which has 4 GB ram. Do you think this is enough ram to run the many lookups in Full Catch mode?Thanks, Gaurav
http://gauravsqlserver.blogspot.com
July 29th, 2011 10:22am
In addition to that I would like to mention, I am using the lookup in the Full Catche mode. The SSIS package is running on a server which has 4 GB ram. Do you think this is enough ram to run the many lookups in Full Catch mode?
Thanks, Gaurav
http://gauravsqlserver.blogspot.com
It all depends on the size of the lookup datasets and the use of other heavy components like sort and aggregate. It's a little bit 'trial and error'. If the lookup is too slow try to switch to partial cache or change the lookup in a merge join.
In addition to use less columns:
- also try to limit (where possible) the number of records with a WHERE-clause. Less is more...
- and make sure the result is unique, otherwise SSIS cannot cache the query
- and make sure the rest of your package also meets the best practices. Here are a couple of best practice lists:
http://microsoft-ssis.blogspot.com/2010/12/performance-best-practices.html
http://consultingblogs.emc.com/jamiethomson/archive/2006/01/05/ssis_3a00_-suggested-best-practices-and-naming-conventions.aspx
Let me know the result of your merge join trail...
Please mark the post as answered if it answers your question | My SSIS Blog:
http://microsoft-ssis.blogspot.com
Free Windows Admin Tool Kit Click here and download it now
July 29th, 2011 12:51pm
Hi SSISJoost, I tried the merge join and do't see any perfromance improvement. Tried the partial cache as well and was taking more time. For now I have decided to keep in the same way (Using Lookups) and lookups are fast when I am doing the increamental
load. The perfromace issue I am seeing only when I am doing the full load (larger data set).
Thanks for looking into this.
Thanks, Gaurav
http://gauravsqlserver.blogspot.com
July 30th, 2011 11:54pm
Are you also updating existing (but changed) records?
In that case you can split the lookup into two.
1) only checking if the business key exists
2) and the second for checking changes
More info about that:
http://microsoft-ssis.blogspot.com/2011/01/slowly-changing-dimension-alternatives.html
You could also first load the new data to a staging table on the same server as the destination table and then use a TSQL Merge Statement.
All roads lead to Rome...
Please mark the post as answered if it answers your question | My SSIS Blog:
http://microsoft-ssis.blogspot.com
Free Windows Admin Tool Kit Click here and download it now
July 31st, 2011 4:58am
Hi SSISJoost, I am not doing any update in the historical data and using the lookup only for the inserting the new records.
You could also first load the new data to a staging table on the same server as the destination table and then use a TSQL Merge Statement.
I am doing the same, moving the data from source table to staging table (another server) using the SSIS. This is just table to table transformation, and no any business logic between this.
Thanks, Gaurav
http://gauravsqlserver.blogspot.com
July 31st, 2011 10:21am
Why dont you try, to run the things via a SQL statement wherein you just do a left outer join with the staging table and insert only new values, this should be done via a SP , and then it confirms is it you need to apply some indexes or you need to adjust
some buffer settings on SSISAbhinav
Free Windows Admin Tool Kit Click here and download it now
July 31st, 2011 2:01pm
Hi AB82...I would have done that if the source and staging are in same database. But this is not the case, source and staging are two different sql servers.Thanks, Gaurav
http://gauravsqlserver.blogspot.com
July 31st, 2011 2:05pm
Isn't Linked sever a option? Just need to confirm this is SSIS issues rather then your network issue.
overall when i faced this issues it was SSIS, could get perormance improvements via the Merge joins, and i had around 2 million rows.
Also we indexed the staging tables columns, and the indexes was on the refrenced columns of lookup.
I think you also need to check the indexes on destination table, are you using bulk insert or row by row insert ?
Maybe use a SQL destination instead of OLEDB if your package will reside on the same server as destination
Abhinav
Free Windows Admin Tool Kit Click here and download it now
July 31st, 2011 2:18pm
Hi AB82...this is certainly a SSIS issue, even I tried the SSIS merge join and did't help either. I know I could have done this with in secs if I was doing this in sql, somehow did't lile the OLEDB which is cursor based. I can not use the SQL destination
since the source and staging are two different servers.
I am kind of surprised, in so many years SSIS still do't have any set based connectivity in place of cursor based.
I am going to just play around with the index and see how it works otherwise I might go with the link server approach which is the last thing I wanted to use.
btw..r u based in banglore?Thanks, Gaurav
http://gauravsqlserver.blogspot.com
July 31st, 2011 2:38pm
Ok thats good, that means you r using row by row insert and not Bulk Insert, this might indicate some problem with you destination DB, wherein it takes time to insert, have you already had a look where does the package stops pre Execute task( this would
imply problem with Look up), or somewhere else
and yes i was based in Bangalore :) but not these days, how bt you
Abhinav
Free Windows Admin Tool Kit Click here and download it now
July 31st, 2011 2:58pm
I am going to play around with the indexes and this might improve some perfromance. I will let you know the results.
I am based in hyderabad but in US these days on a short visit.Thanks, Gaurav
http://gauravsqlserver.blogspot.com
July 31st, 2011 3:05pm