Perforance Mystery
Hello SSIS Forum. I have been working on a performance problem for a few weeks now. Background: I am using BIDS 2005 and SQL Server 2005 (client requirement - no option to upgrade). In the past I built an ETL utility in MS Access 2003 (basically VB and SQL app which is "contained" in an Access "application". I am now replacing this Access based ETL with SSIS. The mission of this SSIS package is to connect to a series of Sybase ASE 15 databases (source data) iteratively and ETL data into a SQL Server 2005 reporting database. Overall my new SSIS package does a good job at replacing it's predecessor (the Access based ETL). As I would expect, SSIS is much faster than my Access based version (on my development machine anyway). The SSIS version completes it's job in about 5min 20sec when run on my development box. On the development box there is no "network" in that all parts of the solution are local except that instead of Sybase ASE 15 I have SQL Anywhere (now iAnywhere) emulating ASE 15. In the past 6 years of working with Sybase and SQL Anywhere I have found the latter a good development surrogate for the former though I note they are at times different. The problem I am having is that when my package is deployed to the customer's test environment performance is much slower. Run time is about 40min which, in and of itself is not a killer but I have since added the final data flows and performance has dropped to 18hrs run time!!!! which is totally unacceptable. Over the past 2 weeks I have done all sorts of troubleshooting. Here are some key notes: The customer servers are all VMs and are all hosted by the same VM host so there is in theory no "network" between the Sybase VM and the SQL Server/SSIS VM much as there is no network between the servers on my single development machine. The VMs are almost identical in performance spec to my dev box. My dev box is actually also a VM (Windows XP VM running in Fusion on MacBook Pro) and my dev environment is lightning fast (IMHO) no problem there... I recently isolated the major part of the 18hrs run time to the two new DFTs added to my package. Both of these (one in particular) retrieve very large amounts of data into staging tables in SQL Server. I have running these queries separate from SSIS and they do seem to be the problem. Literally I am retrieving 100,000 + rows of text data each row of which may be a description of up to 65K. The identical query is run from the old Access version of my ETL and the entire Access version runs on the exact same servers in the customer environment in only 38min (vs 18hrs for SSIS) The customer environment is locked down with lots of security policies that I do not have in my dev box but if that is the culprit I would expect the Access version to also be slow and it is not. Currently I am thinking that the problem may be in the difference between the ODBC driver used by Access vs the OLE-DB driver used by SSIS. Another possibility is that Access works by having a proxy table to the Sybase source and it simply does an "insert into target table A select from Sybase source table B" where table B is a proxy to the Sybase table. It seems to me that in SSIS all that data is being read into "the pipeline" before being written to the target staging table whereas with Access it is inserting to the target as it reads and perhaps uses up less buffer space of some sort. I am experimenting with emulating this by creating a loop in SSIS to read 5000 rows from Sybase and write them to SQL Server before proceeding with the next 5000, etc.. until done. I also thought about trying to create a proxy table in SQL Server and then using an Execute SQL Task making the approach more similar to what is in the Access version. The DFT in this case is very simple it is literally an OLE DB source and an OLE DB Destination with no slow lookups or merges or anything else inbetween. I know that SSIS packages can be slow when the "check publishers revocation certificate" is checked in internet options and I massively boosted performance on my dev box with this setting ages ago. I also tried this on the customer's servers but it made no difference. I have checked all the obvious things I could think of and would appreciate any other suggestions or ideas... Best Regards, - CharlesCharles
December 12th, 2010 6:53pm

You might need to enable SSIS Logging which helps you identify the exact phases of data transfer taking time. Follow the below suggested articles: http://technet.microsoft.com/en-us/library/cc966530.aspx [SQL Server 2005 Integration Services: A Strategy for Performance ] http://msdn.microsoft.com/en-us/library/bb522550.aspx [http://msdn.microsoft.com/en-us/library/bb522550.aspx] http://technet.microsoft.com/en-us/library/cc966529.aspx [http://technet.microsoft.com/en-us/library/cc966529.aspx] Chaitanya
Free Windows Admin Tool Kit Click here and download it now
December 13th, 2010 1:40am

this is really good article on SSIS Data flow performance, read it carefully : http://www.simple-talk.com/sql/ssis/sql-server-2005-ssis-tuning-the-dataflow-task/http://www.rad.pasfu.com
December 13th, 2010 2:36am

Your first concern should be to measure resources - if the data flow is taking hours to execute, it's likely due to a resource constraint. If you can watch/record perfmon counters during the package execution, start looking at the four basic resources: CPU, RAM, Network, and Disk. Look both at total utilization as well as I/O. You're right that SSIS will pull all the data into memory, where it's possible Access would not. That leads me to think it's the RAM - either there isn't enough to pull the data through the pipeline and it's swapping to disk, or the RAM I/O is a limiting factor in the VM. From those choices, I'd think it's a swapping problem - but you need to watch the counters to get rid of assumptions like that. If it does turn out to be a swapping problem, you either need to increase RAM to that VM, or rearchitect your data flow to reduce memory requirements. Sort transforms are the usual culprit there. Talk to me now on
Free Windows Admin Tool Kit Click here and download it now
December 13th, 2010 11:22pm

Your first concern should be to measure resources - if the data flow is taking hours to execute, it's likely due to a resource constraint. If you can watch/record perfmon counters during the package execution, start looking at the four basic resources: CPU, RAM, Network, and Disk. Look both at total utilization as well as I/O. You're right that SSIS will pull all the data into memory, where it's possible Access would not. That leads me to think it's the RAM - either there isn't enough to pull the data through the pipeline and it's swapping to disk, or the RAM I/O is a limiting factor in the VM. From those choices, I'd think it's a swapping problem - but you need to watch the counters to get rid of assumptions like that. If it does turn out to be a swapping problem, you either need to increase RAM to that VM, or rearchitect your data flow to reduce memory requirements. Sort transforms are the usual culprit there. Talk to me now on Todd / Chiatanya / Reza , Many thanks for the responses. Many of those performance articles I had already read and I have actually already identified the bottle neck as a specific SQL Query. The strange thing is that exact same query (run from SSIS) is super fast when run against a different source database (SQL Anywhere) but is dog slow against Sybase. However that same query is super fast when run against Sybase using ODBC. This is why I think the OLE DB driver to Sybase is somehow part of the problem. The query simply selects two integers and a text column from a single table - that's it. Perhaps not surprisingly the query is much faster and Sybase does not seem to have an issue through OLE DB when that text column is removed from the query. That is fine for testing but moving the text column data over is a requirement for this ETL solution. But the text column does not (typically) contain much data and again simply running the exact same query through Access / ODBC is lightning fast! On the Sybase side I can even run the query through the Sybase Interactive SQL query tool and it is also dog slow there but that may be partly related to running out of client memory as it tries to retrieve all the text to a GUI client. Todd regarding your response "You're right that SSIS will pull all the data into memory, where it's possible Access would not"... since making my post I came to the conclusion I was wrong about that. In a simple DFT with only an OLE DB source and an OLE DB destination and nothing in between, if I run in development / debug mode I can see it read the first 10,000 rows then right them to the destination, then read the next 10,000 etc... It seems that through the maxbufferrows and maxbuffersize properties of the DFT the number of rows and expected buffer can be managed. When I set that to 1000 rows and run I can see it read 100 then write 1000 then read another 1000 etc... So I had to come to the conclusion I was wrong. Are you saying SSIS does read into a big buffer of 100% of the data retrieved before writing anything to the destination? If so that is now how SSIS "looks" to me iin development mode... Thanks, - Charles Charles
December 15th, 2010 1:42pm

No - I'm not saying 100% of the data is read into memory... it depends on what your data flow looks like. If it's just a source and destination, then you'll have very few buffers in memory at one time. If you happened to have a Sort component in between them, then yes - all the data would be held in RAM waiting for the Sort to complete. But that's not your case. If the ODBC driver is much faster - use it. In 2005, I can't recall if that's easy to do or not. I think it wasn't because there wasn't an ADO.Net source... But you could work around that by using a script source that uses the Connection Manager, or just creates an unmanaged one... Talk to me now on
Free Windows Admin Tool Kit Click here and download it now
December 15th, 2010 1:59pm

Todd, thanks for the quick reply. Your comment about using ODBC intrigues me. At the outset of this project I tried many different types of database connections and came to the conclusion that OLE DB was all that really worked with SSIS. I was never able to get .NET or ODBC sources to work. Your comment makes me wonder if I missed the boat on something. Is there some documentation you recommend on using ODBC with SSIS 2005. Honestly I spent weeks on this and came to the conclusion that 2005 was OLE DB centric. - Charles Charles
December 15th, 2010 2:29pm

I came to much the same conclusion early on as well with 2005. I think my solution happened to be to build my own source - but I do believe there's a third-party open source ADO.Net for 2005 on CodePlex... Yes, there is. The ADO.Net source will be able to use the ODBC for ADO.Net driver. Try that out... Talk to me now on
Free Windows Admin Tool Kit Click here and download it now
December 15th, 2010 6:44pm

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics