Creating Parallelism by Executing Many dtexecs

I'm currently looking at refactoring an existing, large SSIS 2012 implementation that consists of about 55 projects and 360+ packages. The ETL framework in use has a "main" control package that reads from a database table, determines which packages are ready to execute (based on some dependency logic), and then uses an Execute Process task within a loop that calls dtexec with the arguments: 

/C start Dtexec /SQL "SomePackagePath" /SERVER "someserver" 

This design allows the loop to execute a package and then immediately iterate, because it doesn't wait for the package to respond (i.e., complete with a failure or success), so it can quickly kick off as many packages as are ready to execute. A SQL Agent job calls this control package every few minutes so that it can pick up any packages whose dependencies have been satisfied since the last execution and kick those off.
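To make the dependency logic concrete, the "ready to execute" check looks roughly like the query below. The table and column names here are made up for illustration, not our actual framework schema:

-- Illustrative only: etl.PackageRun and etl.PackageDependency are placeholder names.
-- Finds packages with no unfinished prerequisite, i.e. "ready to execute".
SELECT p.PackageName
FROM   etl.PackageRun AS p
WHERE  p.[Status] = 'Pending'
  AND NOT EXISTS (
        SELECT 1
        FROM   etl.PackageDependency AS d
        JOIN   etl.PackageRun        AS pre
               ON pre.PackageName = d.DependsOn
        WHERE  d.PackageName = p.PackageName
          AND  pre.[Status] <> 'Succeeded'
      );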

It's a clever design, but it has some problems, such as decentralized exception handling (since the parent package is unaware of what is happening in the "asynchronous" dtexec calls).

My biggest concern is that by executing packages not with the Execute Package Task but with the Execute Process Task, and spinning up many dtexecs, the framework is not leveraging SSIS's ability to manage threading, memory consumption, etc. across all running packages and executables, because it is simply unaware of them. It's essentially like using an Execute Package Task with the ExecuteOutOfProcess property set to true.

I'm wondering how much risk is introduced by this design and if there are any approaches I can take to minimize the risk of excess resource consumption. I apologize for not having an exact question here but would really appreciate any guidance anyone might have.



September 1st, 2015 2:53am

Hi David,

Everything you said is a valid concern.

I'd expect this process to operate in a kind of chaos and be very fragile.

For example, how do you react to errors, or guarantee the process doesn't make the server run out of memory?

Clearly, distributed processing was the intent, but it was not elegantly achieved.

I would leverage SSIS's execute-out-of-process package capability from a single parent package and have it run as part of the Agent job, so the execution and errors would all be controlled in one place. That is, if you want to continue going the SSIS way; perhaps you should instead use a different technology better suited to distributed processing.
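Alternatively, if the packages were migrated to the SSIS 2012 catalog (project deployment model), the controller could still fire off child packages asynchronously while keeping status and errors queryable in one place via the catalog views. A rough sketch; the folder, project, and package names are placeholders:

-- Sketch assuming the packages are deployed to the SSISDB catalog (project deployment model).
DECLARE @execution_id BIGINT;

EXEC [SSISDB].[catalog].[create_execution]
     @folder_name      = N'ETL',               -- placeholder folder
     @project_name     = N'SomeProject',       -- placeholder project
     @package_name     = N'SomePackage.dtsx',  -- placeholder package
     @use32bitruntime  = 0,
     @execution_id     = @execution_id OUTPUT;

-- SYNCHRONIZED = 0 returns immediately (fire-and-forget), but the execution is
-- still tracked in catalog.executions / catalog.event_messages for error handling.
EXEC [SSISDB].[catalog].[set_execution_parameter_value]
     @execution_id,
     @object_type     = 50,
     @parameter_name  = N'SYNCHRONIZED',
     @parameter_value = 0;

EXEC [SSISDB].[catalog].[start_execution] @execution_id;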

This is as much as I can suggest given the level of detail provided.

September 1st, 2015 1:27pm

By leveraging the execute-out-of-process capability, won't I still be stuck with my greatest concern, which is not having SSIS oversee the distribution of resources? I'm less worried about handling errors.

Additionally, I think this is less about being a distributed process and more about parallelism. The computing is all being done on one server, not distributed across many. It's a beefy server and should be able to handle a good deal of parallelism. However, without something overseeing the spawning of these dtexecs, there's nothing to control the resource consumption beyond the OS.


September 1st, 2015 3:02pm

As Arthur said, those are all concerns.

I would highly suggest limiting the number of packages you spawn, either per run or, more dynamically, by counting the dtexecs currently running and limiting how many more you spawn. You would have to experiment to find a reasonable number for your hardware.
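For example, a throttle check could look roughly like the sketch below. It assumes the child packages are started through the SSIS catalog so that catalog.executions tracks them (status 2 = running); the same idea applies if you count rows in your framework's own run-status table instead:

-- Sketch of a throttle check: only start another child package while fewer
-- than @MaxConcurrent executions are in flight.
DECLARE @MaxConcurrent INT = 20;   -- tune experimentally for your hardware
DECLARE @Running INT;

SELECT @Running = COUNT(*)
FROM   [SSISDB].[catalog].[executions]
WHERE  [status] = 2;               -- 2 = running

IF @Running < @MaxConcurrent
BEGIN
    -- start the next ready package here (e.g. catalog.start_execution)
    PRINT 'Capacity available: start another package.';
END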

I have run as many as 200 simultaneous packages on a very large server.  I don't think I would go much over that number.

September 1st, 2015 4:20pm

I would recommend you check out the commercial COZYROC Parallel Loop Task. It allows you to point to an existing "For Each Loop" container and execute multiple loop iterations in parallel. It gives you better control over how many iterations you want to execute in parallel, and it also lets you gather the logging information and do the exception handling in the regular SSIS way.
September 6th, 2015 1:07pm

I wouldn't be too concerned about SSIS managing its resources, because it's not exactly "robust" to begin with.

To demonstrate, set up a DFT with a blocking task on one side that's pushing enough data to overflow the buffers, and you'll still crash with 'out of memory' errors (easier to replicate in SSDT/BIDS since you're limited to 32-bit, and probably in a badly coded non-async custom script component). In my experience there's nothing that 'pauses' part of a package to wait for resources to be released before resuming. The component/task randomly selected to execute at a particular time will attempt to do its job whether or not there are enough resources to do so.

Set an upper threshold for concurrent tasks based on how 'heavy' your SSIS packages are. If all they're doing is firing off DML SQL and not actually flowing any data through them, then you can probably spawn quite a few threads.

September 7th, 2015 12:18am
