Question - SSIS package sequencing and scheduling dilemma
Hi,

Background: I have 4 SSIS (Visual Studio 2008 Professional) packages. Packages 1-3 are what I call "file builders": they gather data from three separate sources and then create flat files (all following the same flat file format) that are dumped into a central folder location called Processing. Files can range in size from 1KB to 2GB, so processing can take anywhere from milliseconds to minutes. The 4th package, which I call the "file consumer", processes each file (only one at a time) from the Processing folder through several layers of validation. The data structures supporting this system only handle one file at a time due to the downstream processing db architecture. If a file completes the validation process, it is moved to the Archive folder, whereas if a file fails validation it is moved to the Errors folder. A successfully validated file's data is then sent downstream to another process for balancing and auditing. Each package is associated with a like-numbered job in SQL Server Agent (2008 R2). Each job is scheduled on a five-minute interval.

Questions: Jobs 1-3 (file builders) run fine and without overlap. Job 4 (file consumer) does not run well on the job agent on its current schedule. While I can run this package manually without a problem, on the job agent it produces errors. So my question is two-fold:

1. Say job 4 is executed by the job agent at 3:30pm, the job takes more than five minutes to complete (past 3:35pm), and there are multiple files in the Processing folder. At 3:35pm, does SSIS/SQL Server Agent initiate another instance of package 4? If so, can the file that instance 1 is processing suddenly error out due to a resource conflict? That is what I believe is happening, but I cannot find any support for or proof of it.

2. Given the scenario above, what sequence/schedule would be recommended for these jobs, given that the client is looking for a very quick turnaround from the time the data is rolled into a flat file to the time it is consumed, sent downstream, and returned to the client in a balanced format?

Thanks, TWB

Everyone wants to develop in production, but no one wants development to be production!
January 15th, 2011 8:57pm

Hi Quicker,

The job will not be executed again until the running instance either succeeds or fails; SQL Server Agent does not start overlapping instances of the same job. If your job begins execution at 2:30a and takes 6 minutes to process a file, it misses the 2:35a execution, so the next time it will run is 2:40a. You can right-click the job and choose "History" to see its execution times and results.

I think the way you are doing this is OK, but you might want to consider integrating the validation into one of the other packages and handling it per file as each file is built, or looking into other ways of making package #4 more efficient. What kind of validation are you doing exactly? The looping mechanism might be seeing additional files in the queue that still have locks on them, which is probably where your errors are being produced.

Jon
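If it helps, you can also pull the same history straight out of msdb instead of going through the UI, which makes it easy to line up start times and durations and confirm there is no overlap. A minimal sketch; the job name here is a placeholder for whatever you actually called job 4:

```sql
-- Sketch: recent runs of the file-consumer job, newest first.
-- 'Job 4 - File Consumer' is a placeholder -- substitute your job name.
SELECT  j.name,
        h.run_date,          -- yyyymmdd as an integer
        h.run_time,          -- hhmmss as an integer
        h.run_duration,      -- hhmmss as an integer
        CASE h.run_status
            WHEN 0 THEN 'Failed'
            WHEN 1 THEN 'Succeeded'
            WHEN 3 THEN 'Canceled'
            ELSE 'Other'
        END AS outcome
FROM    msdb.dbo.sysjobhistory AS h
JOIN    msdb.dbo.sysjobs       AS j ON j.job_id = h.job_id
WHERE   j.name = 'Job 4 - File Consumer'
        AND h.step_id = 0    -- step 0 is the overall job outcome row
ORDER BY h.run_date DESC, h.run_time DESC;
```

If each run's start time plus its run_duration never crosses the next run's start time, the Agent is serializing the job as expected and your errors are coming from somewhere else (file locks being the likely suspect).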
January 15th, 2011 11:56pm

Thanks Jon, another quick question: when the package fires off at 2:30a, does it process only the files that are in the folder at 2:30a, or if other files are added while it is processing, will it process those as well?

Thanks, Terrence

Everyone wants to develop in production, but no one wants development to be production!
January 16th, 2011 10:50am

Hey Terrence,

I couldn't find any information about this on the web, so I built my own package to test it. It appears that only the files that are in the specified folder at execution time are processed. Files added after 2:30a in your scenario would need to be picked up by the next job run.

Hope this helps! Jon
January 16th, 2011 12:13pm

The specific detail is that the package processes the set of files that the Foreach Loop collects when it starts. A better architecture for your system would be to use a control table in your database. Your first packages would detect source files and add them to the table. Your subsequent packages would not monitor folders, but would monitor file status in that table and update file processing status there. That way, you can avoid file locking problems completely, as well as leave your architecture open to being scaled up and out to multiple instances of your file processing packages.
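As a rough sketch of what that control-table pattern could look like (all table, column, and status names here are illustrative, not prescriptive):

```sql
-- Illustrative control table; the builders INSERT a 'Pending' row
-- after writing each flat file, and the consumer updates Status as
-- it works through the queue.
CREATE TABLE dbo.FileQueue
(
    FileQueueID  int IDENTITY(1,1) PRIMARY KEY,
    FilePath     nvarchar(260) NOT NULL,
    Status       varchar(20)   NOT NULL
                 CONSTRAINT DF_FileQueue_Status DEFAULT ('Pending'),
                 -- Pending -> Processing -> Archived / Errored
    QueuedAt     datetime      NOT NULL
                 CONSTRAINT DF_FileQueue_QueuedAt DEFAULT (GETDATE()),
    CompletedAt  datetime      NULL
);

-- The consumer claims exactly one pending file at a time. The
-- UPDLOCK/READPAST hints mean that if you later run multiple consumer
-- instances, they skip rows another instance has already locked
-- instead of fighting over the same file.
UPDATE TOP (1) q
SET    q.Status = 'Processing'
OUTPUT inserted.FileQueueID, inserted.FilePath
FROM   dbo.FileQueue AS q WITH (UPDLOCK, READPAST)
WHERE  q.Status = 'Pending';
```

The consumer package would run the claim statement at the top of its loop, process the returned file, then set Status to 'Archived' or 'Errored' and stamp CompletedAt. Since new 'Pending' rows are visible on every loop iteration, this also removes the "files added mid-run wait for the next job" limitation of the folder-based Foreach Loop.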
January 16th, 2011 1:05pm

