How to force package data flow to stop after importing only X number of rows
Hi Everybody, I'm testing some SSIS 2008 package changes in order to make sure duplicate rows are not being introduced in the future when the Job fails and has to be restarted. Normally, the package gets about 1,000 rows from the source DB and imports into the destination DB. I intend to force the package to stop executing after importing < 1,000 rows, run the package again, and then check my destination for duplicates. Unfortunately, the step executes so quickly that I'm having a hard time stopping execution at the right moment. How can I force the data flow to stop [and fail the package] after importing a fixed number of rows but before completing successfully? I assume some type of Event Handler is the way. Thanks, Eric
September 26th, 2012 2:50pm

Why not add a TOP xxx to the source query?

SELECT TOP 100 Model, Color, Price
FROM dbo.Cars
WHERE Color = 'red'

Please mark the post as answered if it answers your question | My SSIS Blog: http://microsoft-ssis.blogspot.com | Twitter
September 26th, 2012 3:46pm

You could also use a Conditional Split to filter rows. You need some kind of row number that you can filter on. If your source doesn't have one, you could create it with a Script Component or use a custom rownumber component. After that you can use the Conditional Split with an expression like rownumber <= 100
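A minimal sketch of that row-numbering Script Component (transformation), assuming you've added an integer output column named RowNumber — column and variable names here are hypothetical:

```csharp
// Script Component (transformation) sketch: tag each row with a running number.
// Assumes an output column "RowNumber" (DT_I4) has been added to Input0.
int rowNumber = 0;

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    rowNumber++;
    Row.RowNumber = rowNumber; // downstream Conditional Split: RowNumber <= 100
}
```

The Conditional Split then routes rows with RowNumber <= 100 to the destination and discards (or diverts) the rest.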
September 26th, 2012 4:17pm

SSISJoost, Thanks for the response. Modifying the query would definitely work, but if my query is in the form of a stored procedure that I don't have permissions to alter, how could I set this limit in the package itself? -Eric
September 26th, 2012 4:17pm

If your intention is just to prevent duplicate rows from entering your destination, then you should try a Lookup (update on match) or a MERGE statement. Why do you want to stop the package halfway through? Is there some other constraint that requires you to stop the job halfway through?

Regards, Dinesh
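For reference, the MERGE approach suggested above might look like this — table and column names are made up for illustration:

```sql
-- Hypothetical sketch: insert only rows not already in the staging table.
MERGE dbo.StagingCars AS tgt
USING dbo.SourceCars AS src
    ON tgt.CarID = src.CarID
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CarID, Model, Color, Price)
    VALUES (src.CarID, src.Model, src.Color, src.Price);
```

Re-running this any number of times leaves the staging table without duplicates, since matched rows are simply skipped.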
September 26th, 2012 5:48pm

Dinesh, my intent is to test the package to make sure it doesn't introduce duplicate rows into the staging table. In order, the package:

1) Checks a log table to see if an entry has been made for the day's import. If an import was logged, the package moves on to another table. If no import was logged, then
2) the package issues a delete statement against the staging table for the day's date, then
3) imports the data from the source to the staging table. Finally,
4) a row is written to the log table for the day's import.

If the package ever fails for some reason without writing to the log, the scheduled Job will retry the step, re-executing the package. It won't find a log entry, so it will re-import the rows after clearing out any pre-existing rows. I don't want to change the overall structure of the package flow; I just want to include something that watches the row counter and stops after 500 rows have been imported (perhaps by throwing an error). That sounds like an Event Handler to me, perhaps using OnProgress, but I don't know how. :(
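One way to get the "fail after N rows" behaviour described above, sketched as a Script Component transformation in the data flow — the row limit, component name, and message are placeholders, and this is an untested sketch rather than a confirmed pattern from this thread:

```csharp
// Hypothetical Script Component (transformation) that deliberately fails the
// data flow after a fixed number of rows, to test the package's restart logic.
int rowCount = 0;

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    rowCount++;
    if (rowCount > 500)
    {
        bool cancel;
        // Raise an error so the data flow task (and package) fails.
        ComponentMetaData.FireError(0, "RowLimiter",
            "Row limit reached - forcing test failure.", string.Empty, 0, out cancel);
        throw new Exception("Forced failure after 500 rows.");
    }
}
```

Because the limit lives in one script variable, it could be driven from a package variable and effectively switched off (set very high) when testing is done.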
September 27th, 2012 11:43am

Hmm, I do understand your requirement, but I do not agree with your approach of failing the package to check for duplicates. I have not explored raising custom events in SSIS at all, so I really cannot suggest anything in that area. I would recommend you make changes to the package and design it in such a way that it handles duplicate entries.

Regards, Dinesh
September 28th, 2012 11:24am

Dinesh, I appreciate your help, but you'd have to read the entire thread and the title to understand. Now how do I delete a thread I started? I must be doing a terrible job of explaining myself because the suggestions lately are going nowhere.
October 3rd, 2012 12:20pm

I think your requirement is very clear, and the best course of action for you would be:

1. Use a Lookup on the staging table and insert only the non-matched output into the staging table. With this approach, no matter how many times you run the package in a day, only non-duplicate rows will be inserted each time.
2. Dump all your source data into a temporary table, then use a join between the temporary table and the staging table to filter out rows that already exist and insert the rest into the staging table.

Vikash Kumar Singh || www.singhvikash.in
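Option 2 above can be sketched in T-SQL like this — the temp table, staging table, and key column are hypothetical names, not from the original package:

```sql
-- Load the source extract into #TempImport first, then insert only new rows.
INSERT INTO dbo.Staging (CarID, Model, Color)
SELECT t.CarID, t.Model, t.Color
FROM #TempImport AS t
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.Staging AS s
    WHERE s.CarID = t.CarID
);
```

Like the Lookup approach, this makes the load idempotent: re-running it after a failure inserts only the rows that didn't make it in the first time.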
October 3rd, 2012 8:50pm

Thank you for the advice, Vikash. That would definitely meet the overall objective, but I work on a lot of SSIS packages and they all need to be tested after I make changes. I want to introduce a standard method for forcing packages to stop after a fixed number of rows have been imported. I recognize that calling stored procedures won't enable that functionality, and that's the price of using stored procedure calls (as opposed to a query in the data flow source, I assume), but the structural changes Dinesh and yourself have suggested are not valid options for me. For example, if I want to force a perfectly good package to fail tomorrow after importing only 500 rows, I don't want to have to make data flow or query changes; I only want to turn [something] on and then turn it off when I'm done. There has got to be a way!
October 4th, 2012 5:21pm

I would recommend Vikash's solution of checking for duplicates as you process the rows! It only makes sense to detect the "failure" or "problem" immediately and deal with it. However, if you're definitely focused on running only a subset of the rows through the data flow before you "check" your duplicates (outside of SSIS), then Joost has the correct answer: number your rows and use a Conditional Split to limit the work that's done.
October 5th, 2012 12:34am

Very well, then. Thanks for your input, Todd. -Eric
October 5th, 2012 12:15pm

You can use a CTE at the end to remove the duplicate rows from the target table:

WITH CTE (Col1, Col2, DuplicateCount) AS
(
    SELECT Col1, Col2,
           ROW_NUMBER() OVER (PARTITION BY Col1, Col2 ORDER BY Col1) AS DuplicateCount
    FROM DuplicateRecordTable
)
DELETE FROM CTE
WHERE DuplicateCount > 1
GO
October 5th, 2012 12:48pm

When did I say I need to figure out how to remove duplicate rows?
October 5th, 2012 1:19pm

This topic is archived. No further replies will be accepted.
