Populate Pivot Table from a Comma Delimited Field
Hey Guys, this is my first post, so be gentle :). I am building an SSIS package to collect information on jobs running throughout the enterprise at work. The goal is to have tables filled with info on servers, instances, DBs, jobs, and their relations to each other. Ideally I will use MERGE commands to update or insert where necessary, but for now it just inserts.

Here is an overview of my package (keep your mind out of the gutter please :P): The first Data Flow task pulls in a list of instances to check from a flat text file and stores them in a recordset. The Foreach Loop then goes through each one of them and changes my source connection manager's connection string appropriately, so inside the loop we are working through one instance at a time. The "Process Server" Data Flow task finds the server name and adds it to the destination DB if it doesn't exist; either way it also stores the server ID and name in package variables. "Process Instance" does the same as the above, but for the instance instead. The "Collect DB Data" task then uses those package variables to insert all the DBs in that instance as records, with the package variables mentioned above as foreign keys. Once that is finished we move on to the "Collect Job Data" task (hopefully the final task for this package). The following is the contents of the last task:

So inside of this task, this is what I'm doing so far. First I use a query to collect job info along with maintenance plan data. Here is the query for the OLE DB Source:

    -- WRITTEN BY MAXWELL WALLACE
    -- THE PURPOSE OF THIS QUERY IS TO COLLECT INFORMATION ABOUT JOBS AND MAINTENANCE PLANS
    -- RUNNING ON A PARTICULAR INSTANCE. IT COLLECTS NAMES, STATUSES, RUN TIMES AND DATES
    -- AS WELL AS DATABASES AFFECTED AND MAINTENANCE PLAN NAMES IF APPLICABLE.
    SELECT
        B.NAME AS JOB_NAME,
        B.CATEGORY_ID,
        -- RUN_STATUS CODE GETS TRANSLATED INTO ENGLISH
        CASE A.RUN_STATUS
            WHEN 0 THEN 'Failed'
            WHEN 1 THEN 'Succeeded'
            WHEN 2 THEN 'Retry'
            WHEN 3 THEN 'Canceled'
            ELSE 'Unknown'
        END AS RUN_STATUS,
        -- CONVERT INTEGER DATE INTO SOMETHING MORE LEGIBLE (MM/DD/YYYY)
        SUBSTRING(CAST(A.RUN_DATE AS CHAR(8)), 5, 2) + '/' +
        RIGHT(CAST(A.RUN_DATE AS CHAR(8)), 2) + '/' +
        LEFT(CAST(A.RUN_DATE AS CHAR(8)), 4) AS RUN_DATE,
        -- CONVERT RUN_TIME INTO SOMETHING MORE RECOGNIZABLE (HH:MM:SS)
        LEFT(RIGHT('000000' + CAST(A.RUN_TIME AS VARCHAR(10)), 6), 2) + ':' +
        SUBSTRING(RIGHT('000000' + CAST(A.RUN_TIME AS VARCHAR(10)), 6), 3, 2) + ':' +
        RIGHT(RIGHT('000000' + CAST(A.RUN_TIME AS VARCHAR(10)), 6), 2) AS RUN_TIME,
        -- CONVERT RUN_DURATION INTO SOMETHING MORE RECOGNIZABLE (HH:MM:SS)
        LEFT(RIGHT('000000' + CAST(A.RUN_DURATION AS VARCHAR(10)), 6), 2) + ':' +
        SUBSTRING(RIGHT('000000' + CAST(A.RUN_DURATION AS VARCHAR(10)), 6), 3, 2) + ':' +
        RIGHT(RIGHT('000000' + CAST(A.RUN_DURATION AS VARCHAR(10)), 6), 2) AS RUN_DURATION,
        -- THE FOLLOWING SUBQUERY EXTRAPOLATES DETAILS FOR THE JOB IN ITS MAINTENANCE PLAN (IF IT HAS ONE).
        -- THE TOP 1 MAKES SURE WE GET ONLY 1 RECORD, SINCE THIS IS A 1-TO-MANY RELATIONSHIP.
        -- THE LINE3 COLUMN CONTAINS DETAILS ABOUT THE TASK THAT WAS RUN.
        (SELECT TOP 1 E.LINE3
         -- WE START WITH SYSMAINTPLAN_LOG BECAUSE WE CAN (IN A WAY) JOIN IT TO OUR OUTER JOIN
         -- THROUGH THE PLAN_ID IN THE WHERE CLAUSE
         FROM MSDB.DBO.SYSMAINTPLAN_LOG AS D
         -- NOW IT IS POSSIBLE, BY EXTENSION, TO JOIN SYSMAINTPLAN_LOGDETAIL TO THE OUTER JOIN AS WELL,
         -- THROUGH ITS 1-TO-1 RELATIONSHIP WITH SYSMAINTPLAN_LOG
         INNER JOIN MSDB.DBO.SYSMAINTPLAN_LOGDETAIL AS E
             ON E.TASK_DETAIL_ID = D.TASK_DETAIL_ID
         -- THE 1ST PART OF THE WHERE RETURNS ONLY RECORDS OF THE SAME PLAN_ID,
         -- ESSENTIALLY "JOINING" THIS RECORD TO THE OUTER JOIN IN THE MAIN QUERY.
         -- THE 2ND PART MAKES SURE THE FIELD WE ACTUALLY CARE ABOUT CONTAINS MEANINGFUL DATA.
         WHERE D.PLAN_ID = C.PLAN_ID
           AND E.LINE3 != '') AS PLAN_DETAILS,
        -- THE FOLLOWING SUBQUERY RETURNS THE NAME OF THE MAINTENANCE PLAN (IF IT HAS ONE)
        (SELECT F.NAME
         FROM MSDB.DBO.SYSMAINTPLAN_PLANS AS F  -- THIS IS A SYSTEM GENERATED VIEW
         -- LIKE THE ABOVE SUBQUERY, THIS WHERE ESSENTIALLY "JOINS" THIS RECORD TO THE OUTER JOIN IN THE MAIN QUERY
         WHERE F.ID = C.PLAN_ID) AS PLAN_NAME
    FROM MSDB.DBO.SYSJOBHISTORY AS A
    INNER JOIN MSDB.DBO.SYSJOBS AS B
        ON A.JOB_ID = B.JOB_ID
    -- THIS OUTER JOIN ATTACHES PLAN_IDS OF MAINTENANCE PLANS TO JOBS THAT HAVE THEM
    LEFT OUTER JOIN SYSMAINTPLAN_SUBPLANS AS C
        ON C.JOB_ID = B.JOB_ID
    -- ONLY RETURN ENABLED JOBS
    WHERE B.[ENABLED] = 1
      -- AND ONLY JOB OUTCOMES, NOT EACH STEP
      AND A.STEP_ID = 0
      -- AND ONLY COMPLETED JOBS
      AND A.RUN_STATUS <> 4
    -- SORTED BY LATEST DATE 1ST
    ORDER BY A.RUN_DATE DESC

After that I add the package variable Instance ID as a column, to assist in inserting these records with that foreign key. I convert some strings to Unicode (which is neither here nor there), and then I do a conditional split into records with a maintenance plan and records without. Records without one I can simply insert into the destination, and they are done! Records with a maintenance plan, however, very likely have a connection to one or more DBs. So first I insert the job record into the job table (exactly as I do with records not in a maintenance plan), and then I do a lookup to find the ID of the record I just inserted. Next I take the field from my query that holds a comma-separated list of DBs affected by the maintenance plan this job is part of, separate it into a VB.NET ArrayList, and assign that ArrayList to a package variable.

This is the part I'm at. Obviously my next step is to create some sort of loop using the job ID I just looked up, running through each value in the ArrayList to insert them one at a time into the DB/job pivot table. The problem is that I don't know how to do a loop within a Data Flow task, and I can't think of a good way to move the pivot-table insert out of this task. I might be able to do this with a Script Component, but I am unsure how to perform inserts from within a script task (should I even consider that?). I am proficient with VB.NET and C# as well as T-SQL, so I can investigate any method of implementing this. Thanks in advance for your help. Cheers!
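As an aside, the integer-to-string massaging of RUN_DATE/RUN_TIME/RUN_DURATION in the query does the equivalent of the following (a Python sketch with made-up sample values, just to illustrate the SUBSTRING/RIGHT/LEFT logic, not part of the package):

```python
# msdb stores run_date as an int like 20101213 (YYYYMMDD) and
# run_time/run_duration as an int like 114600 (HHMMSS, no leading zeros).

def format_run_date(run_date: int) -> str:
    """20101213 -> '12/13/2010' (MM/DD/YYYY), like the SUBSTRING/RIGHT/LEFT combo."""
    s = str(run_date)                     # CAST(A.RUN_DATE AS CHAR(8))
    return f"{s[4:6]}/{s[6:8]}/{s[0:4]}"

def format_run_time(run_time: int) -> str:
    """114600 -> '11:46:00'; 930 -> '00:09:30' (pad to six digits first)."""
    s = str(run_time).rjust(6, "0")       # RIGHT('000000' + CAST(...), 6)
    return f"{s[0:2]}:{s[2:4]}:{s[4:6]}"

print(format_run_date(20101213))  # 12/13/2010
print(format_run_time(930))       # 00:09:30
```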
December 13th, 2010 11:46am

You don't want to do loops. We avoid them in SSIS just the same as you would avoid cursors in T-SQL. You just need to rework your script a little. Instead of splitting the row apart and storing an array in a package variable, make your script an asynchronous transform. The idea is to take one row in - with your delimited list of DBs - and output one row per DB: basically an UNPIVOT operation. Then you can just insert that set of records. Let us know if you need help with that transform script...
December 13th, 2010 12:48pm

Hey Todd! First of all, thank you so much for your speedy response! Let me try to wrap my mind around this proposed solution.

So basically I would have the data set as it comes in from the source query, which goes into the script. As each record enters my asynchronous transform I clear out that comma-delimited field and put it into an array. For each value found in that list I add a row to the data set that would have, say, the job ID as well as the DB ID (which I would somehow have to look up), but otherwise nothing from the original record. Once the recordset leaves the transform I have all the records I previously had, plus one new record for each item in the comma-delimited field of each record with a maintenance plan, which I can then run through a conditional split for inserting. Is this correct?

When you say having one row per DB, it confuses me. Having duplicate records for each job/DB combination seems unwise to me, because really the list of DBs is pretty much a second recordset unto itself in my eyes. For each record with a maintenance plan there is essentially another recordset of rows with the ID of the job record we just processed, but a different ID depending on which DB it's referring to. The DB/job relationship is held in its own table that is separate from (yet connected to) both the job table and the DB table.

I would love any additional help with the transform if you would be so kind. Also, let me know if you need any more information, such as the structure of the destination DB, etc. I have found in my research that some say I should avoid asynchronous transforms as much as possible. Are alternatives worth considering? http://sqlreality.com/blog/ssis/avoid-the-asynchronous-transformation-as-much-as-possible/

On a side note, I found this post a few days ago but was unsure if this is what I needed or where to go from there. Is this at all similar? http://www.sqlservercentral.com/Forums/Topic679052-148-1.aspx
December 13th, 2010 2:41pm

If I'm suggesting something that doesn't make sense - like "one row per DB" - it's only because I'm not understanding what your data looks like. No, you should not avoid asynchronous transformations unnecessarily - you just need to know they can be performance issues in some circumstances. I highly doubt that is the case here; replacing a loop of loops with even a poorly performing single data flow is still better... Jack's code from SQLServerCentral can help you, I think... it might help if you post some sample data. Basically, we're trying to do an UNPIVOT here - taking one row with a list of values and transforming it into several rows with one value each. Just like in T-SQL, we're not going to have two different rowsets coming out of the UNPIVOT operation. We'll just have one, with all rows having the same metadata (column names and data types). Don't think "arrays" - they'll get you into trouble in SSIS in most cases. Think "rowsets", just like in T-SQL.
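The shape of that UNPIVOT can be sketched like this (a Python stand-in with hypothetical rows and column names, not the actual SSIS script component):

```python
# One input row carries a delimited list of databases; the output gets one row
# per database, all rows sharing the same columns - a single rowset, as described.

def unpivot(rows):
    out = []
    for row in rows:
        # Strip the "Databases: " prefix, then split on commas
        for db in row["plan_details"].replace("Databases: ", "").split(","):
            db = db.strip()
            if db:  # skip empty pieces from trailing delimiters
                out.append({"job_name": row["job_name"], "database": db})
    return out

jobs = [{"job_name": "NightlyBackup", "plan_details": "Databases: master,model,msdb"}]
for r in unpivot(jobs):
    print(r)
```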
December 14th, 2010 12:14am

Hey Todd,

OK, well, like I said before, I would really like help with the transform script. Here is the structure of the DB I'm inserting into:

    CREATE TABLE TBL_SERVERS (
        ID INT IDENTITY(1,1) UNIQUE,
        TITLE NVARCHAR(50) PRIMARY KEY,
        CLUSTER_NAME NVARCHAR(50) DEFAULT '',
        RESOURCES_USED NVARCHAR(20) DEFAULT '',
        RESOURCE_THRESHOLD NVARCHAR(20) DEFAULT '',
        IS_CLUSTERED BIT NOT NULL DEFAULT 0)

    CREATE TABLE TBL_INSTANCES (
        ID INT IDENTITY(1,1) UNIQUE,
        SERVER_ID INT NOT NULL REFERENCES TBL_SERVERS(ID),
        TITLE NVARCHAR(50) NOT NULL,
        PRIMARY KEY (SERVER_ID, TITLE))

    CREATE TABLE TBL_CATEGORY_TYPES (
        ID INT IDENTITY(1,1) UNIQUE,
        TITLE NVARCHAR(50) PRIMARY KEY)

    INSERT INTO TBL_CATEGORY_TYPES VALUES ('LOCAL')
    INSERT INTO TBL_CATEGORY_TYPES VALUES ('MULTISERVER')
    INSERT INTO TBL_CATEGORY_TYPES VALUES ('NONE')

    CREATE TABLE TBL_CATEGORY_CLASSES (
        ID INT IDENTITY(1,1) UNIQUE,
        TITLE NVARCHAR(50) PRIMARY KEY)

    INSERT INTO TBL_CATEGORY_CLASSES VALUES ('JOB')
    INSERT INTO TBL_CATEGORY_CLASSES VALUES ('ALERT')
    INSERT INTO TBL_CATEGORY_CLASSES VALUES ('OPERATOR')

    CREATE TABLE TBL_CATEGORIES (
        ID INT IDENTITY(1,1) UNIQUE,
        TITLE NVARCHAR(50) NOT NULL,
        CATEGORY_CLASS_ID INT NOT NULL REFERENCES TBL_CATEGORY_CLASSES(ID),
        CATEGORY_TYPE_ID INT NOT NULL REFERENCES TBL_CATEGORY_TYPES(ID),
        CATEGORY_GROUP_ID INT NOT NULL DEFAULT 1,  -- REFERENCED BY THE IDENTITY_INSERT BLOCK BELOW
        PRIMARY KEY (TITLE, CATEGORY_CLASS_ID))

    CREATE TABLE TBL_SQL_JOBS (
        ID INT IDENTITY(1,1) PRIMARY KEY,
        TITLE NVARCHAR(200) NOT NULL,
        INSTANCE_ID INT NOT NULL REFERENCES TBL_INSTANCES(ID),
        CATEGORY_ID INT NOT NULL REFERENCES TBL_CATEGORIES(ID),
        RUN_STATUS NVARCHAR(10) NOT NULL,
        RUN_DATE NVARCHAR(10) NOT NULL,
        RUN_TIME NVARCHAR(8) NOT NULL,
        RUN_DURATION NVARCHAR(8) NOT NULL,
        MAINTENANCE_PLAN_NAME NVARCHAR(200),
        RUN_INTERVAL NVARCHAR(20) DEFAULT '',
        IS_ENABLED BIT NOT NULL DEFAULT 1)

    SET IDENTITY_INSERT TBL_CATEGORIES ON
    INSERT INTO TBL_CATEGORIES (ID,TITLE,CATEGORY_CLASS_ID,CATEGORY_TYPE_ID,CATEGORY_GROUP_ID) VALUES (0,'[Uncategorized (Local)]',1,1,1)
    INSERT INTO TBL_CATEGORIES (ID,TITLE,CATEGORY_CLASS_ID,CATEGORY_TYPE_ID,CATEGORY_GROUP_ID) VALUES (2,'[Uncategorized (Multi-Server)]',1,2,1)
    INSERT INTO TBL_CATEGORIES (ID,TITLE,CATEGORY_CLASS_ID,CATEGORY_TYPE_ID,CATEGORY_GROUP_ID) VALUES (98,'[Uncategorized]',2,3,1)
    INSERT INTO TBL_CATEGORIES (ID,TITLE,CATEGORY_CLASS_ID,CATEGORY_TYPE_ID,CATEGORY_GROUP_ID) VALUES (99,'[Uncategorized]',3,3,1)
    INSERT INTO TBL_CATEGORIES (ID,TITLE,CATEGORY_CLASS_ID,CATEGORY_TYPE_ID,CATEGORY_GROUP_ID) VALUES (8,'Data Collector',1,1,1)
    INSERT INTO TBL_CATEGORIES (ID,TITLE,CATEGORY_CLASS_ID,CATEGORY_TYPE_ID,CATEGORY_GROUP_ID) VALUES (7,'Database Engine Tuning Advisor',1,1,1)
    INSERT INTO TBL_CATEGORIES (ID,TITLE,CATEGORY_CLASS_ID,CATEGORY_TYPE_ID,CATEGORY_GROUP_ID) VALUES (3,'Database Maintenance',1,1,1)
    INSERT INTO TBL_CATEGORIES (ID,TITLE,CATEGORY_CLASS_ID,CATEGORY_TYPE_ID,CATEGORY_GROUP_ID) VALUES (5,'Full-Text',1,1,1)
    INSERT INTO TBL_CATEGORIES (ID,TITLE,CATEGORY_CLASS_ID,CATEGORY_TYPE_ID,CATEGORY_GROUP_ID) VALUES (1,'Jobs from MSX',1,1,1)
    INSERT INTO TBL_CATEGORIES (ID,TITLE,CATEGORY_CLASS_ID,CATEGORY_TYPE_ID,CATEGORY_GROUP_ID) VALUES (6,'Log Shipping',1,1,1)
    INSERT INTO TBL_CATEGORIES (ID,TITLE,CATEGORY_CLASS_ID,CATEGORY_TYPE_ID,CATEGORY_GROUP_ID) VALUES (18,'REPL-Alert Response',1,1,1)
    INSERT INTO TBL_CATEGORIES (ID,TITLE,CATEGORY_CLASS_ID,CATEGORY_TYPE_ID,CATEGORY_GROUP_ID) VALUES (16,'REPL-Checkup',1,1,1)
    INSERT INTO TBL_CATEGORIES (ID,TITLE,CATEGORY_CLASS_ID,CATEGORY_TYPE_ID,CATEGORY_GROUP_ID) VALUES (10,'REPL-Distribution',1,1,1)
    INSERT INTO TBL_CATEGORIES (ID,TITLE,CATEGORY_CLASS_ID,CATEGORY_TYPE_ID,CATEGORY_GROUP_ID) VALUES (11,'REPL-Distribution Cleanup',1,1,1)
    INSERT INTO TBL_CATEGORIES (ID,TITLE,CATEGORY_CLASS_ID,CATEGORY_TYPE_ID,CATEGORY_GROUP_ID) VALUES (12,'REPL-History Cleanup',1,1,1)
    INSERT INTO TBL_CATEGORIES (ID,TITLE,CATEGORY_CLASS_ID,CATEGORY_TYPE_ID,CATEGORY_GROUP_ID) VALUES (20,'Replication',2,3,1)
    INSERT INTO TBL_CATEGORIES (ID,TITLE,CATEGORY_CLASS_ID,CATEGORY_TYPE_ID,CATEGORY_GROUP_ID) VALUES (13,'REPL-LogReader',1,1,1)
    INSERT INTO TBL_CATEGORIES (ID,TITLE,CATEGORY_CLASS_ID,CATEGORY_TYPE_ID,CATEGORY_GROUP_ID) VALUES (14,'REPL-Merge',1,1,1)
    INSERT INTO TBL_CATEGORIES (ID,TITLE,CATEGORY_CLASS_ID,CATEGORY_TYPE_ID,CATEGORY_GROUP_ID) VALUES (19,'REPL-QueueReader',1,1,1)
    INSERT INTO TBL_CATEGORIES (ID,TITLE,CATEGORY_CLASS_ID,CATEGORY_TYPE_ID,CATEGORY_GROUP_ID) VALUES (15,'REPL-Snapshot',1,1,1)
    INSERT INTO TBL_CATEGORIES (ID,TITLE,CATEGORY_CLASS_ID,CATEGORY_TYPE_ID,CATEGORY_GROUP_ID) VALUES (17,'REPL-Subscription Cleanup',1,1,1)
    SET IDENTITY_INSERT TBL_CATEGORIES OFF

    CREATE TABLE TBL_APPLICATIONS (
        ID INT IDENTITY(1,1) UNIQUE,
        TITLE NVARCHAR(200) NOT NULL,
        HUB_SITE NVARCHAR(50) NOT NULL DEFAULT '',
        PRIMARY KEY (TITLE, HUB_SITE))

    CREATE TABLE TBL_DATABASES (
        ID INT IDENTITY(1,1) UNIQUE,
        INSTANCE_ID INT NOT NULL REFERENCES TBL_INSTANCES(ID),
        TITLE NVARCHAR(200) NOT NULL,
        APPLICATION_ID INT REFERENCES TBL_APPLICATIONS(ID),
        MANAGED BIT NOT NULL DEFAULT 0,
        CONNECTIONSTRING NVARCHAR(MAX) NOT NULL DEFAULT '',
        RESOURCES_USED NVARCHAR(MAX) NOT NULL DEFAULT '',
        RESOURCE_THRESHOLD NVARCHAR(MAX) NOT NULL DEFAULT '',
        LAST_SEEN DATETIME NOT NULL DEFAULT GETDATE(),
        PRIMARY KEY (INSTANCE_ID, TITLE))

    CREATE TABLE TBL_DATABASE_JOBS (
        ID INT IDENTITY(1,1) UNIQUE,
        DATABASE_ID INT NOT NULL REFERENCES TBL_DATABASES(ID),
        JOB_ID INT NOT NULL REFERENCES TBL_SQL_JOBS(ID),
        PRIMARY KEY (DATABASE_ID, JOB_ID))

And here are some sample results from the query I posted earlier. Keep in mind the script can be run against any instance as long as you use MSDB, because it relies entirely on system-generated tables and views:

Just to be clear about my objectives for this package: JOB_NAME, CATEGORY_ID, RUN_DATE, RUN_TIME, RUN_DURATION and PLAN_NAME all go into the TBL_SQL_JOBS table. The PLAN_DETAILS column will do nothing for NULLs (as will PLAN_NAME), but for populated records it will have the "Databases: " prefix removed and the comma-delimited database names split. Then it needs to check the DB names from the split against the TBL_DATABASES table (previously populated) and grab the corresponding ID. Then, combined with the ID of the current job record we are processing (think of the "lookup job ID" part of the last task of the package), we add those records to the TBL_DATABASE_JOBS table separately. The end result is a table with a list of unique DBs, a table with historical job info, and a table between them that provides a one-job-to-many-DBs relationship.
December 14th, 2010 9:40am

I recently had the idea of maybe passing the CSV in a string to a stored procedure. Then in the SP, have a UDF split the values and return them as a table. Then I could just add the job ID (I already have it; I'd just need to send it to the SP as another parameter) and do a WHILE loop within the SP to do the inserts. Good idea?
December 14th, 2010 4:02pm

Not a bad idea - especially if you understand it. But it is another "cursor"-type implementation. What you can do instead is use an asynchronous script. Push your data flow into the Script Component. Go to the output tab and set the SynchronousInputID to -1 (None). Now select the Output node and add columns - at a minimum I'd add "JOB_NAME" and "Database" - and data-type them appropriately. Go back to the input tab and select JOB_NAME and PLAN_DETAILS as ReadOnly. Inside your script, delete the "CreateNewOutputRows" override; you don't need that. In the Input0_ProcessInputRow override, use code like this (syntax probably isn't 100%):

    if (!Row.PLAN_DETAILS_IsNull)
    {
        // Strip the "Databases: " prefix, then split the list on commas
        string[] databases = Row.PLAN_DETAILS.Replace("Databases: ", "").Split(',');
        foreach (string database in databases)
        {
            if (database.Trim().Length > 0)
            {
                // Emit one output row per database in the delimited list
                Output0Buffer.AddRow();
                Output0Buffer.JOB_NAME = Row.JOB_NAME;
                Output0Buffer.Database = database;
            }
        }
    }

You'll probably have to tweak that.
December 14th, 2010 6:47pm

Hey Guys, I solved my problem! I used my idea of a stored procedure and it worked. This is what I did: I got rid of that last script component from my original post and instead just sent the JOB_ID, INSTANCE_ID and raw PLAN_DETAILS field to my stored procedure. The stored procedure uses the following UDF to split the string:

    CREATE FUNCTION DBO.FN_SPLIT (@DELIMITER CHAR(1), @STRING NVARCHAR(512))
    RETURNS TABLE
    AS
    RETURN
    (
        WITH PIECES (PIECE_NUM, START, STOP) AS
        (
            SELECT 1, 1, CHARINDEX(@DELIMITER, @STRING)
            UNION ALL
            SELECT PIECE_NUM + 1, STOP + 1, CHARINDEX(@DELIMITER, @STRING, STOP + 1)
            FROM PIECES
            WHERE STOP > 0
        )
        SELECT PIECE_NUM,
               SUBSTRING(@STRING, START, CASE WHEN STOP > 0 THEN STOP - START ELSE 512 END) AS PIECE
        FROM PIECES
    )

And this is the stored procedure I used to call it:

    CREATE PROCEDURE USP_JOB_DB_RELATIONSHIP_FILL
        (@INSTANCE_ID INTEGER, @JOB_ID INTEGER, @DB_LIST NVARCHAR(512))
    AS
    INSERT INTO TBL_DATABASE_JOBS (DATABASE_ID, JOB_ID)
    SELECT D.ID, @JOB_ID
    FROM DBO.FN_SPLIT(',', REPLACE(@DB_LIST, 'Databases: ', '')) AS S
    INNER JOIN TBL_DATABASES AS D
        ON D.TITLE = S.PIECE
       AND D.INSTANCE_ID = @INSTANCE_ID

So the stored procedure takes in the parameters I mentioned above and does one insert statement. That's it! The split takes in the comma-delimited string (which I first groom with the REPLACE) and returns a table with a record for each value in the list. I join that table to my table of DBs on the name (which equals the value) and the instance ID. Then it's all wrapped up in one neat little insert with the current JOB_ID (which was also passed to the procedure), and that's all she wrote. The beautiful thing about it is that if a DB isn't found in the DB table, the record isn't inserted, because the join doesn't return a match. Nice, eh? And note, Todd: no cursors... :P

The solution proposed above just doesn't work for me. I had to step back and realize that I was going down the wrong path and that SSIS was not the right tool for that particular part of the job. Todd's suggestion would have me adding all kinds of records to the data set that were not only going to a different table, but had a completely different structure. Even if that did somehow work, I would still be left with only a partial solution, because I would still have to deal with those records. Instead, my six-line stored procedure did it all.

I have to say I'm actually pretty disappointed with this community. While I do appreciate Todd's efforts, he is the only one who even attempted to help. I am a very junior DBA; how is it that I came up with a solution before anyone else even had a good idea? I posted every piece of data I could think of relating to this problem, and all the research I had done too. I would have been better off doing what I usually do, which is figure out the problem myself and keep the solution to myself, instead of wasting my time explaining my problem to the community at large in hopes of sharing my issue and solution, only to be largely ignored. I guess if you want something done right, you've got to do it yourself...
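For anyone landing on this thread later, the split-plus-join semantics of that stored procedure can be sketched like this (a Python stand-in for the T-SQL; the table contents and values are hypothetical):

```python
# FN_SPLIT turns 'Databases: A,B,C' into pieces; the INNER JOIN against
# TBL_DATABASES then keeps only pieces that name a known DB on this instance.
# Unknown names simply fall out of the join rather than raising errors.

def job_db_relationship_fill(instance_id, job_id, db_list, tbl_databases):
    """Return the (DATABASE_ID, JOB_ID) pairs the INSERT...SELECT would produce."""
    pieces = [p for p in db_list.replace("Databases: ", "").split(",") if p]
    return [
        (d["id"], job_id)
        for p in pieces
        for d in tbl_databases
        if d["title"] == p and d["instance_id"] == instance_id
    ]

tbl_databases = [
    {"id": 1, "instance_id": 7, "title": "Payroll"},
    {"id": 2, "instance_id": 7, "title": "HR"},
]
# 'Archive' is not in TBL_DATABASES, so it produces no row - mirroring the join.
rows = job_db_relationship_fill(7, 42, "Databases: Payroll,HR,Archive", tbl_databases)
print(rows)  # [(1, 42), (2, 42)]
```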
December 15th, 2010 4:24pm


That's kind of a selfish post there. Can you tell me how many posts you've participated in on any of the forums here? I can - none. How many threads had you read before you posted your question? Probably none as well. I'd ask you not to disparage the community that it now appears you were just here to leech off of for help with your problem.

You did provide a lot of information, and you did write your post comprehensively. Unfortunately, doing that can scare off help - answerers have to be willing to get deeply involved in such a thread. The fact that I started discussions with you also wards people off. Why interrupt a conversation that appears to be headed in a good direction? Especially when many of the other experts here happen to know me and (I think) would feel confident that I'd be able to help you out without them jumping in, so they leave the thread alone.

I'd invite you to change your perceptions a little. It isn't anybody's "job" here to help you out. We're all volunteering our time and effort. Turning around and demanding a certain response level isn't going to endear you to anyone. Although you may feel you were "largely ignored", take a moment to think about how you're making me feel. I have to say that I was almost completely ignored. I'm disappointed I couldn't teach you how to use SSIS properly to accomplish what you were looking to do. Now you have the perception that SSIS can't do what you wanted it to do - or that if it could, it's the round peg for your square hole. This is absolutely not the case. But I probably won't be able to change your perception on that issue, because you now disrespect the "community", and because of this post the community isn't as likely to respect you.

One last thing: you are "using a cursor" even though you don't think you are. Sure, you aren't using a T-SQL CURSOR construct, but you're doing the same architectural work - one execution per row, one round trip from the "client" (SSIS) to the database to insert or update one row. That's worse than a cursor.
December 15th, 2010 4:49pm

Hey Todd,

Well, you make a couple of good points. I agree that my expectations of the community were too high, especially considering they are volunteers. They certainly don't owe me anything, and vice versa. My disappointment stemmed from the fact that I see many threads with several people's input being shared, and I was hoping for the same in mine. I also see the point you made about my initial post's verbosity. Next time I will have to "sell" my problem more effectively, even if that means purposely leaving information out of the initial post.

I do, however, disagree that my last post was a selfish one. A selfish poster would have abandoned this thread and never shared the answer to the problem with the community. Like I said earlier, I am a rather junior DBA. I have not had the time to rack up the post count you have, but that should not be a reason for you to devalue my issues or opinions. You make a lot of assumptions about me, which is really unfortunate. Believe it or not, I did read the rules before posting, which means I did look for related posts before posting. That is also why I included as much information as possible in my initial post, even though that piece of advice turned out to be more of a hindrance than a help.

I am also not a closed-minded person. I know SSIS is a powerful tool, and I am still receptive to a method within SSIS that could solve my problem. At this point, however, I have not seen, nor can I think of, a way to accomplish what I did in that stored procedure completely within SSIS. When you say that I am in fact using a cursor, it makes me curious. When you mentioned cursors before, I was indeed thinking of the T-SQL CURSOR construct itself. I wouldn't mind further explanation of why this qualifies as one, or how it could possibly be worse. At the end of the day, however, I would gladly take an inefficient solution over a nonexistent one.

I have been nothing but grateful for your attempts to help me specifically, Todd. Just because I didn't think your solution was the right one does not mean you were ignored. Perhaps you never really understood my problem, or maybe I never really understood your solution; either way, that link of understanding was not there. In the end, my disappointment does not equal disrespect, but rather a misalignment of expectations. Your assumptions about me, however, have led you to disrespect. Clearly, being new, I did not have a full understanding of how this forum worked (even having read the rules). Next time you encounter a newbie who does not fully understand how this forum works, you would be better off sticking to constructive criticism and leaving the harsh assumptions at the door.
December 17th, 2010 12:22pm

Whoa. Couldn't leave it alone, could you? I will.

To the constructive point: the reason I characterized your solution as worse than a cursor is that cursors are often compared negatively to "better" T-SQL solutions that use set-based logic. This is due to the row-by-row nature of a cursor, which naturally limits what the database engine can accomplish via parallelism and other techniques. When you use a cursor in T-SQL, you are confining the engine to a strictly single-threaded, unoptimized approach.

When you call a stored procedure inline in your data flow, it's "worse" than a cursor. Why? Because in addition to restricting the engine (the SSIS data flow engine, in this case) to a strictly single-threaded option, you're also introducing a much bigger problem: latency. For each and every row passing through your flow, you call your SP. This means you have to send a request to SQL Server, have the server generate a response, then return that response. Regardless of how quickly the response is generated, you pay for that round trip to the server. Even if your SSIS package is executing on the same machine as the SQL Server it's talking to, that's a process-crossing communication that takes milliseconds to accomplish. If your servers are separated by a network, your problem just went up a couple of orders of magnitude.
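To put rough numbers on that latency argument (a back-of-the-envelope Python sketch; the millisecond figures are entirely hypothetical, chosen only to show how the costs scale):

```python
# Rough cost model: total time = per-row query work + one round trip per call.
def total_seconds(rows, query_ms, round_trip_ms, calls):
    return rows * query_ms / 1000 + calls * round_trip_ms / 1000

rows = 10_000
# Row-by-row OLE DB Command / inline SP: one round trip per row.
per_row = total_seconds(rows, query_ms=0.1, round_trip_ms=2.0, calls=rows)
# Set-based: the same per-row work, but a single round trip for the whole set.
set_based = total_seconds(rows, query_ms=0.1, round_trip_ms=2.0, calls=1)
print(per_row, set_based)  # 21.0 vs ~1.0 seconds
```

With these made-up numbers the round trips dominate: the per-row version spends 20 of its 21 seconds just crossing between client and server.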
December 17th, 2010 4:11pm


Haha, that's more like it! Why would I leave it alone when I stand to learn something? I see what you're saying about the stored procedure part; that's actually something I can get rid of. The stored procedure is unnecessary: if I keep the same OLE DB Command component with the same input and replace the stored procedure call with the command I used inside the stored procedure, I can accomplish the same objective and trim the fat of that stored-procedure round trip, so to speak. Now that that is out of the way, that still leaves the row-by-row "cursor" business, and I don't see how that can be changed. Can you think of any other ways to optimize this? If you can think of a way to accomplish this better (perhaps sticking completely to SSIS), I will mark your post as the answer.
December 18th, 2010 1:23am



Moving the contents of the stored procedure directly inside the OLE DB Command is actually worse. Using a stored procedure allows the database engine to compile and optimize the query itself, and inlining the statement doesn't eliminate the round-trip latency. You can get rid of the row-by-row nature of your architecture, though - re-read the previous posts. Using an asynchronous script to parse your string column into separate rows will work.
December 18th, 2010 2:27pm

This topic is archived. No further replies will be accepted.
