Merge Join behavior
I'm relatively new to SSIS, so I apologize for what might be a dumb question. I'm using a Merge Join, and everything is working okay... but I just found something that was a bit odd. When I had rows in ONE data set that were not in the other data set (and my join is set to INNER), the Merge Join STOPPED processing. It didn't just ignore the rows that didn't match, it ignored all rows after it. Why is this? Is there some setting I can change? I have changed this to a LEFT OUTER join to avoid this, but I would prefer an inner (performance) Thanks, K
March 9th, 2011 6:01pm

That does not sound like expected behaviour to me. When you say it "stopped" processing, do you mean the package did not complete, or do you mean the Merge Join turned green, and passed control to the remainder of the Data Flow, eventually completing with fewer than expected rows? Talk to me now on
Free Windows Admin Tool Kit Click here and download it now
March 9th, 2011 6:51pm

Thanks for the response Todd... it was the latter... completed with green status, passed control, and completed with fewer rows than expected. And if I add grid results, I can see the number of rows being reduced as it passes through the Merge Join Also something I didn't mention before... I also think it had something to do with the sort. I did NOT have the one record set sorted correctly. So basically, my data looked like this: A A A B B C A A A And if row C was not in my second data set (on the right part of the join), it would stop processing (with green status) and I would lose the 3 last rows of A, even though A is in the 2nd data set. Thoughts?
March 10th, 2011 10:31am

What you observed is exactly what I would expect from the Merge Join. Why? Because you lied to it :) I'm assuming that, of course - but tell me if you did this: You read your data from your sources, and fed them into the Merge Join. The Merge Join complained that the sources weren't sorted. You found some info on the 'net that said the Sort component was "slow" (or you experienced that yourself), and that the alternative was to set the IsSorted and SortKeyPosition on the source columns. That's where you got yourself in trouble. The IsSorted and SortKeyPostion properties don't sort the data. They just tell SSIS the data is sorted like you say it is. If you don't use an ORDER BY in the source that matches what you told SSIS - then you're "lying" to SSIS about the sort order. I'm surprised the LEFT join worked any differently. Talk to me now on
Free Windows Admin Tool Kit Click here and download it now
March 10th, 2011 12:39pm

Wow - you are good... That's it to the T. So those properties just TELL SSIS that it is sorted, but do not do any sorting? I also added a sort on one of the inputs... so perhaps it was the sort that fixed this, and NOT the left outer join after all.... I think I'll test that to see. Thanks for the information regarding the properties. K
March 10th, 2011 1:29pm

Oh, but I used those properties simply because it seemed easier than adding another component to the package... not due to performance issues/considerations.
Free Windows Admin Tool Kit Click here and download it now
March 10th, 2011 1:32pm

I reran the package with the sort REMOVED. And again, it still worked, so the outer join did indeed fix it also (although not in the way I want) Thanks for the information.... Seems kind of silly to me that SSIS requires the data to be sorted, and then also allows for the ability to 'fake' a sort... kind of odd if you ask me. Any idea if there is a reason for this? Thanks again for the help/feedback, K
March 10th, 2011 1:40pm

The reason is that SSIS can't discover that from the source component. You can specify a SQL Statement in there in any flavour of SQL that the provider you're using understands - which may not be SQL Server. It can't simply detect use of "ORDER BY" and then read off column names - because the provider may not use that keyword or syntax, the columns specified thereafter may not apply to the result (it may be an intermediate portion of the SQL statement). SSIS has to treat all data coming from the provider as unordered. The Merge Join requires sorted inputs. That restriction isn't strictly necessary from a coding point of view - but it does make the implementation a lot easier. There are two ways to get sorted data into a Merge Join. The first one is obvious - use a Sort component. This does the job of sorting the data, and since SSIS is doing the sort, it knows the data's sorted, so it updates those properties in the data flow. The second one is non-obvious - use an ORDER BY and update those properties yourself. So - that's why the "fake" is in there. Not to sort the data, but to inform SSIS that the data is actually sorted. It's interesting to hear using those properties is "easier" than adding a Sort component though... not the usual characterization of them :) Talk to me now on
Free Windows Admin Tool Kit Click here and download it now
March 10th, 2011 4:15pm

The reason is that SSIS can't discover that from the source component. You can specify a SQL Statement in there in any flavour of SQL that the provider you're using understands - which may not be SQL Server. It can't simply detect use of "ORDER BY" and then read off column names - because the provider may not use that keyword or syntax, the columns specified thereafter may not apply to the result (it may be an intermediate portion of the SQL statement). SSIS has to treat all data coming from the provider as unordered. The Merge Join requires sorted inputs. That restriction isn't strictly necessary from a coding point of view - but it does make the implementation a lot easier. There are two ways to get sorted data into a Merge Join. The first one is obvious - use a Sort component. This does the job of sorting the data, and since SSIS is doing the sort, it knows the data's sorted, so it updates those properties in the data flow. The second one is non-obvious - use an ORDER BY and update those properties yourself. So - that's why the "fake" is in there. Not to sort the data, but to inform SSIS that the data is actually sorted. It's interesting to hear using those properties is "easier" than adding a Sort component though... not the usual characterization of them :) Talk to me now on
March 10th, 2011 4:15pm

The reason is that SSIS can't discover that from the source component. You can specify a SQL Statement in there in any flavour of SQL that the provider you're using understands - which may not be SQL Server. It can't simply detect use of "ORDER BY" and then read off column names - because the provider may not use that keyword or syntax, the columns specified thereafter may not apply to the result (it may be an intermediate portion of the SQL statement). SSIS has to treat all data coming from the provider as unordered. The Merge Join requires sorted inputs. That restriction isn't strictly necessary from a coding point of view - but it does make the implementation a lot easier. There are two ways to get sorted data into a Merge Join. The first one is obvious - use a Sort component. This does the job of sorting the data, and since SSIS is doing the sort, it knows the data's sorted, so it updates those properties in the data flow. The second one is non-obvious - use an ORDER BY and update those properties yourself. So - that's why the "fake" is in there. Not to sort the data, but to inform SSIS that the data is actually sorted. It's interesting to hear using those properties is "easier" than adding a Sort component though... not the usual characterization of them :) Talk to me now on
Free Windows Admin Tool Kit Click here and download it now
March 10th, 2011 4:15pm

Makes sense as to what you are saying... Although quite frankly, why should I care if it's easier to implement a merge join with a sort? As you state, it isn't necessary from a coding point of view, so I would argue that SSIS should take care of whatever is needed in order to make a 'join'. You'd think whatever is easier for the USERS is what it should do. As for why I find it easier, it's just one less component... it's cleaner to me to do it this way. But that's just because I don't like too much clutter... ask my wife about that one! Thanks again Todd
March 10th, 2011 7:15pm

Last reply... I was wrong about my statement that REMOVING the sort, but LEAVING the outer join worked. This did NOT fix my issue... I was looking at something wrong. So your surprise that the left outer fixed things is again right on... the sort is the all important part. Thanks again for the help, K
Free Windows Admin Tool Kit Click here and download it now
March 10th, 2011 8:17pm

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics