Strange Behaviour Sort and Merge Join
I have encountered an annoying problem which causes the Merge Joins to lose records in the dataflow. The problem is caused by 2 unusual behavoirs. 1/ Sort of SSIS is not sorting the same as ORDER BY in SQL example: Code Snippet CREATE TABLE [dbo].[table_2]([test] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL) ON [PRIMARY]With data as following: Code Snippet10001-1.00.00.002000When select this data with an order by like: select test from table_2 order by testThe result will be: Code Snippet test1-1.00.00.0010002000If you sort the data by the SORT block of the SSIS the result will be: Code Snippet test10001-1.00.00.002000This is annoying and dangerous, because it causes the next bug.2/ Two datasources sorted by ORDER BY clause can give problems in a Merge Join.If you have 2 data sources both correctly sorted by an order by in the query. When you join these 2 datasources with a Merge Join, you can lose some records in the dataflow. This happens with larger datasets than examples above. http://img340.imageshack.us/my.php?image=strangebehavior2tw1.pngWhen I join the datasources (see image ) inside SQL I will get a correct result of 15271 records. Is this a bug which I should report? or is there a flaw in my logic?
June 29th, 2007 11:05am

SQL collations sort differently than windows collations; SSIS uses windows collations, while SQL server uses both SQL and windows collations. What you're observing is that a hyphen in a SQL collation (e.g. SQL_Latin1_General_CPI_CI_AS) sorts differently than a hyphen in a windows collation. There is no bug.There are three options.1. Use an ORDER BY COLLATE clause with a windows collation, leaving the table as is.2. Define the sorting column with a windows collation, not the older incompatible SQL collation.3.Define the sorting column with nvarchar character type rather than a varchar character type.Here's are examples of the three options for matching SSIS sort order (Windows) with SQL server's sort order (SQL or Windows). Code Snippet IF OBJECT_ID('[dbo].[table_2]') IS NOT NULL DROP TABLE [dbo].[table_2] go CREATE TABLE [dbo].[table_2] ( [test] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ) INSERT INTO table_2 VALUES ( '1-1.00.00.00' ) INSERT INTO table_2 VALUES ( '1000' ) INSERT INTO table_2 VALUES ( '2000' ) SELECT test FROM dbo.table_2 ORDER BY test -- collates by column collation of SQL_Latin1_General_CP1_CI_AS /* 1-1.00.00.0010002000 */ -- Option 1: Use a Windows collation in an ordered select statement , leaving thetable as is SELECT test FROM dbo.table_2 ORDER BY test COLLATE Latin1_General_CI_AS /* 1000 1-1.00.00.00 2000 */ -- Option 2: Define the table with a Windows collation IF OBJECT_ID('[dbo].[table_2]') IS NOT NULL DROP TABLE [dbo].[table_2] go CREATE TABLE [dbo].[table_2] ( [test] [varchar](50) COLLATE Latin1_General_CI_AS NULL ) INSERT INTO table_2 VALUES ( '1-1.00.00.00' ) INSERT INTO table_2 VALUES ( '1000' ) INSERT INTO table_2 VALUES ( '2000' ) SELECT test FROM dbo.table_2 ORDER BY test /* 1000 1-1.00.00.00 2000 */ -- Option 3: Definetable with nvarchar character types IF OBJECT_ID('[dbo].[table_2]') IS NOT NULL DROP TABLE [dbo].[table_2] go CREATE TABLE [dbo].[table_2] ( [test] [nvarchar](50) NULL ) INSERT INTO table_2 VALUES ( '1-1.00.00.00' ) INSERT INTO table_2 VALUES ( '1000' ) INSERT INTO table_2 VALUES ( '2000' ) SELECT test FROM dbo.table_2 ORDER BY test /* 1000 1-1.00.00.00 2000 */
Free Windows Admin Tool Kit Click here and download it now
June 30th, 2007 5:35am

Thx Jaegd, What is the default windows collation? ( Latin1_General_CI_AS ) ? Where is the default windows collation defined? So the problem should be solved if I change the database collation to match the windows collation. ?
July 2nd, 2007 10:37am

I'm not sure there is a default windows collation, but in any case, yes, the sort problem would be solved by setting the database collation to Latin1_General_CI_AS. At least, that's how I've solved it. here's an article from Microsoft comparing SQL collations with windows collations, http://support.microsoft.com/?id=322112, which points out advantages of Windows collations. Ibelieve the default depends on the machine's locale setting, e.g. if you're machine uses a Denmark locale, the default collation is different. Of course, SSIS components arecase-sensitive, while CI in the sort order stands for case-insentitive, which is why I almost always use upper()/lower() when pulling in business keys. One note: if the server has already been installed with a SQL collation, then when joining to the master database, one would need to use a collation clause (on varchar columns), since, as you've seen, the SQL and Windows collations can be incompatible.
Free Windows Admin Tool Kit Click here and download it now
July 2nd, 2007 7:21pm

I found the following script on Internet which will help you change the server collation. Worked fine for me.http://blogs.msdn.com/ikovalenko/archive/2006/12/03/alter-database-alter-collation-forced.aspx
July 3rd, 2007 11:21am

Indeed caused by different collations: http://microsoft-ssis.blogspot.com/2011/03/sorting-in-sql-vs-sorting-in-ssis.htmlPlease mark the post as answered if it answers your question | My SSIS Blog: http://microsoft-ssis.blogspot.com
Free Windows Admin Tool Kit Click here and download it now
March 25th, 2011 12:40am

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics