SSIS :Implementing Hashing in derived column transformation
How to implement /compute hasing using derived column transformtion in SSIS? I have to calculate hash value for all the columns in the table. I have to write the expression to the derived column using Hashbytes. Let me know if there is any other option. Can this be possible SSIS with out the use of scripts. What should be the datatype I need to consider in SSIS and aswell as in SQL database? I really appreciate your responce!!
February 16th, 2010 6:39am
You can't "write" a hashbytes implementation in the Derived Column. Well... I shouldn't say that. I don't know how to code a Hashbytes - but I'm assuming it does a lot of bit shifting and other complex iterations on the contents of the components, making any expression that implemented it virtually incomprehensible and just plain insane.Your alternatives:1. Do the Hashbytes in SQL.2. Do a hash in a Script.3. Use a custom component:Checksum - Computes a hash value, the checksum, across one or more columns, returning the result in the Checksum output column.Multiple Hash - SSIS Multiple Hash makes it possible to generate many Hash values from each input row. Hash's supported include MD5 and SHA1.
February 16th, 2010 10:14am
Thank you telling me the other options I could do in SSIS. I am not clear of using the the hash in a Script and link that to my derived column. I am looking to calculate this in my transformation. Could ypu please how can I process..?
February 16th, 2010 9:44pm
If you decide to do this within a script component you will essentially have to come up with a method to loop over all of the available columns and combine them in a byte array (usually by first combining them into a single string). This byte array will then be used as an input to the hash function. At this point, we are talking more of a vb.net (or C#) question than an SSIS question. VB How to compute and compare hash values Is there a rule in place to prevent you from using a third party component? If not, I would highly recommend you use one of the transformations Todd mentioned. The company I work with uses the checksum transformation by konesans.Please mark answered posts. Thanks for your time.
February 16th, 2010 10:59pm
I really appreciate both of your thought!As per the organisation standards we are not going for sql scripts to implemnt logic. so I am trying to see if it is possible in SSIS it self. Reason for not considering the Checksum transformation is, I have already the computed hash value in my History table using Hashbytes. As the algorithm/calcutated values differs for both .. I need to think of using it.I need to compute hashvalue for incoming records and compare with my existing Hash values in History table. If the hash value matches, I am not inserting the records, else Inserting records. So as the calculation differs for both the Checksum and hashbytes methods, I am going for the existing method.
February 17th, 2010 6:19am
is any other way that works for my requiremnt?
February 17th, 2010 6:31am
When we say "script" we do not mean "T-SQL script". We mean that you should use the Script Component in your Data Flow in place of where you're wanting to use the Derived Column component.However, you should attempt to use the custom component to see if it meets your needs, it's arguably a much cleaner and more usable solution than coding a .Net script for which you don't seem to be prepared for.
February 18th, 2010 12:10am
Yes you are right, I could use script component. I can try. Do you know how to worok with script component. Does it supports the Hashbytes function?
February 18th, 2010 12:25am
The SQL Server HashBytes function supports a number of different hashing algorithms (http://msdn.microsoft.com/en-us/library/ms174415.aspx). You'll need to know which one was used to create the has values in your specific tables, and then implement the .NET code to perform the same hashing in your script. The Multiple Hash component that Todd referenced above is open source, and should provide a good example.For a hashing component, I really recommend creating either a custom component or using one of the existing ones ot there ( a number of which are free). The problem with script components is that the code is copied to each package that needs to use it. If you have a bug in your hashing routine, or every want to update it, you have to go touch every package with the script component in it. For some things, that may be fine, but checksum generation is something you want to be consistent across all your packages.John Welch | www.varigence.com | www.agilebi.com | ssisUnit.codeplex.com
February 18th, 2010 2:03am
Okay, The current algorithm which I have to use is SHA1. I am ready to implement it using Script component. Please let me know how to build it.
February 18th, 2010 6:44am
Does Checksum transformation really works for my task? Because I am thinking of calculating checksum for History table records and then for my incoming records. Finally compare both the key columns for duplicates!!!
February 18th, 2010 8:10am
Yes, the checksum transformation should work for your requirements. Note that you may or may not have the same results as the current hash code in your results depending on the order you input the columns (i.e. which ones you "click" first), the algorithm you use, etc. Just make sure that you are consistent. Also note that, as with any hash matching, it is possible you will receive false positive matches. This is due to the fact that multiple sets of data might produce the same value from the function. If this is a risk you are willing to take, it should greatly improve your performance to do matching versus a single integer instead of on a complete column by column basis.Please mark answered posts. Thanks for your time.
February 18th, 2010 5:53pm
Hey, I got the approval from my manager to go ahead with Checksum. So can I directly download the Checksum Transformation for SQL Server 2008 from konesans? What else do I required?
February 18th, 2010 8:04pm
what is the datatype supported by checksum /Checksum transformation?
February 18th, 2010 10:35pm
Hey Sorry for late update.. I have done using checksum transformation! it is working fine. Thanks you Todd , John and Eric !!!
March 24th, 2010 2:27am
Hi, Can we use the Konesons Checksum transformation on new Clustor environment? Please let me know.
May 29th, 2010 1:26am
Is there a specific reason you think would prevent it? You'd just have to install it on every server participating. Talk to me now on
May 29th, 2010 4:24am
Guys I need just small Sujjestion. I already have two column, One with Old Hashes(Orignal record) and one with new Hashes(Update). I am comparing them, but cant find all Updated version. Row.HASHOLD != Row.HashNEW ...do I have to convert those values(Already Hashed(Binary)) to some other datatype or it is just stright compare. Regards Dinesh
April 27th, 2011 2:49pm