Eliminating Duplicates upon Importing to DB
Hello! Everyone on this forum is awesome and have helped alot. I only have one more challenge in my SSIS project. Now that I have "cleansed" the data from Duplicates in an Excel spreadsheet and accomplished many transformations I now need to insert this data into an already existing Table. However, there could be duplicates in the primary ID (called CaseID) between the cleansed spreadsheet data and the Table they are being imported into. How can I check each row I am inserting into the Table to see if it is duplicate and if not, insert it, but if it is send that row to an exception table. This i can not figure out with SSIS how to do. Your guidance will be greatly appreciated. I can implement it if I am directly to the best way to do with with SSIS components. Thanks again! MikeMike Kiser
July 30th, 2012 6:03pm

if you want to check if the record is already exists in destination table then redirect that row to another table, otherwise insert that record into destination table; then you can use Lookup transform, and set joining fileds between source data and your destination table. if record was match then connect the match output to an exception (a table for exception records) destiantion. and connect no match output to original destination. If you wan to just remove duplicates from data stream, I strongly recommend to read Todd's article here: http://toddmcdermid.blogspot.co.nz/2009/01/eliminating-duplicate-primary-keys-in.htmlhttp://www.rad.pasfu.com
Free Windows Admin Tool Kit Click here and download it now
July 30th, 2012 6:10pm

Thanks Reza! Do you have a link to an example in your first paragraph? That is what I need to do it sounds like. I have already accomplished removing dups from the data stream and Todd's article helped alot...thanks! MikeMike Kiser
July 30th, 2012 6:13pm

look at below example: http://www.rad.pasfu.com/index.php?/archives/46-SSIS-Upsert-With-Lookup-Transform.html Note that example above implements an UPSERT (Update existing / Insert new ), for your case you do not want to update existing, but redirect them to another destination, so just connect the lookup match output to another destiantion (the table that you want to log duplicate records) instead of ole db command in sample above.http://www.rad.pasfu.com
Free Windows Admin Tool Kit Click here and download it now
July 30th, 2012 6:16pm

Thanks a million! I will study that closely! MikeMike Kiser
July 30th, 2012 6:17pm

Thanks Reza!! That is EXACTLY what I was looking for. I have implemented it and it works! Thanks! I do have one table where I need to check 3 columns to see if I need to update it or skip and not update or insert. I assume that would be similar? Thanks again! MikeMike Kiser
Free Windows Admin Tool Kit Click here and download it now
July 31st, 2012 2:02pm

Hi Mike, Yes similar, in your case joining columns in the columns tab of lookup tranform editor will be those three columnshttp://www.rad.pasfu.com
July 31st, 2012 4:26pm

Hi Reza! maybe you can help me. I am implementing your example. Instead of your ID field I have a CaseID Field which is nvarchar(max). When I create a Lookup Transformation and set the connection and go to the Columns, I try to map the columns and I get a "Cannot Map the lookup column, 'CaseID', because the column data type is a binary large object (BLOB) is nvarchar(max) a BLOB? why won't it allow me to map it? I am very close to getting this working but have worked on this for over 2 hours and it will not let me Map. Any suggestions? Anyone? I am at the end of my rope LOL Thanks! MikeMike Kiser
Free Windows Admin Tool Kit Click here and download it now
July 31st, 2012 6:07pm

Hi Reza! maybe you can help me. I am implementing your example. Instead of your ID field I have a CaseID Field which is nvarchar(max). When I create a Lookup Transformation and set the connection and go to the Columns, I try to map the columns and I get a "Cannot Map the lookup column, 'CaseID', because the column data type is a binary large object (BLOB) is nvarchar(max) a BLOB? why won't it allow me to map it? I am very close to getting this working but have worked on this for over 2 hours and it will not let me Map. Any suggestions? Anyone? I am at the end of my rope LOL Thanks! MikeMike Kiser
July 31st, 2012 6:10pm

SSIS consider nvarchar(max) as DT_NTEXT and varchar(max) as DT_TEXT . you can change them to nvarchar(500) or varchar(500) in underlying database, I mean a specific lenght not a MAX.http://www.rad.pasfu.com
Free Windows Admin Tool Kit Click here and download it now
July 31st, 2012 6:16pm

SSIS consider nvarchar(max) as DT_NTEXT and varchar(max) as DT_TEXT . you can change them to nvarchar(500) or varchar(500) in underlying database, I mean a specific lenght not a MAX.http://www.rad.pasfu.com
July 31st, 2012 6:19pm

Cool! Thanks Reza! I changed the datatypes to nvarchar(255) and it works great! I am having one problem in using a similar stored procedure to the one in your article. My CaseID is an nvarchar(255) and I want to use it like your int ID. However when I set it as the following , nothing happens. Should I use a Like statement instead of equals (=)? I'm not getting an error but nothing is being updated in my table. Thanks for all the help! Mike USE [BAPropertyTax] GO /****** Object: StoredProcedure [dbo].[UpdateStagingTable] Script Date: 08/01/2012 12:58:21 ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO ALTER PROCEDURE [dbo].[UpdateStagingTable] @CASEID nvarchar(255), @Current nvarchar(255), @County nvarchar(255) AS BEGIN SET NOCOUNT ON; update dbo.[Destination - YARDI_REPORTS$] set County=@County where CaseID=@CASEID ENDMike Kiser
Free Windows Admin Tool Kit Click here and download it now
August 1st, 2012 2:37pm

By the way, an example of my CaseID is 161-218028 161-219807 Mike Kiser
August 1st, 2012 2:40pm

Hi EMKISER, All seems ok, you do not need to use a Like statement instead of equals (=) if you update the Country according to @CASEID exactly, please run the T-SQL to see the result in SQL Server Management Studio: Declare @CASEID nvarchar(255) Set @CASEID='161-218028' --Please replace it with exist CaseID Select * from dbo.[Destination - YARDI_REPORTS$] where CaseID=@CASEID Thanks, Eileen TechNet Subscriber Support If you are TechNet Subscription user and have any feedback on our support quality, please send your feedback here.
Free Windows Admin Tool Kit Click here and download it now
August 6th, 2012 4:38am

THanks! That also worked! MikeMike Kiser
August 6th, 2012 12:37pm

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics