Eliminating Duplicates upon Importing to DB
Hello!
Everyone on this forum is awesome and have helped alot. I only have one more challenge in my SSIS project. Now that I have "cleansed" the data from Duplicates in an Excel spreadsheet and accomplished many transformations I now need to insert this data into
an already existing Table. However, there could be duplicates in the primary ID (called CaseID) between the cleansed spreadsheet data and the Table they are being imported into. How can I check each row I am inserting into the Table to see if it is duplicate
and if not, insert it, but if it is send that row to an exception table. This i can not figure out with SSIS how to do. Your guidance will be greatly appreciated. I can implement it if I am directly to the best way to do with with SSIS components.
Thanks again!
MikeMike Kiser
July 30th, 2012 6:03pm
if you want to check if the record is already exists in destination table then redirect that row to another table, otherwise insert that record into destination table; then you can use Lookup transform, and set joining fileds between source data and your
destination table. if record was match then connect the match output to an exception (a table for exception records) destiantion. and connect no match output to original destination.
If you wan to just remove duplicates from data stream, I strongly recommend to read Todd's article here:
http://toddmcdermid.blogspot.co.nz/2009/01/eliminating-duplicate-primary-keys-in.htmlhttp://www.rad.pasfu.com
Free Windows Admin Tool Kit Click here and download it now
July 30th, 2012 6:10pm
Thanks Reza! Do you have a link to an example in your first paragraph? That is what I need to do it sounds like. I have already accomplished removing dups from the data stream and Todd's article helped alot...thanks!
MikeMike Kiser
July 30th, 2012 6:13pm
look at below example:
http://www.rad.pasfu.com/index.php?/archives/46-SSIS-Upsert-With-Lookup-Transform.html
Note that example above implements an UPSERT (Update existing / Insert new ), for your case you do not want to update existing, but redirect them to another destination, so just connect the lookup match output to another destiantion (the table that you want
to log duplicate records) instead of ole db command in sample above.http://www.rad.pasfu.com
Free Windows Admin Tool Kit Click here and download it now
July 30th, 2012 6:16pm
Thanks a million! I will study that closely!
MikeMike Kiser
July 30th, 2012 6:17pm
Thanks Reza!! That is EXACTLY what I was looking for. I have implemented it and it works! Thanks! I do have one table where I need to check 3 columns to see if I need to update it or skip and not update or insert. I assume that would be similar?
Thanks again!
MikeMike Kiser
Free Windows Admin Tool Kit Click here and download it now
July 31st, 2012 2:02pm
Hi Mike,
Yes similar, in your case joining columns in the columns tab of lookup tranform editor will be those three columnshttp://www.rad.pasfu.com
July 31st, 2012 4:26pm
Hi Reza! maybe you can help me. I am implementing your example. Instead of your ID field I have a CaseID Field which is nvarchar(max). When I create a Lookup Transformation and set the connection and go to the Columns, I try to map the columns and I get
a
"Cannot Map the lookup column, 'CaseID', because the column data type is a binary large object (BLOB)
is nvarchar(max) a BLOB? why won't it allow me to map it? I am very close to getting this working but have worked on this for over 2 hours and it will not let me Map.
Any suggestions? Anyone? I am at the end of my rope LOL
Thanks!
MikeMike Kiser
Free Windows Admin Tool Kit Click here and download it now
July 31st, 2012 6:07pm
Hi Reza! maybe you can help me. I am implementing your example. Instead of your ID field I have a CaseID Field which is nvarchar(max). When I create a Lookup Transformation and set the connection and go to the Columns, I try to map the columns and I get
a
"Cannot Map the lookup column, 'CaseID', because the column data type is a binary large object (BLOB)
is nvarchar(max) a BLOB? why won't it allow me to map it? I am very close to getting this working but have worked on this for over 2 hours and it will not let me Map.
Any suggestions? Anyone? I am at the end of my rope LOL
Thanks!
MikeMike Kiser
July 31st, 2012 6:10pm
SSIS consider nvarchar(max) as DT_NTEXT and varchar(max) as DT_TEXT .
you can change them to nvarchar(500) or varchar(500) in underlying database, I mean a specific lenght not a MAX.http://www.rad.pasfu.com
Free Windows Admin Tool Kit Click here and download it now
July 31st, 2012 6:16pm
SSIS consider nvarchar(max) as DT_NTEXT and varchar(max) as DT_TEXT .
you can change them to nvarchar(500) or varchar(500) in underlying database, I mean a specific lenght not a MAX.http://www.rad.pasfu.com
July 31st, 2012 6:19pm
Cool! Thanks Reza! I changed the datatypes to nvarchar(255) and it works great! I am having one problem in using a similar stored procedure to the one in your article. My CaseID is an nvarchar(255) and I want to use it like your int ID. However when I set
it as the following , nothing happens. Should I use a Like statement instead of equals (=)? I'm not getting an error but nothing is being updated in my table.
Thanks for all the help!
Mike
USE [BAPropertyTax]
GO
/****** Object: StoredProcedure [dbo].[UpdateStagingTable] Script Date: 08/01/2012 12:58:21 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[UpdateStagingTable]
@CASEID nvarchar(255),
@Current nvarchar(255),
@County nvarchar(255)
AS
BEGIN
SET NOCOUNT ON;
update dbo.[Destination - YARDI_REPORTS$]
set County=@County
where CaseID=@CASEID
ENDMike Kiser
Free Windows Admin Tool Kit Click here and download it now
August 1st, 2012 2:37pm
By the way, an example of my CaseID is
161-218028
161-219807
Mike Kiser
August 1st, 2012 2:40pm
Hi EMKISER,
All seems ok, you do not need to use a Like statement instead of equals (=) if you update the Country according to @CASEID exactly, please run the T-SQL to see the result in SQL Server Management Studio:
Declare
@CASEID nvarchar(255)
Set
@CASEID='161-218028'
--Please replace it with exist CaseID
Select
*
from dbo.[Destination
- YARDI_REPORTS$] where CaseID=@CASEID
Thanks,
Eileen
TechNet Subscriber Support
If you are
TechNet Subscription user and have any feedback on our support quality, please send your feedback
here.
Free Windows Admin Tool Kit Click here and download it now
August 6th, 2012 4:38am
THanks! That also worked!
MikeMike Kiser
August 6th, 2012 12:37pm