Remove reverse duplicates from Table (Network Steve Forum)

Remove reverse duplicates from Table

I have table with columns as ID, DupeID1, DupeID2. ID column is unique.

DupeID1 and DupeID2 -- the combination should only be there once. I don't want reverse combination of duplicates, i.e. DupeID2, DupeID1 in the table. How can I delete the reverse duplicates from this table?

Thanks.

September 1st, 2015 12:34pm

What is the type of DupeId1 and DupeId2 columns? For integer (or numeric) columns you may try:

declare @t table (Id int identity(1,1), Dup1 int, Dup2 int)
insert into @t (Dup1, Dup2) values (1,3),(3,1), (4,5),(6,7), (7,9), (6,6), (2147483640, 2147483647),
(2147483647, 2147483640)
;

;with cte as (select *, row_number() over (partition by (dup1 * cast(dup2 as bigint) ) order by Id) as Rn from @t)


select * from cte where Rn >1 -- possible dups

Free Windows Admin Tool Kit Click here and download it now

September 1st, 2015 12:51pm

Naomi,

There is a typo in the OVER clause (hope the OP can catch it before testing and coming back with "it didn't work").

> over (partition by dup1* dup2 order by Id)

September 1st, 2015 1:38pm

I've tested the sample before posting, so not sure what was the typo. But I just thought of possible overflow and thus revised the original sample.

Free Windows Admin Tool Kit Click here and download it now

September 1st, 2015 1:43pm

Thank You Naomi. But there was no Typo.

It worked just fine for me.

September 1st, 2015 2:33pm

I've tested the sample before posting, so not sure what was the typo. But I just thought of possible overflow and thus revised the origin

Free Windows Admin Tool Kit Click here and download it now

September 1st, 2015 2:34pm

My fault, thought you didn't mean the product of both.

The OP has to be careful with numbers with multiple divisors, for example 10 is the product of 10*1 and 5*2 or 20 being 4*5 and also 10 * 2.

Check the output from these two queries. The pairs (4, 5) and (2, 10) are the ones that should be flagged to be deleted.

DECLARE @t table (Id int IDENTITY(1,1), Dup1 int, Dup2 int)

INSERT INTO @t (Dup1, Dup2) 
VALUES 
    (20, 1),
    (5, 4),
    (4, 5),
    (10, 2),
    (2, 10);

-- Naomi's suggestion
;WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY (Dup1 * CAST(Dup2 AS bigint) ) ORDER BY Id) AS Rn 
FROM @t
)
SELECT * 
FROM cte 
WHERE Rn > 1; -- possible dups

-- Hunchback's suggestion
;WITH cte AS (
SELECT 
    *, ROW_NUMBER() OVER (
    PARTITION BY CASE WHEN Dup1 <= Dup2 THEN Dup1 ELSE Dup2 END, CASE WHEN Dup1 <= Dup2 THEN Dup2 ELSE Dup1 END
    ORDER BY Id) AS Rn 
FROM @t
)
SELECT * 
FROM cte 
WHERE Rn > 1; -- possible dups
GO

September 1st, 2015 2:51pm

Yes, good catch. I don't know what I was thinking. I originally thought of UNPIVOT but didn't pursue this idea to the final solution.

Free Windows Admin Tool Kit Click here and download it now

September 1st, 2015 3:00pm

I revised my suggestion one more time with the simple math. If this is wrong again, I am not smarter than a 3rd grader.

DECLARE @t table (Id int IDENTITY(1,1), Dup1 int, Dup2 int)

INSERT INTO @t (Dup1, Dup2) 
VALUES 
    (20, 1),
    (5, 4),
    (4, 5),
    (10, 2),
    (2, 10),
    (2147483640, 2147483647),
(2147483647, 2147483640),
(2147483640, -2147483647),
(-2147483647, 2147483640);

-- Naomi's suggestion
;WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Dup1 + CAST(Dup2 AS bigint), abs(Dup1-cast(Dup2 as bigint))  ORDER BY Id) AS Rn 
FROM @t
)
SELECT * 
FROM cte 
WHERE Rn > 1; -- possible dups

-- Hunchback's suggestion
;WITH cte AS (
SELECT 
    *, ROW_NUMBER() OVER (
    PARTITION BY CASE WHEN Dup1 <= Dup2 THEN Dup1 ELSE Dup2 END, CASE WHEN Dup1 <= Dup2 THEN Dup2 ELSE Dup1 END
    ORDER BY Id) AS Rn 
FROM @t
)
SELECT * 
FROM cte 
WHERE Rn > 1; -- possible dups
GO

September 1st, 2015 3:05pm

I have tried Hunchback's suggestion. The type of data I have, it doesn't work.

Naomi's first and second - both works fine for me. Thanks.

Free Windows Admin Tool Kit Click here and download it now

September 1st, 2015 4:17pm

Strange. Can you please share with us the type of the data you have in Dup1 and Dup2 columns? I think my original solution should not work and Alejandro demonstrated the problem in his sample.

September 1st, 2015 4:47pm

This topic is archived. No further replies will be accepted.