Remove Duplicates in string sql AND TAKE THE CORRECT ONE

Good Evening how can i perform this task with ssis  OR TRANSACT SQL? 

I HAVE THESE ROWS WITH THE NEXT DATA, I want to take just the valid one, BUT I HAVE A LOT OF COMBINATIONS AS following names, it can be animals, things or personal names

GABRIEL OBANDO --CORRECT
GABRIEL OVANDO

Gavriel OVANDO

gAbriel OBANDO

GABRIE OBANDO

Gabri OBONDA

MANAGUA --CORRECT

NANAGUA

NAMAGUA

Thanks for the reply.

April 23rd, 2015 7:45pm

well, do you need  to have a separate list\master list with valid ones. if so, then you use IN clause that list from this list

or how would anyone know which ones are valid and which ones are not?

Free Windows Admin Tool Kit Click here and download it now
April 23rd, 2015 7:48pm

I agree to Stan210, without knowing any criteria it will be nearly impossible to discover correct combination.

Chinese17 - Do you have any master list or sort of some dictionary table that can be referred to derive what you need?

April 24th, 2015 1:08am

Well it is not practically possible to identify the correct Name until you have a reference table or you have or knowledge base in Data Quality services.

The only code which I can provide you is

SELECT DISTINCT NAME 
FROM dbo.Test
WHERE NAME = UPPER(NAME) COLLATE SQL_Latin1_General_CP1_CS_AS

This code will check for the upper case name and select them. But it doesn't guarantee finding correct name when spelling is wrong. 

If you have any table from where we can validate data then you can use below code

SELECT DISTINCT Name
FROM dbo.NameTable
WHERE Name IN (SELECT Name FROM dbo.OtherNameTable)

Free Windows Admin Tool Kit Click here and download it now
April 24th, 2015 1:46am

Well it is not practically possible to identify the correct Name until you have a reference table or you have or knowledge base in Data Quality services.

The only code which I can provide you is

SELECT DISTINCT NAME 
FROM dbo.Test
WHERE NAME = UPPER(NAME) COLLATE SQL_Latin1_General_CP1_CS_AS

This code will check for the upper case name and select them. But it doesn't guarantee finding correct name when spelling is wrong. 

If you have any table from where we can validate data then you can use below code

SELECT DISTINCT Name
FROM dbo.NameTable
WHERE Name IN (SELECT Name FROM dbo.OtherNameTable)

April 24th, 2015 5:44am

Well it is not practically possible to identify the correct Name until you have a reference table or you have or knowledge base in Data Quality services.

The only code which I can provide you is

SELECT DISTINCT NAME 
FROM dbo.Test
WHERE NAME = UPPER(NAME) COLLATE SQL_Latin1_General_CP1_CS_AS

This code will check for the upper case name and select them. But it doesn't guarantee finding correct name when spelling is wrong. 

If you have any table from where we can validate data then you can use below code

SELECT DISTINCT Name
FROM dbo.NameTable
WHERE Name IN (SELECT Name FROM dbo.OtherNameTable)

  • Proposed as answer by AB82Moderator Friday, April 24, 2015 8:50 PM
Free Windows Admin Tool Kit Click here and download it now
April 24th, 2015 5:44am

Once you have the reference data, check this link how we can do it in SSIS

http://www.codeproject.com/Tips/528243/SSIS-Fuzzy-lookup-for-cleaning-dirty-data

April 24th, 2015 2:32pm

Thank you so much. I've solved the problem using a reference table.

Free Windows Admin Tool Kit Click here and download it now
April 24th, 2015 5:56pm

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics