can I group records using fuzzy grouping transformation?
I want to run a fuzzy grouping against a table with let's say 1 million rows. The challenge is that the table is more of a fact table then a reference table. The table has a column called "person_id" and attributes such as "name" and "visit_date".

Here is some sample data:

row_id person_idnamevisit_date
11John Smith01/12/2008
21Johnny SMith 02/12/2008
31john m smith 04/12/2008
41Jim Bob 04/12/2008
51Jack Frost04/12/2008
62John Smith01/12/2008
72Jack Frost01/12/2008

What I would like todo is run a fuzzy group transform againt the entire table (1 million rows) but group it by the "person_id" column. So in the example above I would like to perform a fuzzy grouping against two groups, person_id's 1 and 2

Person_id group 1 would contain row_ids 1 to 5 and I would expect to flag row_id's 1,2, and 3 with a high score (likely duplicates) and row_id's 4 and 5 with a low score (likely not duplicates).

Person_id group2 would contain row_ids6 and 7 and should be flagged with a low score (not likely duplicates)

Note: row_id's 5 and 7 should not be compared against each other since they belong to seperate person_id's.

Any suggestions would be greatly appreciated.
December 28th, 2008 5:35pm

Hi Robber,

I am not entirely sure about your requirement, however I would do the following:

1. Take a distinct of Person_ID & Person_Name to a temp table (will eliminate ranking of row 5 & 7)

2. In the above table, add a column with concatenation of Person_ID & Name.

3. Do a Fuzzy grouping on the new column.

This should give you what you are looking for. Let me know if you see any issue with the above.

Free Windows Admin Tool Kit Click here and download it now
August 11th, 2015 1:36am

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics