Hi,
I am attempting to create an SSIS package to run a de-duplication process on a large database of over 10 million person records.
Each person record has to be matched against the complete DB on Name, DOB, Relationship and Gender. The matching records then have to be saved into another table, which the client will review to decide which ones are duplicates and what action to take.
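To make the matching rule concrete, here is a minimal Python sketch of the comparison I have in mind. It is only an illustration, not SSIS code. The `Person` fields, the 0.85 threshold, and the choice of exact matching on DOB, Relationship and Gender with a fuzzy match on Name only are all my assumptions. `difflib.SequenceMatcher` is just a stand-in for whatever fuzzy matching SSIS would do.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Person:
    # Hypothetical column names, for illustration only.
    person_id: int
    name: str
    dob: date
    relationship: str
    gender: str


def name_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] for two names (case-insensitive)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def is_candidate_duplicate(p: Person, q: Person, name_threshold: float = 0.85) -> bool:
    """Assumed rule: exact DOB, Relationship and Gender, plus a fuzzy Name match."""
    if p.person_id == q.person_id:
        return False  # a record is never its own duplicate
    return (
        p.dob == q.dob
        and p.gender == q.gender
        and p.relationship == q.relationship
        and name_similarity(p.name, q.name) >= name_threshold
    )
```

Pairs for which `is_candidate_duplicate` returns true are the ones that would be written to the review table.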
We have decided to use fuzzy matching for this.
The plan is to process the records in batches of 50: match each record in the batch against the complete DB, then move on to the next 50. Records from batches that have already been processed must be skipped, and a matching pair that has already been returned must not be saved a second time.
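The batching logic described above can be sketched roughly as follows. This is an in-memory Python illustration of the bookkeeping only, not an SSIS implementation. The function names, the `(id, value)` record layout and the pluggable `is_match` predicate are hypothetical.

```python
from itertools import islice


def batches(records, size=50):
    """Yield consecutive lists of at most `size` records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def find_candidate_pairs(records, is_match, batch_size=50):
    """records: list of (record_id, value) tuples; is_match: predicate on two values.

    Returns a set of (low_id, high_id) pairs, so each pair is stored only once.
    """
    processed = set()  # ids from batches that are already finished
    pairs = set()
    for batch in batches(records, batch_size):
        for rec_id, rec in batch:
            for other_id, other in records:
                # Skip the record itself and anything from an earlier batch.
                if other_id == rec_id or other_id in processed:
                    continue
                if is_match(rec, other):
                    # Ordering the ids makes (A, B) and (B, A) the same pair.
                    pairs.add((min(rec_id, other_id), max(rec_id, other_id)))
        processed.update(rec_id for rec_id, _ in batch)
    return pairs
```

For example, `find_candidate_pairs([(1, "a"), (2, "a"), (3, "b"), (4, "a")], lambda x, y: x == y, batch_size=2)` returns the three pairs among records 1, 2 and 4, each exactly once. Comparing every record with every other record like this is only meant to show the intended behaviour; I do not expect it to be practical as-is for 10 million rows.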
We want to build such a package and schedule it to run in the background.
Is this possible and if so, what would be the right approach to create this?
Regards,
Ka