3rd party Data Quality Issues
Every month a 3rd party sends us a text file that we import into a custom app and finally our database. This month the import file was rejected by the app and it took us over 4 hours (including hex dumps looking for special characters) to find the bad data. When we do, we send our findings back to the 3rd party so they can correct their data. (Actually, we get over 20 such files from different 3rd parties every month. Drives us crazy). Using Sql, or otherwise, what can be done to "pre-screen" the data for errors, either in format or content. It would be nice if the “pre-screen” app could tell us precisely where the bad data is. TIA, Barkindog
November 10th, 2011 9:40pm

Hi Barkindog, SSIS has a data profiling task. Take a look if it makes sense to use in your case. I want to tell you that your definition of data quality as a "tad" broad. I feel it will be hard to cover all the bad data cases. The only thing I can tell SSIS often provides clues to where the problem is, but since you want a SQL solution - SQL's ability to screen files is not the strongest. I do though believe SSIS can and shall be used to not only pre-screen, but also load this file. Here is why: in case a file cannot be loaded (say due to unreadable characters) a proper logging can quickly point to at least the faulty record, and an email (Email Task) can be sent to notify whoever is in charge. Arthur My Blog
Free Windows Admin Tool Kit Click here and download it now
November 10th, 2011 9:59pm

Hi Barkindog, SSIS has a data profiling task. Take a look if it makes sense to use in your case. I want to tell you that your definition of data quality as a "tad" broad. I feel it will be hard to cover all the bad data cases. The only thing I can tell SSIS often provides clues to where the problem is, but since you want a SQL solution - SQL's ability to screen files is not the strongest. I do though believe SSIS can and shall be used to not only pre-screen, but also load this file. Here is why: in case a file cannot be loaded (say due to unreadable characters) a proper logging can quickly point to at least the faulty record, and an email (Email Task) can be sent to notify whoever is in charge. Arthur My Blog
November 11th, 2011 5:56am

Hi Barking, Besides Arthurz professional suggestions, please refer to the following links about the topic, http://msdn.microsoft.com/en-us/library/aa964137(v=sql.90).aspx http://www.codeproject.com/Catalogs/3068/Data-Quality-Toolkit-for-SSIS.aspx If you have any question, please feel free to ask. Thanks, Eileen
Free Windows Admin Tool Kit Click here and download it now
November 14th, 2011 3:25am

Hi Barking, Besides Arthurz professional suggestions, please refer to the following links about the topic, http://msdn.microsoft.com/en-us/library/aa964137(v=sql.90).aspx http://www.codeproject.com/Catalogs/3068/Data-Quality-Toolkit-for-SSIS.aspx If you have any question, please feel free to ask. Thanks, Eileen
November 14th, 2011 11:21am

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics