Refine column which has dirty data
Hi Gurus, I have a column in a table which is related to date,but it also has some dirty data like this: col1 NULL 2011\09\23 rev.018 2011\09\26 rev.019 2011\09\25 rev.005 \\ rev. How can I refine this data using SSIS transformation. ??I want something like col1 NULL 2011\09\23 2011\09\26 2011\09\25 null. Please help me with this.........
October 28th, 2011 12:04pm

you can use Script Component Transform as Transformation and write code to apply Regular Expression to find out is the string match incomming values or not, you can also use third party tools like : http://www.sqlis.com/post/Regular-Expression-Transformation.aspx this expression will find what you are looking for: \w{4}\\\w{2}\\\w{2}http://www.rad.pasfu.com
Free Windows Admin Tool Kit Click here and download it now
October 28th, 2011 1:30pm

I was wondering if it possible with Derived column transformation?????' I don't know .Net. If you can, could you please provide me the expression code for derived col transformation.?????
October 28th, 2011 1:34pm

OK, add a script component after the source, use it as Transformation and set language as C#, set col1 as input column, then in input and output columns tab, add an output column of type DT_STR or DT_WSTR and then edit script,name this output column name as OutputCleansed write this script in Input0_ProcessInputRow; public override void Input0_ProcessInputRow(Input0Buffer Row) { Regex reg = new Regex(@"\w{4}\\\w{2}\\\w{2}"); if (!Row.col1_IsNull) { if (reg.IsMatch(Row.col1)) { Row.OutputCleansed = reg.Match(Row.col1).Groups[0].Value; } else Row.OutputCleansed_IsNull = true; } } http://www.rad.pasfu.com
Free Windows Admin Tool Kit Click here and download it now
October 28th, 2011 2:07pm

you can read whole story here: http://www.rad.pasfu.com/index.php?/archives/53-Regular-Expression-in-SSIS.htmlhttp://www.rad.pasfu.com
October 28th, 2011 2:51pm

Hi Reza,, That is awesome explanation!! thank you. But The problem is when I am trying to load the data has col datatype as date. And it is giving me error. How can I solve this???? this is the error im getting. [OLE DB Destination [10756]] Error: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80004005. An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult: 0x80004005 Description: "Invalid character value for cast specification". [OLE DB Destination [10756]] Error: There was an error with input column "outputcleansed" (10848) on input "OLE DB Destination Input" (10769). The column status returned was: "The value could not be converted because of a potential loss of data.".
Free Windows Admin Tool Kit Click here and download it now
October 28th, 2011 3:31pm

so you can convert it to DT_Date or DT_DBTimeStamp in a Data Conversion Transformationhttp://www.rad.pasfu.com
October 28th, 2011 3:33pm

But When I put Dataconversion transformation and set the datatype as DT_Date and still gives me the error?? Please help me with this..
Free Windows Admin Tool Kit Click here and download it now
October 28th, 2011 3:46pm

OK, change the output column of script transform data type to DT_Date and change script as below: public override void Input0_ProcessInputRow(Input0Buffer Row) { Regex reg = new Regex(@"(?<year>\w{4})\\(?<month>\w{2})\\(?<day>\w{2})"); if (Row.col1_IsNull || string.IsNullOrEmpty(Row.col1.Trim())) Row.OutputCleansed_IsNull = true; else if (reg.IsMatch(Row.col1)) { Row.OutputCleansed = new DateTime( int.Parse(reg.Match(Row.col1).Groups["year"].Value), int.Parse(reg.Match(Row.col1).Groups["month"].Value), int.Parse(reg.Match(Row.col1).Groups["day"].Value)); Row.outputString = reg.Match(Row.col1).Groups[0].Value; } else Row.OutputCleansed_IsNull = true; } http://www.rad.pasfu.com
October 28th, 2011 4:05pm

I am getting some redmark at the Row.OutputCleansed = new DateTime. do I have any class or anythinglike that? And to just confirm when you said Row.outputString , did u meant Row.outputcleansed????
Free Windows Admin Tool Kit Click here and download it now
October 28th, 2011 4:15pm

yea I meant Row.OutputCleansed, could you take an screenshot of error in the script editor with your code and put herehttp://www.rad.pasfu.com
October 28th, 2011 4:34pm

This is the code im using....and I am getting error like cannot convert implicitly.... Also, please note that I only want date not datetime... using System; using System.Data; using Microsoft.SqlServer.Dts.Pipeline.Wrapper; using Microsoft.SqlServer.Dts.Runtime.Wrapper; using System.Text.RegularExpressions; [Microsoft.SqlServer.Dts.Pipeline. SSISScriptComponentEntryPointAttribute] public class ScriptMain : UserComponent { public override void PreExecute() { base.PreExecute(); /* Add your code here for preprocessing or remove if not needed */ } public override void PostExecute() { base.PostExecute(); /* Add your code here for postprocessing or remove if not needed You can set read/write variables here, for example: Variables.MyIntVar = 100 */ } public override void Input0_ProcessInputRow(Input0Buffer Row) { Regex reg = new Regex(@"(?<year>\w{4})\\(?<month>\w{2})\\(?<day>\w{2})"); if (Row.col1_IsNull || string.IsNullOrEmpty(Row.col1.Trim())) Row.outputcleansed_IsNull = true; else if (reg.IsMatch(Row.col1)) { Row.outputcleansed = new DateTime( int.Parse(reg.Match(Row.col1).Groups["year"].Value), int.Parse(reg.Match(Row.col1).Groups["month"].Value), int.Parse(reg.Match(Row.col1).Groups["day"].Value)); Row.outputcleansed = reg.Match(Row.col1).Groups[0].Value; } else Row.outputcleansed_IsNull = true; } } Please let me know ...Thank you..
Free Windows Admin Tool Kit Click here and download it now
October 28th, 2011 4:59pm

did you changed data type of OutputCleansed column in the input and output columns tab of script component editor to DT_Date?http://www.rad.pasfu.com
October 28th, 2011 5:06pm

Yes I tried changing . and also keeping DT_STR...nothing worked out...
Free Windows Admin Tool Kit Click here and download it now
October 28th, 2011 5:16pm

you probably did something wrong, could you send your package to me to check it? mail it to a dot raad dot g at gmail dot comhttp://www.rad.pasfu.com
October 28th, 2011 5:28pm

you probably did something wrong, could you send your package to me to check it? mail it to a dot raad dot g at gmail dot comhttp://www.rad.pasfu.com
Free Windows Admin Tool Kit Click here and download it now
October 29th, 2011 12:09am

Hi Reza, I have send you the structure of src and dest. Did you able to check it>> Please let me know... Thanks for you time!!!!!!!
November 4th, 2011 7:56am

what do you want the o/p exactly let me know first. if you want only date as of of each row then it can be done with the help of a derived column t/r and use substring function over there then you can get the result... substring('column name',0,10)kalishavali
Free Windows Admin Tool Kit Click here and download it now
November 4th, 2011 9:01am

Hi Kalishavali, Actually I refined that column by eliminating other than date. by script provided by Reza. Now my destination datatype was Date and the refined col datatype is DT_STR. So, when mapping it is saying that it cannot be converted or It is not possible because you may lose more data something like that. I was wondering is there any solution for this?
November 4th, 2011 7:56pm

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics