Refine column which has dirty data
Hi Gurus,
I have a column in a table which is related to date,but it also has some dirty data like this:
col1
NULL
2011\09\23 rev.018
2011\09\26 rev.019
2011\09\25 rev.005
\\ rev.
How can I refine this data using SSIS transformation. ??I want something like
col1
NULL
2011\09\23
2011\09\26
2011\09\25
null.
Please help me with this.........
October 28th, 2011 12:04pm
you can use Script Component Transform as Transformation
and write code to apply Regular Expression to find out is the string match incomming values or not,
you can also use third party tools like :
http://www.sqlis.com/post/Regular-Expression-Transformation.aspx
this expression will find what you are looking for:
\w{4}\\\w{2}\\\w{2}http://www.rad.pasfu.com
Free Windows Admin Tool Kit Click here and download it now
October 28th, 2011 1:30pm
I was wondering if it possible with Derived column transformation?????'
I don't know .Net. If you can, could you please provide me the expression code for derived col transformation.?????
October 28th, 2011 1:34pm
OK, add a script component after the source,
use it as Transformation
and set language as C#, set col1 as input column, then in input and output columns tab, add an output column of type DT_STR or DT_WSTR and then edit script,name this output column name as OutputCleansed
write this script in Input0_ProcessInputRow;
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
Regex reg = new Regex(@"\w{4}\\\w{2}\\\w{2}");
if (!Row.col1_IsNull)
{
if (reg.IsMatch(Row.col1))
{
Row.OutputCleansed = reg.Match(Row.col1).Groups[0].Value;
}
else
Row.OutputCleansed_IsNull = true;
}
}
http://www.rad.pasfu.com
Free Windows Admin Tool Kit Click here and download it now
October 28th, 2011 2:07pm
you can read whole story here:
http://www.rad.pasfu.com/index.php?/archives/53-Regular-Expression-in-SSIS.htmlhttp://www.rad.pasfu.com
October 28th, 2011 2:51pm
Hi Reza,,
That is awesome explanation!! thank you. But The problem is when I am trying to load the data has col datatype as date. And it is giving me error. How can I solve this????
this is the error im getting.
[OLE DB Destination [10756]] Error: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80004005.
An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult: 0x80004005 Description: "Invalid character value for cast specification".
[OLE DB Destination [10756]] Error: There was an error with input column "outputcleansed" (10848) on input "OLE DB Destination Input" (10769). The column status returned was: "The value could not be converted because of a potential loss of data.".
Free Windows Admin Tool Kit Click here and download it now
October 28th, 2011 3:31pm
so you can convert it to DT_Date or DT_DBTimeStamp in a Data Conversion Transformationhttp://www.rad.pasfu.com
October 28th, 2011 3:33pm
But When I put Dataconversion transformation and set the datatype as DT_Date and still gives me the error??
Please help me with this..
Free Windows Admin Tool Kit Click here and download it now
October 28th, 2011 3:46pm
OK, change the output column of script transform data type to DT_Date
and change script as below:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
Regex reg = new Regex(@"(?<year>\w{4})\\(?<month>\w{2})\\(?<day>\w{2})");
if (Row.col1_IsNull || string.IsNullOrEmpty(Row.col1.Trim()))
Row.OutputCleansed_IsNull = true;
else
if (reg.IsMatch(Row.col1))
{
Row.OutputCleansed = new DateTime(
int.Parse(reg.Match(Row.col1).Groups["year"].Value),
int.Parse(reg.Match(Row.col1).Groups["month"].Value),
int.Parse(reg.Match(Row.col1).Groups["day"].Value));
Row.outputString = reg.Match(Row.col1).Groups[0].Value;
}
else
Row.OutputCleansed_IsNull = true;
}
http://www.rad.pasfu.com
October 28th, 2011 4:05pm
I am getting some redmark at the
Row.OutputCleansed = new DateTime. do I have any class or anythinglike that?
And to just confirm when you said Row.outputString , did u meant Row.outputcleansed????
Free Windows Admin Tool Kit Click here and download it now
October 28th, 2011 4:15pm
yea I meant Row.OutputCleansed,
could you take an screenshot of error in the script editor with your code and put herehttp://www.rad.pasfu.com
October 28th, 2011 4:34pm
This is the code im using....and I am getting error like cannot convert implicitly....
Also, please note that I only want date not datetime...
using
System;
using
System.Data;
using
Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using
Microsoft.SqlServer.Dts.Runtime.Wrapper;
using
System.Text.RegularExpressions;
[Microsoft.SqlServer.Dts.Pipeline.
SSISScriptComponentEntryPointAttribute]
public
class
ScriptMain :
UserComponent
{
public
override
void PreExecute()
{
base.PreExecute();
/*
Add your code here for preprocessing or remove if not needed
*/
}
public
override
void PostExecute()
{
base.PostExecute();
/*
Add your code here for postprocessing or remove if not needed
You can set read/write variables here, for example:
Variables.MyIntVar = 100
*/
}
public
override
void Input0_ProcessInputRow(Input0Buffer
Row)
{
Regex reg =
new
Regex(@"(?<year>\w{4})\\(?<month>\w{2})\\(?<day>\w{2})");
if (Row.col1_IsNull ||
string.IsNullOrEmpty(Row.col1.Trim()))
Row.outputcleansed_IsNull =
true;
else
if (reg.IsMatch(Row.col1))
{
Row.outputcleansed =
new
DateTime(
int.Parse(reg.Match(Row.col1).Groups["year"].Value),
int.Parse(reg.Match(Row.col1).Groups["month"].Value),
int.Parse(reg.Match(Row.col1).Groups["day"].Value));
Row.outputcleansed = reg.Match(Row.col1).Groups[0].Value;
}
else
Row.outputcleansed_IsNull =
true;
}
}
Please let me know ...Thank you..
Free Windows Admin Tool Kit Click here and download it now
October 28th, 2011 4:59pm
did you changed data type of OutputCleansed column in the input and output columns tab of script component editor to DT_Date?http://www.rad.pasfu.com
October 28th, 2011 5:06pm
Yes I tried changing . and also keeping DT_STR...nothing worked out...
Free Windows Admin Tool Kit Click here and download it now
October 28th, 2011 5:16pm
you probably did something wrong,
could you send your package to me to check it? mail it to a dot raad dot g at gmail dot comhttp://www.rad.pasfu.com
October 28th, 2011 5:28pm
you probably did something wrong,
could you send your package to me to check it? mail it to a dot raad dot g at gmail dot comhttp://www.rad.pasfu.com
Free Windows Admin Tool Kit Click here and download it now
October 29th, 2011 12:09am
Hi Reza,
I have send you the structure of src and dest. Did you able to check it>> Please let me know...
Thanks for you time!!!!!!!
November 4th, 2011 7:56am
what do you want the o/p exactly let me know first.
if you want only date as of of each row then it can be done with the help of a derived column t/r and use substring function over there then you can get the result...
substring('column name',0,10)kalishavali
Free Windows Admin Tool Kit Click here and download it now
November 4th, 2011 9:01am
Hi Kalishavali,
Actually I refined that column by eliminating other than date. by script provided by Reza. Now my destination datatype was Date and the refined col datatype is DT_STR. So, when mapping it is saying that it cannot be converted or It
is not possible because you may lose more data something like that.
I was wondering is there any solution for this?
November 4th, 2011 7:56pm