CSV importing unwanted columns
Hi all,
I wonder if anyone can help with a strange issue that's occurring when I try to import some CSV files via SSIS.
My flat file connection manager details are as follows:
Format: Delimited
Text Qualifier: <none>
Header row delimiter: Comma {,}
Header rows to skip: 0
Column names ARE NOT in the first data row.
The problem occurs when I have specified 3 columns, as below:
Column1 ColumnDelimiter: Comma{,}
Column2 ColumnDelimiter: Comma{,}
Column3 ColumnDelimiter: {CR}{LF}
However, if the CSV file has data populated in the first 4 columns (i.e. my defined Columns 1-3 plus a Column4 that I have not specified), then when I look at the Preview tab in SSIS, the Column3 value is followed by a comma (','). What's worse is that if there are many more rows, all those other Column3 entries get a trailing comma (',') too.
Has anyone experienced this before, and more importantly, is there a fix for it? I only want SSIS to read the column values that I have specified in the flat file connection manager; I don't want to read/import anything after my last defined column.
Any help would be greatly appreciated.
April 13th, 2011 6:45am
In the connection manager you need to define the fourth column anyway, because there is a comma between columns 3 and 4.
However, in the data source you can untick Column4 to import just the 3 columns.
April 13th, 2011 7:11am
I think when you define a connection to the flat file, it has to read the whole file - you can't (in the connection) select which columns to read. However, in the Columns tab of the connection manager you can choose which columns to pass on to the next task.
every day is a school day
My Blog
April 13th, 2011 7:13am
Hi, thanks for the reply.
I am not sure I understand your answer, maybe I didn't explain the question properly.
But for clarity: in SSIS I have 3 column names defined that I want to import - EmployeeID, FirstName & Surname - so when a user fills out the CSV to be imported I am expecting:
AA1 Joe Bloggs
AA2 Sally Anne
AA3 Tim Smith
If for whatever reason they decide to add more data to the cell next to surname, then I want to ignore that, i.e.
AA1 Joe Bloggs Blaah (I want to ignore the last column cell and just import the first 3)
AA2 Sally Anne Blaah (I want to ignore the last column cell and just import the first 3)
AA3 Tim Smith
AA4 Fae
In SSIS in the columns tab for the above data it is showing the following;
EmployeeID FirstName Surname
AA1 Joe Bloggs,
AA2 Sally Anne,
AA3 Tim Smith,
AA4 Fae ,
Does that help explain my issue any more clearly? The only other information I failed to mention is that the OutputColumnWidth for each field is 255.
April 13th, 2011 7:46am
Hi, thanks for your reply. I was hoping that I could use maybe a script task component to dynamically check the number of columns and bring them in that way.
April 13th, 2011 7:48am
I think I understand your situation a little better now: the problem is that you are not in control of the source text file.
If you can guarantee that your first 3 columns will always be there, then there is a simple solution:
1. Create your flat file connection and set the format to ragged right, which will import your file as one column only.
2. Use the Derived Column transformation to extract the first, second and third columns.
3. This ensures you only import 3 fields.
An example of the derived column expressions can be:
col1: SUBSTRING([Column 0], 1, FINDSTRING([Column 0], ",", 1) - 1)
col2: SUBSTRING([Column 0], FINDSTRING([Column 0], ",", 1) + 1, FINDSTRING([Column 0], ",", 2) - FINDSTRING([Column 0], ",", 1) - 1)
etc.
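Outside SSIS, the "keep only the first N comma-separated fields" logic those expressions implement can be sketched in a few lines of Python (illustrative only; like the FINDSTRING approach, it assumes the data itself contains no commas or text qualifiers):

```python
def first_n_fields(line, n=3):
    """Split a raw line on commas and keep only the first n fields,
    discarding any extra columns the user may have added."""
    return line.split(",")[:n]

print(first_n_fields("AA1,Joe,Bloggs,Blaah"))  # ['AA1', 'Joe', 'Bloggs']
```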
April 13th, 2011 10:23am
Take a look at this:
http://microsoft-ssis.blogspot.com/2011/02/script-component-as-source.html
It's about a CSV file where not all columns are filled (or extra columns are added).
Please mark the post as answered if it answers your question | My SSIS Blog:
http://microsoft-ssis.blogspot.com
April 13th, 2011 10:32am
Hi Steven. Thanks for your reply. I've come across that solution and it seems good; however, I can't guarantee that the columns I have defined will always contain a value. If that's the case, I have some business logic later on in SQL Server which flags those records as validation errors. I simply want to stop SSIS giving me extra characters in my output columns when a user inadvertently adds extra data to a column that has not been defined (this would be quite easy to do in my case, especially as the CSV files do not contain header rows).
April 13th, 2011 11:36am
Have you any more suggestions?
April 13th, 2011 12:11pm
Hi SSISJoost,
Thanks for the link. Is there not anyway that I can do it but still using the flat file connection manager?
April 14th, 2011 4:36am
Hi SSISJoost,
Thanks for the link. Is there not anyway that I can do it but still using the flat file connection manager?
You can probably use the connection string of that connection manager, but this script doesn't use the other properties of the connection manager (such as column names).
The columns are defined in the Script Component instead of the connection manager.
But feel free to try different approaches...
Please mark the post as answered if it answers your question | My SSIS Blog:
http://microsoft-ssis.blogspot.com
April 14th, 2011 4:46am
Hi SSISJoost,
Thanks for the link. Is there not anyway that I can do it but still using the flat file connection manager?
And if you don't like scripting C# or VB.net, you can always read all columns as one big string and split it into smaller pieces/columns using a Derived Column. You will need the FINDSTRING and SUBSTRING expressions to accomplish that.
Another option is to ask your source for correct CSV files. These are the basic CSV rules:
http://en.wikipedia.org/wiki/Comma-separated_values#Basic_rules
Please mark the post as answered if it answers your question | My SSIS Blog:
http://microsoft-ssis.blogspot.com
April 14th, 2011 4:53am
Good point SSISJoost. I don't mind the scripting, that's not a problem. I'm just worried about how I would handle errors/occasions where the data itself contains commas (,)! Any ideas on how I can handle that, or is that a separate question for another post?! CSVs...aaaaahhh!!
April 14th, 2011 5:02am
Good point SSISJoost. I don't mind the scripting, that's not a problem. I'm just worried about how I would handle errors/occasions where the data itself contains commas (,)! Any ideas on how I can handle that, or is that a separate question for another post?! CSVs...aaaaahhh!!
That could be quite complicated... but the regular flat file connection manager has the same problems. If you add extra commas to your data you will need to use qualifiers, otherwise you will get errors... and then you have to reckon with qualifiers within your text...
So an incorrectly supplied CSV can be quite a pain in the *ss :-(
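To show what a text qualifier buys you, here is a small sketch using Python's standard csv module (illustrative only, not SSIS): a comma inside the surname survives because it is wrapped in double-quote qualifiers.

```python
import csv
import io

# A surname containing a comma, protected by double-quote text qualifiers.
raw = 'AA1,Joe,"Bloggs, Jr"\n'

row = next(csv.reader(io.StringIO(raw), quotechar='"'))
print(row)  # ['AA1', 'Joe', 'Bloggs, Jr']
```

A naive split on commas would have produced four fields here instead of three.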
Please mark the post as answered if it answers your question | My SSIS Blog:
http://microsoft-ssis.blogspot.com
April 14th, 2011 5:15am
I got around this by adding a Script Task component before my Data Flow Task was called. Inside the Script Task component I had something like the below.
Imports System.IO   'needed for StreamReader

Public Sub Main()
    Dim sourceFilename As String = Dts.Variables("RunTime_FileName").Value.ToString()
    Dim sColumnSeparator As String = ","
    Dim sColumnCount As Integer = 13
    Dim fileReader As StreamReader = New StreamReader(sourceFilename)
    Dim line As String
    Dim DataRow() As String

    'Start with success status. Set to failure if required.
    Dts.TaskResult = Dts.Results.Success

    While (Not fileReader.EndOfStream)
        line = fileReader.ReadLine()
        DataRow = line.Split(sColumnSeparator.ToCharArray())
        'Fail the task if a row has more columns than expected.
        If DataRow.Length > sColumnCount Then
            Dts.TaskResult = Dts.Results.Failure
            Exit While 'no need to read further once a bad row is found
        End If
    End While

    fileReader.Close()
End Sub
Seems to work how I intend, i.e. if there are more columns than I have defined then the task fails, and I handle the failure with a suitable message to the user. The variable sColumnCount will be changed to be read via a config file.
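For illustration only, the same pre-flight check could be sketched outside SSIS in Python (the function name is mine; like the script above it just splits on commas, so it assumes no text qualifiers):

```python
def file_has_expected_columns(path, expected=13, separator=","):
    """Return False if any row in the file has more fields than expected.
    Naive split on the separator, mirroring the Script Task check."""
    with open(path) as f:
        for line in f:
            if len(line.rstrip("\n").split(separator)) > expected:
                return False
    return True
```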
Hope it helps; it definitely sorted me out!
April 14th, 2011 7:49am