CSV importing unwanted columns
Hi all, I wonder if anyone can help with a strange issue that's occuring when I try to import some csv files via SSIS. My flat file connection manager details are as follows: Format: Delemited Text Qualifier: <none> Header row delimiter: Comma {,} Header rows to skip: 0 Column names ARE NOT in the first data row. The problem is that if I have specified 3 columns, as below: Column1 ColumnDelimiter: Comma{,} Column2 ColumnDelimiter: Comma{,} Column3 ColumnDelimiter: {CR}{LF} ...However, if in the csv file there is data populated in the first 4 columns (i.e. my defined columns[1-3] in addition to column4 which I have not specified) then when I look into the Preview tab in SSIS, I get the value followed by a comma (',') for the for Column3 field. What's worse is that if I have lots more rows then it populate all those other entries in Column3 with a comma (',') also. Has anyone experience this before, and more importantly is there a fix to it? I only want SSIS to read those column values that I have specified in the file connection manager. I don't want to read/import anything after my last defined column. Any help would be greatfully appreciated.
April 13th, 2011 6:45am

in the connection manager you need to define the fourth column anyway because there is , between 3 and 4. however, in the data source you can untick the cloumn 4 to just import 3 columns.
Free Windows Admin Tool Kit Click here and download it now
April 13th, 2011 7:11am

I think when you define a connection to the flat file, it has to read the whole file - you can't (in the connection) select which columns to read. However, in the columns tab of the connectino manager you can choose which columns to pass these on to the next task.every day is a school day My Blog
April 13th, 2011 7:13am

Hi, thanks for the reply. I am not sure I understand your answer, maybe I didn't explain the question properly. But for clarity in the SSIS if I have 3 column names defined that I want to import; EmployeeID, FirstName & Surname....then when a user fills out the csv to be imported I am expecting; AA1 Joe Bloggs AA2 Sally Anne AA3 Tim Smith If for whatever reason they decide to add more data to the cell next to surname, then I want to ignore that, i.e. AA1 Joe Bloggs Blaah (I want to ignore the last column cell and just import the first 3) AA2 Sally Anne Blaah (I want to ignore the last column cell and just import the first 3) AA3 Tim Smith AA4 Fae In SSIS in the columns tab for the above data it is showing the following; EmployeeID FirstName Surname AA1 Joe Bloggs, AA2 Sally Anne, AA3 Tim Smith, AA4 Fae , Does that help explain my issue any clearer? The only other information i failed to mention is that the outputcolumnwidth for each field is 255.
Free Windows Admin Tool Kit Click here and download it now
April 13th, 2011 7:46am

Hi, thanks for your reply. I was hoping that I could use maybe a script task component to dynamically check the number of columns and bring them in that way.
April 13th, 2011 7:48am

I think I got understand a little more what your situation is. your problem is you are not in control for the txt data source file. If you can guarantee that your first 3 column will be always there then there is a simple solution for this 1. create your flat file connection and set the format to rigged right, which will import your file as one column only 2. use the derived column transformation to get first column, second column and third one. 3. this can ensure you only import 3 fields. an example of derived column can be: substring([Column 0], 1,(FINDSTRING([Column 0],",",1)-1) as col1 substring([Column 0], (FINDSTRING([Column 0],",",1)+1, (FINDSTRING([Column 0],",",2)-1) as col2 etc
Free Windows Admin Tool Kit Click here and download it now
April 13th, 2011 10:23am

Take a look at this: http://microsoft-ssis.blogspot.com/2011/02/script-component-as-source.html It's about a CSV file where not all column are filled (or extra column are added)Please mark the post as answered if it answers your question | My SSIS Blog: http://microsoft-ssis.blogspot.com
April 13th, 2011 10:32am

Hi Steven. Thanks for your reply. I've come across that solution and it seems good, however I can't gurantee that the columns that I have defined will always contain a value. If that is the case then I have some business logic later on in SQL Server which flags those records as validation errors. I simply want to erradicate SSIS giving me extra characters in my output columns if a user inadverdently adds extra data to a column that has not been defined (this could be quite easy to do in my case epecially seeing as the csv files do not contain header rows).
Free Windows Admin Tool Kit Click here and download it now
April 13th, 2011 11:36am

Have you any more suggestions?
April 13th, 2011 12:11pm

Hi SSISJoost, Thanks for the link. Is there not anyway that I can do it but still using the flat file connection manager?
Free Windows Admin Tool Kit Click here and download it now
April 14th, 2011 4:36am

Hi SSISJoost, Thanks for the link. Is there not anyway that I can do it but still using the flat file connection manager? You propably can use the connectionstring of that connection manager, but this script isn't using other properties of the connection manager (such as columnnames). The columns are defined in the Script Component instead of the connection manager: But feel free to try different approaches... Please mark the post as answered if it answers your question | My SSIS Blog: http://microsoft-ssis.blogspot.com
April 14th, 2011 4:46am

Hi SSISJoost, Thanks for the link. Is there not anyway that I can do it but still using the flat file connection manager? And if you don't like scripting C# or VB.net. You can always read all columns as one big string and split it in smaller pieces/columns using a derived column. You will need the FINDSTRING expression and the SUBSTRING expression to accomplish that. An other options is to ask for correct csv files from your source. These are the basic csv rules: http://en.wikipedia.org/wiki/Comma-separated_values#Basic_rules Please mark the post as answered if it answers your question | My SSIS Blog: http://microsoft-ssis.blogspot.com
Free Windows Admin Tool Kit Click here and download it now
April 14th, 2011 4:53am

Good point SSISJoost. I don't mind the scripting, that's not a problem. Just worried about how I would handle errors/occassions in which the data itself contains commas (,)!!!! Any ideas on how I can handle that, or is that a seperate question in another post?! CSV's...aaaaaaaaaaaaaaaaahhhhhhh!!
April 14th, 2011 5:02am

Good point SSISJoost. I don't mind the scripting, that's not a problem. Just worried about how I would handle errors/occassions in which the data itself contains commas (,)!!!! Any ideas on how I can handle that, or is that a seperate question in another post?! CSV's...aaaaaaaaaaaaaaaaahhhhhhh!! That could be quite complicated... but the regular flat file connection manager has the same problems. If you add extra commas to your data you will need to use qualifiers otherwise you will get errors... and then you will have to reckon with qualifiers within your text.... So incorrect supplied CSV can be quite a pain in the *ss :-( Please mark the post as answered if it answers your question | My SSIS Blog: http://microsoft-ssis.blogspot.com
Free Windows Admin Tool Kit Click here and download it now
April 14th, 2011 5:15am

I got around this by adding a script task component before my dataflow task was called. Inside the script task component i had some like the below. Public Sub Main() Dim sourceFilename As String = Dts.Variables("RunTime_FileName").Value.ToString Dim sColumnSeparator As String = "," Dim sColumnCount As Integer = 13 Dim fileReader As StreamReader = New StreamReader(sourceFilename) Dim line As String Dim DataRow() As String 'Start with success status. Set to failure if required. Dts.TaskResult = Dts.Results.Success While (Not fileReader.EndOfStream) line = fileReader.ReadLine() DataRow = line.Split(sColumnSeparator.ToCharArray()) If DataRow.Length > sColumnCount Then Dts.TaskResult = Dts.Results.Failure End If End While fileReader.Close() End Sub Seems to work how I intend, i.e. if there are more columns than I have already defined then fail, in which I will handle the failure with a suitable message to the user. The variable sColumnCount will be changed to be read via a config file. Hope it helps, definately sorted me out!
April 14th, 2011 7:49am

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics