Flat Files - SSIS Handy Hints

After working with flat files (especially text and CSV) for some time in my current role, I have decided to put together this little discussion, which I believe may benefit peers who have not yet come across some of the issues raised in this piece. It is simply a summary of hints that I have found useful for anyone working with CSV or text files. Everything in this writing is purely from my experience, and I am discussing it at SSIS level, although I believe most of the actions can also be applied when dealing with flat files in SSMS.

Flat files as destination file

Exporting data to a flat file destination is relatively easy and straightforward. There aren't as many issues here as there are when importing data from flat files. However, whatever you do here hugely impacts how easy and straightforward it will be to import data from the flat file that you create at this stage. There are a few things to note. Below is a discussion of some of the options that you will see when you open your Flat File Destination component, along with the actions that I have found useful to take.

  1. Column names in the first data row: if you want the column names to be in the first row of your file, make sure that you always check the box with that option. SSIS does not do that for you by default.
  2. Column delimiter: I always prefer to use Tab {t}. The default delimiter is Comma {,}. The problem that I have found with comma delimiters is that if the values themselves contain commas, the system does not always determine correctly where a column ends. As a result you may end up with rows whose values are shifted and misplaced from their original columns owing to the wrong column allocation by the system.
  3. AlwaysCheckForRowDelimiters: this is a Flat File Connection Manager property. Its default setting is True. At one time I found myself having to change this after facing a huge problem with breaking and misplacement of rows in the dataset that I was dealing with. The problem emanated from the values in one of the columns. The offending column held varchar values presented as paragraphs, with all sorts of special characters within them, e.g.

This is ++, an example of what I mean the characters ;

in the dataset which gave me: nearly 100% ++ headaches looked like {well}; this piece

OF: example??

You can see from the above italicised dummy value what I mean. Values such as that cause the system to break rows prematurely. I don't know why, but the somewhat painful experience I had with this led me to the conclusion that I should not leave the system to auto-decide where a row ends. When I changed the AlwaysCheckForRowDelimiters property from True to False, along with the recommendations mentioned in items 1 and 2 above, the breaking and misplacement of rows was solved. By breaking I mean that one row in a table ends up broken into two or three separate rows in the flat file, and this is carried over to the new table where that flat file is loaded.
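The delimiter problem described above can be reproduced outside SSIS. Below is a minimal Python sketch (an illustration of the idea, not SSIS itself) showing why embedded commas shift values into the wrong columns, while a tab delimiter with the line breaks stripped keeps the columns intact:

```python
# A row whose last column contains commas and a line break,
# similar to the paragraph-style values described above.
row = ["101", "John Smith", "This is ++, an example; with, commas\nand a line break"]

# Naive comma-delimited output with no text qualifier:
# the embedded commas create extra columns on re-import.
comma_line = ",".join(row)
print(comma_line.split(","))   # 5 fields instead of 3 -> shifted values

# Tab-delimited output (SSIS "Tab {t}") with the line breaks
# stripped, as recommended above: the 3 columns survive intact.
clean_row = [c.replace("\n", " ").replace("\r", " ") for c in row]
tab_line = "\t".join(clean_row)
print(tab_line.split("\t"))    # exactly 3 fields
```

The same effect is what SSIS runs into: the comma-delimited line re-parses into five fields, so the last two land under the wrong columns.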

Addendum

There is an additional option which I have found to work even better when rows break because values are in paragraph format, as illustrated above. The AlwaysCheckForRowDelimiters option explained earlier works, but not always, as I have realised. If that option does not work, this is what you should do:

  • When you SELECT the data for export from your table, make your SELECT statement look something like this:

SELECT ColumnName1,
       ColumnName2,
       ColumnName3,
       REPLACE(REPLACE(OffendingColumnName, CHAR(10), ''), CHAR(13), '') AS OffendingColumnName,
       ColumnName4
FROM MyTableName

The REPLACE functions get rid of the line breaks in your values. That is, they remove the paragraph formatting from the values: CHAR(10) is the line feed character and CHAR(13) is the carriage return.

  • I would suggest the use of double dagger (‡) column delimiters if using this approach, since they are very unlikely to occur inside real data values.
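For readers outside SQL Server, the clean-up that the nested REPLACE performs can be sketched in Python (purely an illustration of the idea; the sample values are made up):

```python
# Mirror of REPLACE(REPLACE(col, CHAR(10), ''), CHAR(13), ''):
# CHAR(10) is line feed (\n), CHAR(13) is carriage return (\r).
def strip_breaks(value: str) -> str:
    return value.replace("\n", "").replace("\r", "")

# Double dagger delimiter, as suggested above: very unlikely
# to occur inside real data values.
DELIM = "\u2021\u2021"  # two daggers

offending = "first line\r\nsecond line\r\nthird line"
record = DELIM.join(["1", strip_breaks(offending), "2014-04-04"])
print(record)  # one physical line, three unambiguous columns
```

With the breaks removed, each record occupies exactly one physical line, so the row delimiter can never be confused with content.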


Text or CSV file? In my experience, going with a text file is always more efficient. Besides, some of the things recommended above only work with text files (I suppose so; I stand to be corrected on this). An example of this is column delimiters: item 2 above recommends the Tab {t} column delimiter, whereas in CSV, as the name suggests, the delimiters are commas.

Flat files as source file

In my experience, most of the headaches of working with flat files arise when importing data from them. A few examples of the headaches that I'm talking about are:

  1. Datatypes and datatype length, if using string
  2. Shifting and misplacement of column values
  3. Broken rows, with some pseudo-rows appearing in your import file
  4. Double quotation marks in your values

Below I address some of the common issues which I have personally experienced and which I hope will be useful to other people. When you open your Flat File Source component you will see a number of options; below is a discussion of some of them, along with the actions that I have found useful to take.

  1. Retain null values from the source as null values in the data flow: this option comes unchecked by default. Since noticing its importance, I always make sure that I check it. I learned this after some rows in my destination table came up with shifted and misplaced column values, i.e. values appearing under columns where you do not expect them, showing that the value has been moved from its original column to another column where it does not belong.
  2. Text qualifier: the default entry here is <none>. I have found that it is always handy to insert a double quote (") here. This will eliminate any double quotes which the system may have added when the flat file was created, which happens when the values in question have commas as part of their characters.
  3. Column delimiter: this solely depends on the column delimiter which was specified when the flat file was created. The system default is Comma {,}. Please note that if the delimiter specified here is different from the one in your flat file, the system will throw an error with a message like "An error occurred while skipping data rows".
  4. Column names in the first data row: if the first row contains column names, put a check mark on this option.
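The text qualifier behaviour in item 2 can be seen with Python's csv module (a sketch of the concept, not the SSIS implementation):

```python
import csv
import io

# A CSV line as a flat file export might have written it: the value
# with an embedded comma was wrapped in double quotes (the qualifier).
data = 'id,name,city\n1,"Smith, John",Leeds\n'

# With the text qualifier set to " (as recommended above), the quotes
# are consumed and the embedded comma stays inside one field.
rows = list(csv.reader(io.StringIO(data), quotechar='"'))
print(rows[1])  # ['1', 'Smith, John', 'Leeds']

# With no text qualifier, the quotes leak into the value and the
# embedded comma splits the field: shifted, misplaced columns.
raw = list(csv.reader(io.StringIO(data), quoting=csv.QUOTE_NONE))
print(raw[1])   # ['1', '"Smith', ' John"', 'Leeds']
```

This is exactly the symptom described in item 1: without the qualifier, "Smith" and "John" end up under the wrong columns with stray quote characters attached.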

Datatypes and datatype length

By default, when you import a flat file, the datatypes for all the columns come up as varchar(50) in SSIS. More often than not, if you leave this default setup your package will fail when you run it, because some of the values in some of your columns will be longer than 50 characters, the default length. The resulting error will be a truncation error. I have found two ways of dealing with this.

  1. Advanced: this is an option found on the Flat File Source Editor. Once this option is selected you will be presented with a list of columns from your flat file. To determine your datatypes and lengths there are two possible things that you can do at this stage.
    1. Go column by column: going column by column, you can manually input your desired datatypes and lengths on the Flat File Source Editor through the Advanced option.
    2. Suggest types: this is another option under the Advanced selection. It suggests datatypes and lengths for you based on the amount of sample data that you specify in the pop-up dialog box. I have noticed that while this is handy functionality, the problem with it is that if some of the values in the non-sampled data are longer than what the system suggested, the package will fail with a truncation error.
  2. View code: this is viewing the package's XML code. If, for example, you want all your columns to be 255 characters long in your landing staging table:
    1. Go to your package name, right-click on it and select the option View code from the list presented to you. The XML code will then come up.
    2. Hit Ctrl + F to get a Find and Replace window. In Find what, type DTS:MaximumWidth="50" and in Replace with, type DTS:MaximumWidth="255". Make sure that under Look in the selection is Current Document.
    3. Click Replace All and all your default column lengths of 50 characters will be changed to 255 characters.
    4. Once done, save the changes and close the XML code page. Go to your package GUI designer. You will find that the Flat File Source component is now highlighted with a yellow warning triangle, because the metadata definition has changed. Double-click the Flat File Source component and then click OK. The warning will disappear and you will be set to pull and load your data into your staging database with all columns being varchar(255). If you need to change any columns to specific datatypes you can use either the Data Conversion component or the Derived Column component for that purpose, or both, depending on the datatypes that you will be converting to.
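The Find and Replace in steps 2 and 3 can also be scripted over the .dtsx file. A minimal Python sketch (the package file name is hypothetical; the attribute text is the one shown above):

```python
# Bulk-change flat file column widths in a .dtsx package, equivalent
# to the Find and Replace described in steps 2-3 above.
def widen_columns(xml_text: str, old: int = 50, new: int = 255) -> str:
    # Plain text replace, matching exactly what the editor's
    # Find and Replace does on the XML view.
    return xml_text.replace(f'DTS:MaximumWidth="{old}"',
                            f'DTS:MaximumWidth="{new}"')

sample = '<DTS:FlatFileColumn DTS:MaximumWidth="50" DTS:DataType="129"/>'
print(widen_columns(sample))
# <DTS:FlatFileColumn DTS:MaximumWidth="255" DTS:DataType="129"/>

# Typical usage (uncomment and point at your own package):
# text = open("MyPackage.dtsx", encoding="utf-8").read()
# open("MyPackage.dtsx", "w", encoding="utf-8").write(widen_columns(text))
```

As with the manual edit, reopen the package afterwards so SSIS refreshes the changed metadata.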

Dynamic Flat File Name and Date

Please see this blog: http://www.bidn.com/blogs/mikedavis/ssis/153/using-expression-ssis-to-save-a-file-with-file-name-and-date

There is far more to flat files than can be discussed in one piece.

Any comments plus additions (and subtractions too) to this piece are welcome.

April 4th, 2014 4:17pm

Thank you SSISJoost. I have created a WIKI about this subject on TechNet Guru Contributions for April 2014 page in line with your suggestion.

Kind regards,

April 6th, 2014 1:28pm

Hi Mpumelelo,

Thank you for your useful sharing. I believe other community members who face a similar issue will benefit from this post.

Regards,

April 11th, 2014 7:56am

Hello,

I have a text file with an audit row as the last record in the file.

Since this row does not have all the column delimiters, it gets skipped in 2008 R2.

When trying to migrate the project to a 2012 server, 2012 picks up the audit row and the SSIS package fails.

This does not happen in 2008; the same file throws no error there.

I tried setting AlwaysCheckForRowDelimiters to false, but the problem continues since the audit row still gets picked up.

Please advise why the AlwaysCheckForRowDelimiters property is not working for me.

 

August 17th, 2015 3:35pm

Hi Digp,

I'm not sure if I understand your question properly. Are you importing data from a text file to a database or you are exporting data from a database to a text file?

Thanks,

August 17th, 2015 3:55pm

Hello Mpumelelo,

I am importing data from a text file to a SQL Server 2012 database. The column delimiter is ~ and the row delimiter is CRLF.

The project was already developed in 2008 R2. We are migrating to 2012.

There is an audit row at the end of the file which does not have all the column delimiters (~).

In 2008 R2 this extra audit row does not get picked up by the flat file connection manager, which is expected and causes no problem, because we are not supposed to load that row into the database table anyway.

But in 2012, this audit row gets picked up by the flat file connection manager.

I set the AlwaysCheckForRowDelimiters property to false so that 2012 (which gives preference to the row delimiter) would behave like 2008 (which gives preference to the column delimiter) and the extra audit row (with incomplete column delimiters) would be rejected, but it is not working, and the flat file connection manager picks up that extra row with incomplete column delimiters.

Thanks..

August 18th, 2015 11:18am

If you are doing a simple export and import, try using Raw File Destination from your legacy database and use Raw File Source in your new SQL Server 2012 database. That has worked for me before when I get terrible headaches with flat file processes. Even if you have to do some data transformations before the final destination table, you can import data to a landing table in SQL Server 2012 and then use that as the source. Hope that helps.


August 18th, 2015 11:55am

Thanks for reply.

But the SSIS package flat file source component itself gives an error when it reads the text file through the connection manager.

Suppose column 3 has a length of 3 characters; since the last record is incomplete and has no data after column 2, only empty space, the length for column 3 increases, and the flat file source component fails on a truncation error.

I am just trying to find a way for the source flat file connection manager to behave as it does in 2008,

so that the extra row gets skipped in 2012.

An example of the data is below:

col1~col2~col3~colu4

abc~xyz~pqr~lmn

abc~xyz~pqr~lmn

audit~11~        (space after last ~)

col3 has a data length of 3, but because column 3 in the last row contains empty space its length becomes 6 characters, and the

package failed.

August 18th, 2015 12:30pm

Is the line audit~11~        (space after the last ~) coming directly from the same table as the rest of the data? If it is, the system should be able to pick that up. This can be achieved by adding a text qualifier to your destination Flat File Connection Editor on SQL Server 2008. You can use a double quote (") as your text qualifier. Try that and see if it works.

August 18th, 2015 12:54pm

I am trying to stop that row being picked up by the source flat file connection manager (in 2008 it works fine). 2012 picks it up even though it is incomplete. I am looking for a way for that row to be skipped in 2012 as well.
August 18th, 2015 1:44pm

If the audit row comes from the same table as the rest of the data, use a Conditional Split component on SQL Server 2008 to isolate the row. This will ensure that the audit row is not passed on to the flat file.

 

Using your example, if col1 has data value audit (for audit row), then you can use the following expression on the Conditional Split component,

 

[col1]=="audit"

 

Hope this helps.

August 18th, 2015 2:39pm

Hello Mpumelelo,

2008 already skipped that row (the source flat file connection manager takes care of that; when I check with the Preview button the row does not show up there, and it does not get populated in the destination table).

I am expecting some kind of setting in 2012 at the source flat file connection manager that could also skip that audit row.

thanks..

August 18th, 2015 3:04pm


I don't know if I fully understand the scenario that you are trying to present. On the Flat File Connection Manager Editor, under Preview, there is a Skip rows option that can be used, but I think in your case it could be fiddly. I still feel that, as long as the failure does not happen on the Flat File Source component, you can easily get rid of the audit row by using a Conditional Split component. I don't know of any other option that you can use at the flat file source to achieve your desired results. The other option, of course, is to filter the row at the 2008 level, either using a WHERE clause in the SQL statement as you pull the data from the source table, or a Conditional Split in the pipeline just before the flat file destination.
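To illustrate the filtering idea outside SSIS, here is a short Python sketch that drops any row whose first ~-delimited field is "audit" or that has fewer than the expected number of delimiters (the sample lines are taken from the example above; this is a concept sketch, not an SSIS feature):

```python
EXPECTED_DELIMS = 3  # col1~col2~col3~colu4 has three ~ delimiters

def keep_row(line: str) -> bool:
    # Drop the audit row and any row with an incomplete set of delimiters.
    fields = line.rstrip("\r\n").split("~")
    return fields[0] != "audit" and line.count("~") == EXPECTED_DELIMS

lines = ["col1~col2~col3~colu4\n",
         "abc~xyz~pqr~lmn\n",
         "abc~xyz~pqr~lmn\n",
         "audit~11~        \n"]   # incomplete audit row

kept = [ln for ln in lines if keep_row(ln)]
print(len(kept))  # 3 -> the audit row is dropped
```

This is the same test the Conditional Split expression performs; pre-filtering the file this way before the flat file source reads it would also avoid the truncation error at the source component.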

August 18th, 2015 3:51pm

By the looks of it, AlwaysCheckForRowDelimiters is not meant to replicate what 2008 did, but rather to decide which delimiter gets preference. If your audit row wasn't the last row, this would determine whether you get the truncated row (preference to the row delimiter) or the next row consumed as part of this row (preference to the column delimiter). You effectively had a solution based on a flaw in 2008.

Is there any way you can identify the audit row? That way you can filter it out via a Conditional Split.

August 18th, 2015 8:15pm

This topic is archived. No further replies will be accepted.
