SSIS Variable Record Length Text Import Not Working
I have come across some very strange behavior in SSIS. I would like to import a plain old text file with these specs:

- Text isn't delimited
- Column widths are fixed
- Record lengths are variable (minimum record length = 127, maximum record length = 323)

I have defined a table to store the data, and the table layout is identical to the full record length of the file. See the DDL, record layout, and sample data below.

Problem: After position 127 there may be a CRLF, which signifies there is no more data for any record with a certain "ACTION_CODE". Otherwise, there will be data filling out the remainder of the columns to position 323. When the SSIS import runs, it actually puts the CRLF character itself into position 128, what should be the next record's data begins right after that, and the row truncates at position 323. Then the next record is given a new row in the table, and the same pattern repeats.

I have tried for weeks to get this file to import properly. For the short records (record length <= 127), the columns past position 127 should be empty in the table. The CRLF SHOULD signify to SSIS that a new record needs to be added and that parsing should restart at the first column for that table, not be ignored so that the next record from the file is appended to the SAME ROW in the table!

I have tried "ragged right," but it's not working. That only allows the last column of the file's record to be variable, not the entire record itself - and I don't see why that wouldn't work anyway. The CRLF should drive this process. I hate to say it, but to me this is a bug. How can something as sophisticated as SSIS not handle variable record length files? I used to handle that stuff all the time back in my mainframe days with JCL/COBOL, etc.

I am open to any and all suggestions. I have some workarounds that I might try, but I don't know why this needs a workaround. It needs to be corrected, is what I am thinking. Anyone?
Anthony

TABLE DDL:

CREATE TABLE [dbo].[EVENTS] (
    [NEW_FILING_NUMBER] [varchar] (12),
    [OLD_FILING_NUMBER] [varchar] (12),
    [EVENT_COUNT] [varchar] (5),
    [EVENT_SEQ_NUMBER] [varchar] (5),
    [EVENT_PAGES] [varchar] (5),
    [EVENT_DATE] [varchar] (10),
    [ACTION_SEQ_NUMBER] [varchar] (5) NULL,
    [ACTION_CODE] [varchar] (2),
    [ACTION_VERBAGE] [varchar] (70),
    [ACTION_NAME] [varchar] (55),
    [ACTION_ADDRESS_LINE1] [varchar] (44),
    [ACTION_ADDRESS_LINE2] [varchar] (44),
    [ACTION_CITY] [varchar] (28),
    [ACTION_STATE] [varchar] (2),
    [ZIPCODE] [varchar] (10),
    [ACTION_COUNTRY] [varchar] (2),
    [ACTION_OLD_NAME_SEQ] [varchar] (5),
    [ACTION_NEW_NAME_SEQ] [varchar] (5),
    [ACTION_NAME_TYPE] [varchar] (1)
) ON [PRIMARY]
GO

RECORD LAYOUT (provided to me by the folks who create the file): EVENT.TXT

--------------------------------------------------------------------------------------------
COL_NAME               DATA TYPE     LEN  START  DESCRIPTION
--------------------------------------------------------------------------------------------
OLD_FILING_NUMBER      VARCHAR(12)    12      1  Filing number assigned to event
NEW_FILING_NUMBER      VARCHAR(12)    12     13  Filing assigned to original document
EVENT_COUNT            VARCHAR(5)      5     25  Number of actions event consisted of
EVENT_SEQ_NUMBER       VARCHAR(5)      5     30  Event sequence
EVENT_PAGES            VARCHAR(5)      5     35  Number of pages event form contained
EVENT_DATE             VARCHAR(10)    10     40  Date event was filed
ACTION_SEQ_NUMBER      VARCHAR(5)      5     50  Action sequence
ACTION_CODE            VARCHAR(2)      3     55  Action code
ACTION_VERBAGE         VARCHAR(70)    70     58  Description of event
ACTION_NAME            VARCHAR(55)    55    128  Entity name being changed or added
ACTION_ADDRESS_LINE1   VARCHAR(44)    44    183  1st line of entity's address
ACTION_ADDRESS_LINE2   VARCHAR(44)    44    227  2nd line of entity's address
ACTION_CITY            VARCHAR(28)    28    271  City
ACTION_STATE           VARCHAR(2)      2    299  State
ZIPCODE                VARCHAR(10)    10    301  Zip code
ACTION_COUNTRY         VARCHAR(2)      2    311  Country
ACTION_OLD_NAME_SEQ    VARCHAR(5)      5    313  Old name sequence
ACTION_NEW_NAME_SEQ    VARCHAR(5)      5    318  New name sequence
ACTION_NAME_TYPE       CHAR(1)         1    323  Action
--------------------------------------------------------------------------------------------
RECORD LENGTH 323

SAMPLE DATA: (first 31 records to show enough variation in record length)

20019934086820010002868200001000020000111/01/200100001CHACHANGE ENTITY AAAAAAAAAAAAAAAAAAAA, AAA NNN MAIN AVE XXXXXX NY10131 US00002 D
20018884086820010002867800001000020000111/01/200100001CHACHANGE ENTITY AAAAAAAAAAAAAAAAAAAA, AAA. NNN MAIN AVE XXXXXX NY10131 US00003 D
20019039108297000006494300001000030000111/01/200100001CHACHANGE ENTITY BBBBBBBBBBBBBBBBBBBBBB NNNN XX NNTH STREET XXXXXXXXXX XXXXX NY10131 US00003 S
20019038888297000006494300001000030000111/01/200100001CHACHANGE ENTITY CCCCCCCCCCCCCCCCCCCCC NNN XXXXXXXX STREET XXXXXXXXXX XXXXX NY10131 US00004 S
20019034199098000016573100001000010000111/01/200100001CHACHANGE ENTITY DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD NNN BRICKELL AVE XXXXX NY10131 US00001 S
20019034109098000016573100001000010000111/01/200100001CHACHANGE ENTITY CCCCCCCCCCCCCCCCCCCCC NNN BRICKELL AVE XXXXXXXXXXX XXXXX NY10131 US00002 S
93000011488881000008166400001000010000105/28/190000001T TERMINATION
93000006751492000013063300001000010000103/30/190300001CHACHANGE EEEEEEEEEEEEEEEEEEEEEEEEEEEEEE NNNNNN PO BOX XXXXXXX NY10131 US00001 S
93000006888499000013063300001000010000103/30/190300001CHACHANGE FFFFFFFFFFFFFFFFFFFFFFFF NNNNNN PO BOX XXXXXXX NY10131 US00002 S
93000009777600988014854600001000010000105/10/190300001C CONTINUATION
94000011760092090006110800001000010000106/10/190400001T TERMINATION
89000018525888889013826800001000010000107/10/190500001CHANO VERBAGE FOR THIS EVENT GGGGGGGGGGGGGGGGGGGGGGG NNNN NTH ST XXXXX XXX NY US00001 D
89000018525000187913826800001000010000107/10/190500001CHANO VERBAGE FOR THIS EVENT GGGGGGGGGGGGGGGGGGGGGGG NNNN X XXXXXXXXXX XXXX XXXXXXXXX XX XXXXXXXXXX NY33309 US00002 D
89000019593800187090417700001000030000107/21/190700001CHANO VERBAGE FOR THIS EVENT HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH NNNN XXXXX XX XXXXX NY US00001 D
89000019593800188889417700001000030000107/21/190700001CHANO VERBAGE FOR THIS EVENT IIIIIIIIIIIIIIIII NNNN XXXXX XX XXXXX NY10131 US00002 D
89000019593955500000908900001000110000107/21/190700001CHANO VERBAGE FOR THIS EVENT JJJJJJJJJJJJJJJJJJJJJJJJJJ NNNN XXXXX XX XXXXX US00001 D
89000019593900055500398900001000110000107/21/190700001CHANO VERBAGE FOR THIS EVENT IIIIIIIIIIIIIIIII NNNN XXXXX XX XXXXX NY10131 US00002 D
89000020847100187015559400001000010000108/04/190700001COLAMENDMENT TO COLLATERAL
89000020846300183015998600001000030000108/04/190700001COLAMENDMENT TO COLLATERAL
89000020555500000088006500001000050000108/04/190700001COLAMENDMENT TO COLLATERAL
89000034023900186017451200001000020000112/28/190700001CHANO VERBAGE FOR THIS EVENT KKKKKKKKKKKKKKKKK NNN XXXXXXXXX XX XX XXXXXXXXXX NY US00002 S
89000035555555186017451200001000020000112/28/190700001CHANO VERBAGE FOR THIS EVENT LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL N XXXXXXX XX XXXXXX XXXXXXX NY10131 US00001 S
89000004235555500001560400001000010000202/14/190800001COLAMENDMENT TO COLLATERAL
93000007526100186008562800001000020000104/09/191100001COLAMENDMENT TO COLLATERAL
93000000707500187001964100001000020000101/11/191300001T TERMINATION
93000025555555000003847200001000010000110/08/191300001C CONTINUATION
94000055555551000008803700001000020000101/22/191400001CHACHANGE MMMMMMMMMMMMMMMMMM NNN X XXXX XX XX XXXXXXX NY10131 US00001 D
94000001550791000008803700001000020000101/22/191400001CHACHANGE NNNNNNNNNNNNNNNNNNNNNNNN NNN X XXXX XX XXXXXXX NY10131 US00002 D
94000011786589000021061500001000010000106/10/191400001C CONTINUATION
94000014005493000007012600001000020000107/14/191400001T TERMINATION
00387002743100182015057100001000010000104/22/191800001ADDNO VERBAGE FOR THIS EVENT OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO NNNN XXXXXX XXXXXXXXXXXXXXX XXX XXXX NY US00003 S
July 28th, 2008 8:30pm

For performance reasons, SSIS parses flat files column by column, instead of parsing the row delimiter first and then the columns within the row. For that reason, it does not handle files where the number of columns varies between rows. There are a number of workarounds for this. I posted a sample here: http://agilebi.com/cs/blogs/jwelch/archive/2007/05/07/handling-flat-files-with-varying-numbers-of-columns.aspx that uses a script component to parse the source. In your case, since you know that the flat file has a guaranteed minimum number of columns, I'd bring the file in as you are, but declare a single column to hold the remainder of the row, and parse that out in a script component (see the sketch below). There is also a community sample of a component here (http://www.codeplex.com/SQLSrvIntegrationSrv) that uses RegEx to parse the flat file, which gives it a lot more flexibility than the built-in flat file source. It's coded for 2008, but it shouldn't be too hard to create a 2005 version.
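To illustrate, here's a rough sketch of what that script component might look like. It assumes the flat file connection is set to ragged right, with the first 127 bytes split into your fixed columns and one final wide column (called RestOfRow here - all column names in this sketch are made up, so adjust them to your own metadata) holding whatever follows position 127. The offsets come from your record layout.

' Minimal sketch: parse the trailing 196 bytes held in the single
' wide input column RestOfRow into the output columns. Runs once
' per row in a synchronous script transform.
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

    Dim rest As String = String.Empty
    If Not Row.RestOfRow_IsNull Then
        rest = Row.RestOfRow
    End If

    ' Pad short records with spaces so every Substring below stays
    ' in range; 323 - 127 = 196 trailing bytes.
    rest = rest.PadRight(196)

    ' Offsets are the layout's start positions minus 128 (0-based,
    ' relative to the start of the trailing column).
    Row.ActionName = rest.Substring(0, 55).Trim()
    Row.ActionAddressLine1 = rest.Substring(55, 44).Trim()
    Row.ActionAddressLine2 = rest.Substring(99, 44).Trim()
    Row.ActionCity = rest.Substring(143, 28).Trim()
    Row.ActionState = rest.Substring(171, 2).Trim()
    Row.Zipcode = rest.Substring(173, 10).Trim()
    Row.ActionCountry = rest.Substring(183, 2).Trim()
    Row.ActionOldNameSeq = rest.Substring(185, 5).Trim()
    Row.ActionNewNameSeq = rest.Substring(190, 5).Trim()
    Row.ActionNameType = rest.Substring(195, 1).Trim()
End Sub

Short records arrive with RestOfRow empty, so the PadRight call keeps every Substring in range and the trailing columns simply come out blank.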
July 28th, 2008 9:01pm

Thanks! I will investigate those solutions. Anthony
July 28th, 2008 9:23pm

John - I've been reading the article you pointed me to, along with the comments, and was particularly concerned with the problems one person had due to the size of the input file. My input file will be 1.2 gigabytes (no typo there... billions of bytes). It is tremendous, and it comes from a state government IT department, so there is no chance of getting them to output this any differently (or, as was my first suggestion to my client: ask them to fill out the unused bytes in each record). That question generated more laughs than a Jim Carrey movie.

Now, I may just do something similar to that myself. I may pre-process this file, reading it line by line, and when I see that CRLF at position 128, simply re-write that record with spaces out to position 323 and then import the file with the format given to me (something like the sketch below). I may also just sort the file on position 128 and process the shorter records first, then process the longer records in the file afterward.

Ultimately, the scientist in me wants to conquer why SSIS can't "see" the CRLF like other programs can. It's really sad that SSIS can't break a record at the CRLF while other apps can. Example: copy/paste those 31 records that I posted above into Excel, set up the columns as given, and whammo - Excel will "column out" the records as they should be. If there weren't 3.5 million records in this file, I would probably also try to use Excel automation to get that file into a table - somehow.

Before I begin looking at the code you have on your blog and giving that a shot, is there a way to build a fully custom SSIS import of a text file, such as I have described above, using some sort of VB code or something along those lines? I am fully versed in SQL Server 2000 and DTS, but I am brand spankin' new to SQL Server 2005 *and* VB 2005, and this is what I need to work in. I could probably utilize FSO in VB6 and get this done in much less time, but I do not have that option. So it must be SQL Server 2005 (SSIS) / VB 2005 (Visual Studio 2005), which is why I thought this was not working - due to my being quite new to these versions of the development system.

Thanks, Anthony
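Something like this is what I have in mind for the pre-processing pass (the file paths here are just placeholders):

Imports System.IO

Module PadEventFile
    Sub Main()
        ' Pre-process EVENT.TXT: pad every short record with spaces out
        ' to the full 323 bytes, so the result is a true fixed-width
        ' file that SSIS can import with the layout given above.
        Using reader As New StreamReader("C:\data\EVENT.TXT")
            Using writer As New StreamWriter("C:\data\EVENT_FIXED.TXT")
                Dim line As String = reader.ReadLine()
                While line IsNot Nothing
                    writer.WriteLine(line.PadRight(323))
                    line = reader.ReadLine()
                End While
            End Using
        End Using
    End Sub
End Module

Since this streams one record at a time rather than loading the whole file, the 1.2 GB size shouldn't be a problem.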
July 28th, 2008 10:00pm

Sure - you can build your own SSIS pipeline component that will parse for row delimiters first and column delimiters second. The second link I posted shows an example of that. However, writing a custom SSIS component is not a trivial task, and if you are not well versed in .NET, it will be even more difficult. As far as file sizes go - the code I have on my blog is a (very) slimmed-down example of something I did in a production app. However, I was not dealing with 1 GB files, and I would expect it to be pretty slow with those. The pre-processing idea sounds like it would work, though.
July 28th, 2008 10:08pm

Thanks, John. All suggestions have been greatly appreciated!
July 28th, 2008 10:49pm

Hey Anthony, I ran across a similar problem. Since this data is at fixed positions, there is no need for a custom pipeline component. The process is detailed here: http://dataintegrity.wordpress.com/2009/10/02/vrl/

JS
October 14th, 2009 3:40pm

Hey, did you get the solution or not? If yes, then please send it to me at hiral.jariwala7@gmail.com. I have the same question. Thanks in advance.
February 2nd, 2011 10:06pm

This topic is archived. No further replies will be accepted.
