Unusual SSIS problem with file size and fixed-width columns
Hi guys, I hope that someone here can help me out, as I have an unusual problem in SSIS. I need to pull in a flat file that I receive every day with around 10 000 records in it, and once a month the cumulative set of 37 million records (via a ForEach Loop). Each file of 10 000 records is about 50GB in size, and a single row is just over 100 000 characters long (fixed width). When I try to use SSIS's flat file connector to pull this file in, it allows a maximum of 32 000 characters per line to be read and split up. When I pull the records in using the Ragged Right format instead, it reads roughly 52 000 characters per row, which is a little more, but I'm still not getting my 100 000+ characters per line, and I haven't defined any columns yet either. I would still need to split everything up with T-SQL, which seems a little impractical. Does anyone know of a way to pull in this many records and split them up, using a SQL script, an SSIS Script task, an SSIS package, a BCP process or something like that, or even better, with great ease? Kind regards, Carel Greaves
June 11th, 2012 6:10am

Hi Carel, why do you need to pull all of these characters in together? I assume they belong to different columns, so how about just extracting the related fields as groups and outputting them to different files? Keeping so many characters on the same line is hard to process, I think. Hope it helps! Vote if it's helpful! Simon MSN:simonlv@live.cn Skype:simonlvpin
June 11th, 2012 6:37am

Hi Simon, the way I receive the data is one line per person, and I need to pull it in that way. Each person has 1007 columns of data: date of birth, name, surname, account number, etc., and I need to pull the columns in just like that, one row per person. That's the first step. Once I can actually get all the data in, I plan to interview the business users, see which columns they use most, create a dimensional model for a data warehouse, and go that route, but for now I need to pull in all the data first. How do I split up the columns like that? I managed a quick SELECT * FROM OPENROWSET(BULK 'G:\Filename.txt', SINGLE_CLOB) AS x; but I still need to split up the rows and columns.
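To make the splitting concrete, here is a minimal, self-contained C# sketch (not from the thread) of pulling individual columns out of one fixed-width row by position. The column names, start positions and lengths are made-up placeholders; the real file has 1007 of them.

using System;

class FixedWidthRowDemo
{
    static void Main()
    {
        // Hypothetical layout: the real file has 1007 columns per person;
        // these three entries (name, start, length) are placeholders only.
        var layout = new (string Name, int Start, int Length)[]
        {
            ("AccountNumber", 0, 10),
            ("Surname", 10, 30),
            ("DateOfBirth", 40, 8)
        };

        // Build a sample fixed-width row so the example runs on its own.
        string row = "0000012345" + "Greaves".PadRight(30) + "19800101";

        foreach (var col in layout)
        {
            string value = row.Substring(col.Start, col.Length).Trim();
            Console.WriteLine(col.Name + " = " + value);
        }
    }
}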
June 11th, 2012 6:48am

Hi Carel, I have met this kind of requirement before: I processed a file that had more than 300 columns in one line. It was also hard to read because of the bad file format, and sometimes it included special unseen characters or blanks. My package often produced errors when using fixed length or ragged right to delimit the columns, because of the inconsistent file format. Finally, after a discussion with the upstream source engineer, we decided to delimit every column with a comma and enclose it in quotes, because some columns contain commas inside. For example: "1001","Test Name","This is a description, just for test..". After this adjustment we could process the file correctly, and when some records contained an incorrect data type or something else, we output them to an error file for manual checking. You could also try a CSV bulk insert; it should be faster than SSIS file mapping and loading, I think, but the most important thing for file loading is the file format. Hope it helps! Vote if it's helpful! Simon MSN:simonlv@live.cn Skype:simonlvpin
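For comparison, here is a minimal C# sketch of reading the quoted, comma-delimited format described above. Microsoft.VisualBasic.FileIO.TextFieldParser honours the quotes, so the comma inside the description stays in one field; the sample line is taken from the post, everything else is an assumption.

using System;
using System.IO;
using Microsoft.VisualBasic.FileIO; // add a reference to Microsoft.VisualBasic

class QuotedCsvDemo
{
    static void Main()
    {
        // Sample line from the post: the comma inside the last field must not split it.
        string sample = "\"1001\",\"Test Name\",\"This is a description, just for test..\"";

        using (var parser = new TextFieldParser(new StringReader(sample)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                // Prints: 1001 | Test Name | This is a description, just for test..
                Console.WriteLine(string.Join(" | ", fields));
            }
        }
    }
}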
June 14th, 2012 10:58pm

Hi Carel, I would recommend you implement a source Script Component and do the splitting and processing of the columns yourself. SSIS Tasks Components Scripts Services | http://www.cozyroc.com/
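A rough sketch of what that splitting might look like inside a source Script Component. This only compiles inside SSIS, because Output0Buffer and its columns are generated from whatever output columns you define on Output 0; the file path, column names and positions here are assumptions, not the real layout.

// Inside the Script Component's ScriptMain (source component with three
// placeholder columns defined on Output 0).
public override void CreateNewOutputRows()
{
    using (var reader = new System.IO.StreamReader(@"G:\Filename.txt"))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            Output0Buffer.AddRow();
            Output0Buffer.AccountNumber = line.Substring(0, 10).Trim();
            Output0Buffer.Surname = line.Substring(10, 30).Trim();
            Output0Buffer.DateOfBirth = line.Substring(40, 8).Trim();
        }
    }
}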
June 14th, 2012 11:06pm

There are 1007 columns that need to be processed, so handling the fixed-length columns in a source Script Component would be a big effort; he would also need to define all 1007 columns in the output. Hope it helps! Vote if it's helpful! Simon MSN:simonlv@live.cn Skype:simonlvpin
June 15th, 2012 2:06am

Thanks guys. I just hit my head against the wall after realizing that the 1007 columns was the old file. After picking myself up and sitting down again, I tried BCP with a format file, and also the SSIS Bulk Insert task with a format file. The new "fun" jack-in-the-box is that I now have the format file put together and have a nice 13 349 columns, yes columns, not rows, so unfortunately I think I might actually have to use a Script task, or brush up on my C# and write a quick app to do this for me. Hopefully Excel will save me some time by not having to type in all the columns; I'll use it to string together the code for the columns and split the data up into about 14 tables. I played with the idea of using SPARSE columns, which would have let me push up to 30 000 columns into a single table, but unfortunately that didn't work either. Unfortunately I don't have a say in how the file gets delimited, otherwise I would go with your way, Simon, which would make things a lot easier, but our data supplier just dumps the data for us in fixed width, and that's it. Does anyone have a better idea? I'm going to be hacking my way through C# in the meantime.
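Since 13 349 column definitions would be painful to hard-code, one option, sketched here under assumed file names and a made-up layout format rather than anything from the thread, is to export the layout from Excel as a tab-separated file and load it at run time:

using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical layout record: one line per column, exported from Excel as
// tab-separated "Name<TAB>Start<TAB>Length<TAB>TargetTable".
class ColumnDef
{
    public string Name;
    public int Start;
    public int Length;
    public string TargetTable;
}

class LayoutLoader
{
    // Reads the 13 349 column definitions from a layout file instead of
    // typing them into code; the path and format are assumptions.
    public static List<ColumnDef> Load(string layoutPath)
    {
        var defs = new List<ColumnDef>();
        foreach (string line in File.ReadLines(layoutPath))
        {
            string[] parts = line.Split('\t');
            defs.Add(new ColumnDef
            {
                Name = parts[0],
                Start = int.Parse(parts[1]),
                Length = int.Parse(parts[2]),
                TargetTable = parts[3]
            });
        }
        return defs;
    }
}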
June 15th, 2012 3:02am

Hi Carel, it's definitely a challenge to process such an evil data file. If we can't ask the data supplier to do anything for us, we have to process it with C#, I think. I can share my idea with you, although I have never processed files like this in C# before.
1. Extract only 5-10 rows into a test file; processing fewer records saves time and makes the logic easy to test.
2. Define a "start position" array that records the starting index of every column.
3. Define a "length" array that records the length of every fixed-length column.
4. Define a "header" array that records the first row, the headers.
5. Define a string array to hold the processed values. It will take a long time to manually enter the start index and length for every column. You can then substring the values by start index and length and save them into the values array; they share the same index from 0 to 13348.
6. There are two choices here. The first is to use C# directly to get the values from the different columns and write SQL to insert them into the 14 tables; you may need to convert data types before inserting, and 14 INSERT statements are required in the C# script. The second is, once a row has been processed into the array, to output it to a nicely formatted file. That way you don't insert the records into tables from the script at all: you just use C# to split the values and produce a nicely formatted file, then load that new file in SSIS as a source and pass the values through to the 14 destination tables.
7. If these 5-10 test records transfer correctly, enlarge the test to 50-100 rows, then 500-1000, and so on. If all the tests pass, you can have a good rest.
I have no idea whether this is the best choice for you, but if you do take this suggestion, I'm not sure you'll have time to say "hi" to us for the next two days; I can imagine how you will be entering indexes and lengths one by one, from 0, 1, 2, 3... to 13349. I'm also looking forward to seeing a better idea here. Hope it helps! Vote if it's helpful! Simon MSN:simonlv@live.cn Skype:simonlvpin
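A minimal, self-contained C# sketch of the second option above: substring each fixed-width row using start-position and length arrays, write a delimited file that SSIS or BCP can load normally, and push malformed rows to an error file. The three columns, paths and delimiter are assumptions; in practice the arrays would hold all 13 349 entries, loaded from the layout file rather than typed in.

using System;
using System.IO;

class FixedWidthSplitter
{
    static void Main()
    {
        // Placeholder layout: in reality these arrays would hold all 13 349
        // entries, loaded from the layout file rather than typed in by hand.
        string[] names = { "AccountNumber", "Surname", "DateOfBirth" };
        int[] starts = { 0, 10, 40 };
        int[] lengths = { 10, 30, 8 };

        // Minimum length a row must have to contain the last column.
        int minLength = starts[starts.Length - 1] + lengths[lengths.Length - 1];

        // Paths are assumptions. The output is pipe-delimited so that SSIS or
        // BCP can load it with an ordinary column mapping afterwards.
        using (var reader = new StreamReader(@"G:\Filename.txt"))
        using (var writer = new StreamWriter(@"G:\Filename_delimited.txt"))
        using (var errors = new StreamWriter(@"G:\Filename_errors.txt"))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Rows shorter than the layout expects go to the error file
                // for manual checking, as suggested above.
                if (line.Length < minLength)
                {
                    errors.WriteLine(line);
                    continue;
                }

                var values = new string[names.Length];
                for (int i = 0; i < names.Length; i++)
                {
                    values[i] = line.Substring(starts[i], lengths[i]).Trim();
                }
                writer.WriteLine(string.Join("|", values));
            }
        }
    }
}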
June 15th, 2012 7:02am

This topic is archived. No further replies will be accepted.
